Thinking differently about logarithms

It is always interesting to understand mathematical (or any) concepts when they are explained in simple intuitive ways.

– How long will it take to double your investment at a fixed compounding rate of 8%?
– What does the natural base e (also known as Euler’s number) really mean?
– What does it mean when they say the slope of y = e^x at any x is e^x? (OK, this one is for a separate future post, but it highlights a fantastic fact about e!)
– What is exponential growth versus linear growth? (Hopefully that is self-evident by the end of this post.)

Logarithms are an aid to understand and answer these types of questions!

Let’s start off with a simple equation: 2^4 = ?

The answer is of course, 16.
2^4 = 16

What does this actually mean in the context of “growing”? It means if you start with 1 of something, and grow at a rate of 2 every iteration, then at the end of the 4th iteration you end up with 16 of that thing:
1 x 2 = 2
2 x 2 = 4
4 x 2 = 8
8 x 2 = 16

That’s four 2s above. I end up with 16 times what I started with. What if you tripled every iteration for 4 iterations? You end up with 81 times of that thing you started with:
3^4 = 3 x 3 x 3 x 3 = 81

In the above cases, the “growth rate” is 2 (double) or 3 (triple). If I wrote the above in logarithmic form, I end up with this:
log2 16 = 4 (read as: “log of 16 to the base 2 = 4”)
OR
log3 81 = 4

Or in other words, if my growth rate is double each iteration, it takes me 4 iterations to reach 16 times what I started with. We are interested in the iterations in this example.

If I had $1 and I doubled it every iteration, I’d have $16 at the end of 4 iterations or 16 times what I started with (1 to 2 to 4 to 8 to 16 = 4 iterations)
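The repeated-multiplication view above can be sketched in a few lines of Python (`grow` is just an illustrative name, not anything standard):

```python
def grow(start, rate, iterations):
    """Multiply `start` by `rate` once per iteration."""
    amount = start
    for _ in range(iterations):
        amount *= rate
    return amount

print(grow(1, 2, 4))  # 16, i.e. 2^4: four doublings
print(grow(1, 3, 4))  # 81, i.e. 3^4: four triplings
```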

Doubling implies I grow at the rate of 100% every iteration. This may be true for bacteria or cells – 1 cell becomes 2, 2 becomes 4, 4 becomes 8 and so on. In the real world I wish my money doubled every iteration, but that would have to be some magical investment. Let’s say my money grows at a more realistic rate of 8% (instead of 100%) every year, i.e. I earn 0.08 times the starting amount every iteration. My growth rate is then 1.08 every iteration: if I had $1 initially, I have $1.08 at the end of the first iteration.

At the end of the second iteration, I have $1.08 + (0.08×1.08) and so on. So plugging this growth rate into the logarithm form:

log1.08 X = y iterations?

What is X here though? Recall this below meant if I doubled every iteration and wanted to reach 16 times my starting amount I had to go through 4 iterations:

log2 16 = 4

In this example we want to double our investment, so X is the multiple by which I want to grow, which is 2. So if I grow at 1.08 every iteration, how many iterations would it take for me to reach 2 times what I started with?

log1.08 2 = y iterations

My logarithmic tables tell me the answer is: 9.006468342

I’ll approximate that number since it is good enough for this purpose: it would take me about 9 iterations to reach a multiple of 2 if I grew by 8% every iteration. In the real world, that would normally mean 9 years (since the 8% is an annual compounded growth rate) to double my investment! (This is also the basis of the rule of 72: divide 72 by the percentage rate of return, 8 in this case, and you get approximately the number of iterations needed to double your initial investment.)
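You can check this yourself with Python’s `math.log`, which takes the base as its second argument:

```python
import math

# How many iterations at 8% growth per iteration to double?
iterations = math.log(2, 1.08)  # log of 2 to the base 1.08
print(iterations)               # ~9.006

# Rule-of-72 approximation: 72 divided by the percentage rate
print(72 / 8)                   # 9.0
```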

No Growth

Continuing on with some more fun: what if you did not want to grow at all, i.e. you want your growth multiple to be just 1? How many iterations would it take to not grow at all?

log1.08 1 = ?

The answer is 0. Of course! Zero iterations! If you waited 0 years you would not have grown your investment at all. Which also brings up the interesting side story that anything raised to the zero index is 1:

1.08^0 = 1

In fact any growth rate raised to the 0 index would end up leaving you with exactly 1 unit of what you started with!

Fractional Growth

What would a fractional growth rate give you?

log0.5 2 = ?

What does this even mean? It means I’m shrinking by 50% every iteration! When would I have doubled my starting number if I shrank by 50%? It does seem like an absurd question but the answer tells you something interesting:

log0.5 2 = -1

A negative 1? Yes! What it is essentially telling you is you’d have to go an iteration into the past if you wanted to see your investment double! Fair enough – we are shrinking by half every iteration as we move into the future. Or in other words:

0.5^-1 = 2

Your traditional definition of “x raised to the power of y means repeated multiplication of x, y times” may not seem intuitive enough to solve the above problem! It is tricky to explain what raising to a negative power means unless growth and time are part of the explanation. And where can negative growth make sense? If you’ve heard of radiocarbon dating, used to estimate the ages of prehistoric organic material, it may start to make sense!
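Both of these edge cases – no growth and shrinking – fall out of the same `math.log` call:

```python
import math

print(math.log(1, 1.08))  # 0.0: zero iterations means no growth at all
print(1.08 ** 0)          # 1.0: any growth rate to the zero index leaves 1 unit
print(math.log(2, 0.5))   # -1.0: shrinking by half, doubling lies one iteration in the past
print(0.5 ** -1)          # 2.0
```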

Euler’s Number

So what is Euler’s number or e? This famous number is found in nature wherever compounding growth is involved: money, rabbits, cells, population, etc. In the compound-interest example earlier, where the growth rate was 1.08, we saw that things doubled after about 9 intervals or iterations. What if we increased the number of iterations and divided the growth rate evenly across them? In other words, given this:

(1 + .08/1)^1 = the growth after one iteration

(Note that the .08 growth is over 1 time period, so we have “.08/1” in the equation, and since we are looking at what we have at the end of 1 interval it is raised to the power of 1.) What if the bank compounded the rate twice in a year? Or:

(1 + .08/2)^2

For simplicity let’s assume the growth rate is 100%, i.e. 1, not 0.08. So:

(1 + 1/2)^2

expression            growth multiple    frequency of compounding
(1 + 1/1)^1           2                  compounded once
(1 + 1/2)^2           2.25               compounded twice
(1 + 1/12)^12         2.61303529022      compounded monthly
(1 + 1/365)^365       2.71456748202      compounded daily
(1 + 1/1000)^1000     2.71692393224      compounded 1,000 times
(1 + 1/10000)^10000   2.71814592682      compounded 10,000 times

Well, you can see a pattern above – the growth multiple jumps up by large increments at first (2, 2.25, 2.61…) but as we increase the number of compounding intervals the final value approaches a magical number that begins with 2.7. No matter how many times you compound (“…to infinity and beyond!”) you will only make very minor increments and never cross the 2.7xxxx boundary. This is Euler’s number, or e. No matter how many times you compound your starting amount, you cannot end up with more than e times what you started with. Also, e is an irrational number whose decimal expansion never terminates, just like pi.
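The table above is easy to reproduce, and to push as far as you like, with a couple of lines of Python:

```python
import math

# (1 + 1/n)^n creeps up toward e as the compounding frequency n grows
for n in (1, 2, 12, 365, 1000, 10000):
    print(n, (1 + 1/n) ** n)

print(math.e)  # 2.718281828459045, the boundary the values above never cross
```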

AWS Load Balancer with R53

This is a 50,000-foot view of how an AWS Application Load Balancer (ALB), an Auto Scaling group (ASG), a Target Group (TGP) and Route 53 can work together.

ALBs can load balance your HTTP and HTTPS traffic, forwarding requests to the registered targets in the target group and routing only to targets that pass the health checks configured in the target group.

The ASG provisions the desired number of instances into potentially multiple AZs (you control that when you create the ASG) to keep an even balance for HA and registers them with the TGP. Once the health checks pass for a target (instance) the ALB is ready to route requests to it.

The ALB itself can be provisioned in multiple AZs for HA (you should provision it in more than one AZ). AWS creates a DNS name for it and manages the related IPs in its own hosted zone. You can then create an Alias record in Route 53 pointing to this ALB name for DNS resolution for your users. Assuming you registered a domain called example.com and want to create a subdomain called acme.example.com, you create an Alias record for this subdomain in Route 53 pointing to the ALB name, and that completes the loop.
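As a sketch, the Alias record described above looks something like the change batch that Route 53’s ChangeResourceRecordSets API accepts. All the values below are placeholders, not real identifiers:

```python
# Hypothetical values throughout -- substitute your own ALB DNS name and
# the ELB hosted-zone ID that AWS publishes for your region.
alias_record_change = {
    "Comment": "Point acme.example.com at the ALB",
    "Changes": [
        {
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": "acme.example.com",
                "Type": "A",  # Alias records to an ALB are type A (or AAAA)
                "AliasTarget": {
                    # Hosted zone of the ALB itself, not your example.com zone
                    "HostedZoneId": "ZELBEXAMPLE",
                    "DNSName": "my-alb-1234567890.us-east-1.elb.amazonaws.com",
                    # Let Route 53 consider the ALB's health when answering
                    "EvaluateTargetHealth": True,
                },
            },
        }
    ],
}
print(alias_record_change["Changes"][0]["ResourceRecordSet"]["Name"])
```

Because it is an Alias (not a CNAME), Route 53 resolves the ALB name to its current IPs at query time, which is how the changing-IP problem mentioned below is handled.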

Note that the public IPs of the ALB can change, but Route 53 will keep them up to date in your example.com domain and return the latest IPs back to the users.

You can read more about Alias records here!

Preflighted Requests

When accessing a Twitter feed, say to fetch a tweet, via its REST endpoint using JavaScript’s XMLHttpRequest (an axios.get() for example), you would run into a CORS error in the browser.

This is of course because your script running on domain A (the desktop) is trying to fetch a resource that resides on domain B (Twitter). Unless the responding server explicitly allows this, the browser will refuse to fulfill the request and return a CORS error. This is an implicit security policy implemented in the browser that allows only resources from the same origin to be accessed.

The browser is actually performing 2 requests under the hood. When it detects that certain headers, such as Authorization (a Bearer token, for example), have been set on the request, it makes a preflight request asking the remote server whether the real request would be allowed. It does this by performing an OPTIONS request and checking whether the Access-Control-Allow-Origin header is present in the response. If it is absent, you end up with the CORS error.

How do you solve this? One method is to proxy the request through a CORS proxy server that relays the request to domain B, collects the response, and adds the Access-Control-Allow-Origin header to it before passing it back to you, the original caller. There is an example proxy server, cors-anywhere, that does exactly this. When you route your query through such a proxy the CORS problem goes away.
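A toy sketch of the proxy’s job on the response path (`add_cors_headers` is a made-up helper for illustration, not part of cors-anywhere):

```python
def add_cors_headers(upstream_headers, allowed_origin="*"):
    """Copy the upstream response headers and add the CORS headers the
    browser is looking for before relaying the response to the caller."""
    headers = dict(upstream_headers)
    headers["Access-Control-Allow-Origin"] = allowed_origin
    # Preflight (OPTIONS) responses also advertise what is permitted:
    headers["Access-Control-Allow-Methods"] = "GET, POST, OPTIONS"
    headers["Access-Control-Allow-Headers"] = "Authorization, Content-Type"
    return headers

print(add_cors_headers({"Content-Type": "application/json"}))
```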

Behavior Driven Development

A few hundred sols ago we explored using some type of automated testing to speed up our testing efforts, which were largely manual and semi-automated. Turnaround times for complete regression testing ranged from days to weeks, which was a major impediment to quick release cycles. It was also an exhausting task for our QAs while leaving the devs guessing at the impact of all the code changes that went into the release. Yes, the unit tests worked, but that was just a small warm-fuzzy and did not prove that nothing bigger was broken.

Usually when we talk about testing the following stages come to mind:

Unit tests: The scope of these tests is very localized to the feature or fix that is coded. You may run a larger set of unit tests at the product level to ensure nothing within the product broke. They provide quick feedback so you instantly, or within a few minutes, know if something is broken and can fix it. External services that your product interacts with are typically mocked in these tests.

Regression tests: Typically involve testing a larger set of scenarios, this time with real instances of external services where applicable and the rest mocked. This may happen after a code build via a CI/CD tool, deployed to an automated test environment.

User Acceptance Testing: This involves the customer or an external user of the product running some tests to ensure their use cases work as expected. Some performance testing may be involved in this phase as well, though rarely.

The key here is that as testing moves farther and farther away from the developer’s domain it becomes increasingly expensive, in terms of time, to fix what breaks. When failures happen close to the development stage it is easy to notice and fix them. Once the code is “delivered over the fence” to professional testers, the time to detect and fix errors increases. This can substantially affect the quality of the product because the developers have “moved on” to the next set of features/releases and have lost the context of the previous release, which is no longer fresh enough in their minds for fixes to be made effectively.

The important thing here is the speed of feedback. It goes without saying that, first, automated testing is the only way to accomplish this and, second, it needs to happen close to the development cycle, preferably in the “gap” where work moves from the developers’ domain to the QAs’. That way the developers do not need to wait an inordinate amount of time for full feedback on their code.

This automated testing could also be a common reference point between the developers and the QA teams so that its outcome would trigger the same confidence levels on the build for both parties. This was important because otherwise the developers would call it a day after all their unit tests passed and now the “ball was in the QA’s court” so to speak.

The implication then, when a shared test suite becomes important to both parties, is that they share a common language in which to express the tests. It not only makes it clear what is being tested but also facilitates the re-creation of a test in any environment without ambiguity. This is where we stumbled upon the world of BDD and test expression languages like Gherkin and its implementation in various forms (Java, Python). This was a game changer in terms of how we approached the whole testing process. 
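For a flavor of what that common language looks like, here is a hypothetical Gherkin scenario (the feature, steps, and values are invented for illustration):

```gherkin
Feature: Account withdrawal
  Scenario: Withdrawing less than the balance succeeds
    Given an account with a balance of 100 dollars
    When the user withdraws 40 dollars
    Then the withdrawal is accepted
    And the remaining balance is 60 dollars
```

Each Given/When/Then line maps to a step definition (in Java, Python, etc.), so the same scenario can be read by a business stakeholder and executed unchanged in any environment.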

For us, testing both at the lowest level (unit tests, “are we building the system right?”) and at the business level (QA testing, “are we building the right system?”) was important. We already had automation at the unit-test level but not at the QA level, which typically worked at the business level. Moving to BDD let us automate business-level testing as well, which saved a huge amount of time on what was typically a large, time-consuming activity. Testing times came down from days or weeks to under 24 hours.

BDD can span multiple domains: developers, testers, business stakeholders. It does take organizational buy-in for it to permeate the various layers and to result in a “group think” mindset where everyone is thinking “testing first”. This may not be an easy sell in an organization due to the heterogeneity of processes and the nature of the product or service provided. In most cases it may be prudent to introduce it in small steps, affecting the smallest number of people/teams, to see how effective it can or cannot be!

The scope and techniques of BDD are too big to be discussed in one blog post. Here is a place to get started for more insight! Hopefully I can return for a second part on this topic and discuss some of the best practices that I think would be helpful for someone interested in exploring this in their own workspace!

Authentication using OAuth 2.0

OAuth2 is an authorization framework used widely today by clients to access API endpoints. There are various flows available depending on the use case. Two popular flows commonly used are code-grant and client-credentials.

Code Grant is typically used when an application requests access to a resource on behalf of a user. In this instance the application redirects the user to an OAuth2 provider (such as Google or Facebook) and lets the user log in with their OAuth login and password at the provider’s web site. The provider validates the login and redirects the user back to the application with an authorization code. The application can then exchange this code (along with its own client-id and secret, obtained when it first registered with the OAuth2 provider) for an access token that it can then pass to the resource server to interact on behalf of the user.

The resource server verifies that the access token is valid by checking its authenticity, possibly in different ways. One popular way is when the access token is a JWT (JSON Web Token). In this case the resource server has to verify that the signature in the JWT is valid, so no one has tampered with it, and that it has not expired. Verification of the signature can happen in multiple ways – the OAuth provider could produce a hash (an HMAC) using a secret shared with the application, or it could sign the token with a private key whose public key is advertised at a well-known location. In the first case the resource server recomputes the hash and checks that it matches the signature in the JWT; in the second it verifies the signature using the public key. The resource server then extracts the principal (the user id, if you like to think of it that way, also called the “sub”) to identify who the user is, and their “scope” (what they are allowed to do on the resource server).
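The shared-secret (HMAC) case can be sketched with the standard library alone. The secret, claims, and helper names below are invented for illustration; a real verifier would use a vetted JWT library and also check claims like `exp`:

```python
import base64
import hashlib
import hmac
import json

SECRET = b"shared-secret"  # hypothetical secret shared with the OAuth provider

def b64url(data: bytes) -> str:
    # JWTs use base64url encoding without padding
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def b64url_decode(s: str) -> bytes:
    return base64.urlsafe_b64decode(s + "=" * (-len(s) % 4))

def verify_hs256(token: str, secret: bytes):
    """Recompute the HMAC over header.payload and compare it, in constant
    time, to the signature carried in the token. Returns the claims on
    success, None if the token was tampered with."""
    header_b64, payload_b64, sig_b64 = token.split(".")
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode(),
                        hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(sig_b64)):
        return None
    return json.loads(b64url_decode(payload_b64))

# Mint a token the way the provider would, then verify it:
header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
payload = b64url(json.dumps({"sub": "user123", "scope": "read"}).encode())
sig = b64url(hmac.new(SECRET, f"{header}.{payload}".encode(),
                      hashlib.sha256).digest())
token = f"{header}.{payload}.{sig}"

print(verify_hs256(token, SECRET))          # {'sub': 'user123', 'scope': 'read'}
print(verify_hs256(token, b"wrong-secret")) # None: signature does not match
```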

Client Credentials is typically used when the application wants to access a resource not on behalf of a user but for itself. A use case may be an automated batch process that wants to access data on the resource server. In this scenario the application sends its client-id and secret (obtained when it first registered with the OAuth2 provider) to obtain an access token. The provider validates this information and returns an access token to the application, which it then passes to the resource server with all its API calls. The resource server validates the token in the same manner described earlier and services (or denies) the request.

The following diagram illustrates the Code-Grant workflow. If you leave out steps 1-3 it depicts Client-Credentials.

This is a great resource to understand OAuth in greater detail!