Learn System Design

6. Harnessing Load Balancers for Web Traffic Symphony

Ben Kitchell Season 1 Episode 6


Imagine your website could handle the furious influx of a flash sale without breaking a sweat. That's the magic of load balancers, the unsung heroes of system design, which we dissect in this episode to help your services run smoothly even under the avalanche of high traffic. Together with industry experts, we unravel the mystery behind these traffic conductors, ensuring you grasp the significance of active-passive and active-active setups to avoid the dreaded single point of failure. We also pit hardware against software load balancers in an epic showdown, discussing how each fares in terms of performance, flexibility, and cost to help you architect a robust and scalable system.

As the conversation heats up, we chart the terrain of load balancing algorithms, where each choice can lead to triumph or turmoil for your server efficiency. Get ready to be enlightened on the intricacies of weighted least connections, least response time strategies, and how real-time data can empower your system to channel web traffic with the precision of a symphony conductor. We're not just talking about keeping your digital lights on; it's about fine-tuning your system's performance to handle the crescendo of demands. So stay with us, because next time we'll be navigating the complex waters of DNS and network management, adding another layer to your growing mastery of system design.

Support the show

Dedicated to the memory of Crystal Rose.
Email me at LearnSystemDesignPod@gmail.com
Join the free Discord
Consider supporting us on Patreon
Special thanks to Aimless Orbiter for the wonderful music.
Please consider giving us a rating on iTunes or wherever you listen to new episodes.


Speaker 1:

Hello everyone, welcome to episode number six of the Learn System Design podcast. One of the biggest lessons we learned when discussing databases is that you never want a single point of failure, because, to paraphrase Murphy's Law, if it can fail, eventually it will fail, and because of that you want to protect your system. And while databases have the concept of replicas, what is one to do if they want to replicate a service in an application? If you answered scale horizontally, then I'm very proud of you. You've been listening, and that's exactly what we should do. The only problem that exposes itself is that if we do scale horizontally, we have our backend duplicated across a bunch of identical instances. How do we direct traffic to them? That is where the subject of today's episode comes in.

Speaker 1:

The unsung hero of scaling an application: load balancers. From a high level, load balancing seems like a pretty simple concept. It is a system, which could be either software or hardware, whose main purpose is to take a large number of requests, or a large load, and balance those requests across all available instances. And with this simple concept we help our services scale, both for redundancy and for managing an influx of a large number of requests. At a lower level, load balancers are so much more than this, and we're going to dive deep into all the parts that make them fascinating. But before we do that, I want to be sure to address a question that is undoubtedly cropping up, and that is: how is a load balancer not a single point of failure? Or, in other words, aren't we just moving the single point of failure from the application layer to the load balancer layer? And the answer is, yeah, sort of. But while a load balancer can be a single point of failure, it can also be made redundant, and we can apply strategies to the load balancer layer to help with uptime. Much like we've replicated databases, we can also replicate load balancers. So how do we do that? Well, there are two primary ways to handle redundancy in load balancing. Each has its benefits and drawbacks, but the most important thing is they are vital for ensuring your system is as protected as possible from failures.

Speaker 1:

The first strategy is called active-passive. There is one load balancer that is active and helps spread requests across all of your services, and on the other side of that, there are any number of other load balancers that simply sit in what's called passive mode. The passive load balancers periodically check that the active load balancer is up and running and, if it's not, one of those passive load balancers becomes the active one and takes over the pending requests that the failed one should have been working on. When the load balancer that went offline comes back online, it simply takes its place as one of the passive load balancers and all is right in the world. This strategy gives us a huge boost when it comes to outages, because we always have that failover to protect us in the event of a failure we didn't plan for. The other main strategy for load balancing redundancy is called active-active. With the active-active strategy, we have two or more load balancers that work together to route all requests. One of the biggest advantages of this setup is that all of them can share information about the requests they have serviced. It makes it easier to cache responses, and if the same user comes back with a request, you can actually remember which server they were sent to and send them back there. However, one of the biggest drawbacks is that there is no dedicated failover, meaning if one or more of your load balancers go down, there's an automatic slowdown in your application, because, no matter what, having one, two or even three load balancers go down means your application will be slower than it was before.
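
To make the active-passive idea concrete, here is a minimal sketch of the kind of health-check loop a passive node might run, assuming a hypothetical health endpoint and a placeholder promotion step (the URL and function names are illustrative, not from the episode):

```python
import time
import urllib.request

ACTIVE_HEALTH_URL = "http://active-lb.internal/health"  # hypothetical endpoint
CHECK_INTERVAL_SECONDS = 5

def active_is_healthy() -> bool:
    """Return True if the active load balancer answers its health check."""
    try:
        with urllib.request.urlopen(ACTIVE_HEALTH_URL, timeout=2) as resp:
            return resp.status == 200
    except OSError:
        return False

def promote_self_to_active() -> None:
    """Placeholder for whatever failover actually does in your setup,
    for example claiming a virtual IP or updating DNS."""
    print("Active load balancer is down; this passive node is taking over.")

# The passive node just polls the active one and promotes itself on failure.
while True:
    if not active_is_healthy():
        promote_self_to_active()
        break
    time.sleep(CHECK_INTERVAL_SECONDS)
```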

Speaker 1:

Let's talk about the two main types of load balancers: hardware and software. Which one you choose is fundamental to the type of system you'll be responsible for. Ultimately, it will depend on the kind of system you want to design and the use cases you want to address.

Speaker 1:

Let's start first with software load balancers. They are usually an application that runs on a server, so they're very easy to integrate with your system and very easy to scale up as needed. Generally, they are less performant than hardware-based load balancers, but they're also more flexible and cheaper, because they can run on pretty much any server, including right next to your application on the same node. This also means that, generally, they aren't as secure as hardware load balancers, which usually have their security baked in. Besides being better at handling higher load and having better security, hardware load balancers also offer a higher degree of availability: since you control the hardware, you are responsible for the uptime. But, as mentioned before, these hardware choices are going to be more expensive and less flexible in terms of configurability.

Speaker 1:

When it comes time to actually build your system, the question is going to arise: how do I know which one to choose? And, honestly, the answer is straightforward. When you look at your data and the use cases you have, are you going to be handling an exorbitant number of requests? What I mean by that is: are you building an application like Google or Amazon that's going to handle that level of traffic? If not, or if security is not one of your top priorities, then you should use a software load balancer. When it comes to how the load balancer decides where to send the traffic, the decision usually comes down to a load balancing algorithm. Load balancing algorithms are really the cream of the crop when it comes to how to implement a load balancer. This is the case for both software and hardware load balancers, and load balancing algorithms are usually separated into two distinct categories: static and dynamic. Under the static category, we have a few algorithms we are going to talk about today. If you're caught up on your basic data structures and algorithms, then some of these are going to sound really familiar.

Speaker 1:

Let's start with what is usually the most common static load balancing algorithm, which is called round robin. I want you to imagine, if you will, that you have a deck of cards and you want to give each of the three players in this imaginary game the same number of cards. The natural process is to give card one to the first person, card two to the second person and card three to the third person. Then, when you get to card number four, you go back to the first person and repeat until all the cards have been dealt. That process, at its core, is the round robin algorithm. The load balancer takes all of the requests, or the deck of cards, then delivers each request to one of the available services in your system, or players in the game, by simply looping around and delivering them one by one. The benefits of round robin are its simplicity and its theoretical balance when spreading the load. The reason I say theoretical here is that round robin doesn't take anything into account when sending a request to a service. If a specific instance of a service is handling, say, a very large request and it's next in line for round robin, the load balancer will send the request regardless of anything else. And what this means for your users is that any request unlucky enough to land on the instance with that very large request has to wait, which can cause timeouts and generally a bad user experience. Round robin is best used when you need to implement something easily or the service has a consistent response time.
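
Here is a minimal round robin sketch in Python, just to show the dealing-in-a-circle behavior described above (the server names are illustrative):

```python
from itertools import cycle

# Round robin: requests are dealt to servers in a fixed rotation,
# with no awareness of how busy each server currently is.
servers = ["server-1", "server-2", "server-3"]  # hypothetical instances
rotation = cycle(servers)

def route() -> str:
    """Assign the next request to the next server in the rotation."""
    return next(rotation)

for request_id in range(7):
    print(request_id, "->", route())
# 0 -> server-1, 1 -> server-2, 2 -> server-3, 3 -> server-1, ...
```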

Speaker 1:

Up next in our static load balancing algorithm discussion is a variant on the Round Robin strategy and it's actually called Weighted Round Robin. Let's go back to our discussion before with the three players and the deck of cards. What if you're playing with, say, a small child? You wouldn't want to give them as many cards as an adult, right? So instead you agree beforehand that of the 52 cards in the deck, you and your fellow adult player will both get 21, and the kid will only get 10. This agreement beforehand that dictates how many cards each player gets is where weighted round robin comes into play.

Speaker 1:

When the load balancer is given the list of server instances, it also receives a weight for each one of those instances. The weight dictates the number of requests the server can handle, because, as you might remember from earlier episodes, you can scale instances vertically to give them more resource power. You can link it back to the example above. Maybe we have two beefed-up instances with extra resources that we'll call the adults, and they can handle 200 requests. But maybe you have an extra small instance, the child here, that can handle maybe half of that. When a request comes in for an instance that is currently at capacity, the request can either be denied or enqueued for later processing. The benefit of weighted round robin is its improvement on the basic round robin system; it covers the big problem with round robin. The downside is that, by fixing that issue, it adds an extra layer of complexity, because we have to adjust the weights every time a server's resources change.
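
A minimal weighted round robin sketch, following the adults-and-child example above; the names and weights are illustrative, and a production implementation would usually interleave the servers more smoothly:

```python
from itertools import cycle

# Each server appears in the rotation in proportion to its weight, so the
# "adult" instances receive more requests per cycle than the "child" instance.
weights = {"adult-1": 2, "adult-2": 2, "child-1": 1}

# Expand the rotation so each server shows up weight-many times per cycle.
rotation = cycle([name for name, w in weights.items() for _ in range(w)])

def route() -> str:
    return next(rotation)

for request_id in range(10):
    print(request_id, "->", route())
# adult-1, adult-1, adult-2, adult-2, child-1, adult-1, adult-1, ...
```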

Speaker 1:

The last static algorithm I want to discuss is called IP hashing. For the final card analogy, let's say we're trying to give one person all the odd cards, the next person gets all the even cards, and then maybe you take any joker cards that come up. As you come across the cards, you wouldn't just deal in cyclical order like the last two strategies. Instead, you'd probably look at the card, check if it's even or odd or a joker, and then give that card to the specific person that matches the criteria. IP hashing is the same thing in the load balancing world. The load balancer receives the IP address of the client connecting to it, then runs it through some sort of function, and, based on what that function returns, it will assign a specific instance to that client.
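
A minimal IP hashing sketch, assuming a simple hash of the client address mapped onto the server list (the server names are illustrative):

```python
import hashlib

# The same client IP always hashes to the same server, as long as the
# server list itself stays stable.
servers = ["server-1", "server-2", "server-3"]  # hypothetical instances

def route(client_ip: str) -> str:
    digest = hashlib.sha256(client_ip.encode()).hexdigest()
    index = int(digest, 16) % len(servers)
    return servers[index]

print(route("203.0.113.7"))    # always the same server for this IP
print(route("203.0.113.7"))    # ...same result again
print(route("198.51.100.42"))  # possibly a different server
```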

Speaker 1:

Astute listeners might remember this concept from our episode about database indexes. Hashing is very important when it comes to software engineering, and if you listened to that episode, you'll remember the benefits and drawbacks, and rest assured they are very similar to the ones for load balancing. The huge benefit is consistency. If someone connects to a specific server and adds something to their cart, maybe on a retail website, well, if they close the site and reopen it but they're on the same IP address, then the consistent output of our hash function will assign them straight to that same server every time. The biggest downside is that hashing can cause hotspots on specific instances. Hotspots, if you remember, are where a server or specific instance gets more load than the others because of a bad hash. If you have a bad hash function that returns a certain number or a certain instance more than others, then you'll end up giving that server a lot more load than the others, and it's not really balanced at that point.

Speaker 1:

Let's talk about dynamic load balancing algorithms now. The difference between static and dynamic algorithms for load balancers is that static algorithms do not take into account the current state of the servers before deciding which request goes to which server. So the state of the servers, whether they have a lot of load or a little, whether they're up or they're down, does not matter to a static algorithm. It just knows you're next in line, so now you are getting assigned this request.

Speaker 1:

The first dynamic algorithm to touch on is the least connection algorithm. With the least connection algorithm, the load balancer will only check for the server with the fewest requests currently processing, and nothing else. That's very important here: only the least number of connections, nothing else. It's pretty straightforward. If you think about it, when you're at a grocery store and you see a few lines to check out, you tend to go towards the line that has the fewest people, right? It's the same thing. The beauty of this algorithm is that it is one of the best ways to avoid overloading your servers. It always takes into account the number of requests being processed and chooses the server with the least amount. However, the problem with this algorithm is that it only takes into account the number of connections and nothing else. So let's say we have a server that can handle 200 requests and it's currently processing 100 of them. It's only at half its load, right? And then we have another server that can handle 100 requests and it's currently processing 99. With this algorithm, the load balancer will assign a new request to that smaller, more full instance, because objectively one has 99 and one has 100, and nothing else matters to it. So it actually puts a server at full load rather than using the server that's only at half.
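
A minimal least connections sketch using the numbers from the example above, just to show that the choice is made on connection counts alone:

```python
# The big server is at 100/200, the small one at 99/100; least connections
# looks only at the raw counts and ignores capacity entirely.
active_connections = {"big-server": 100, "small-server": 99}

def route() -> str:
    """Return the server with the fewest current connections."""
    return min(active_connections, key=active_connections.get)

chosen = route()
active_connections[chosen] += 1
print(chosen)  # "small-server", even though it is nearly at capacity
```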

Speaker 1:

The second dynamic algorithm I have a feeling you will pick up on pretty quickly. It's called the weighted least connection algorithm and, you guessed it, it is exactly like the least connection method, but this time you can assign a weight to the instances and get better control over where the request goes. If we take the last example into consideration, now, when deciding where that new request would go, we compare the load to the capacity and assign based on who has the biggest difference between current load and current capacity. So with the bigger server that is only at half capacity but running 100 requests, and the smaller one that's running 99 but has a capacity of 100, it would actually assign the request to the larger server with more capacity, because that difference is bigger, rather than looking at the raw number of requests. The drawback, much like weighted round robin that we talked about before, is that you still have to change those weights every time a server's resources change.
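
A minimal weighted least connections sketch, following the episode's framing of picking the biggest gap between capacity and current load (real implementations often use a connections-to-weight ratio instead):

```python
# Compare each server's remaining headroom (capacity minus current load)
# and send the request to the server with the most room left.
servers = {
    "big-server": {"capacity": 200, "connections": 100},  # half full
    "small-server": {"capacity": 100, "connections": 99},  # nearly full
}

def route() -> str:
    """Return the server with the largest gap between capacity and load."""
    return max(servers, key=lambda s: servers[s]["capacity"] - servers[s]["connections"])

chosen = route()
servers[chosen]["connections"] += 1
print(chosen)  # "big-server": 100 spare slots beats 1 spare slot
```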

Speaker 1:

Up third, we have the least response time strategy. The least response time strategy will actually take into account historical data for each and every server and then send the request to the one it thinks will actually accept and, this is important, process the request in the shortest amount of time. So it's not just which server will take the request the quickest. Instead it says: based on my experience with all of you, which one of you gets things done the fastest and returns a 200 to me to say that everything went according to plan? So if we think about the grocery store analogy again: even if a line is shorter, if you see that the person working there, maybe you've checked out with them before, loves to talk and chat, maybe they spend a lot of time bagging groceries and doing all this extra stuff, and that's great. But if you're in a rush, maybe you don't want to go to that person. Maybe the line two lanes down is a few people longer, but that person, in your experience, is super fast. They're not going to have a conversation with you, they're going to get things bagged, they're going to get you out the door the fastest. That is the same concept as least response time.
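
A minimal least response time sketch, assuming a rolling history of observed response times per server (the timings and names are made up):

```python
from statistics import mean

# Keep a history of observed response times per server and route each new
# request to whichever server has been fastest on average so far.
response_history = {
    "server-1": [120, 110, 130],  # milliseconds from recent requests
    "server-2": [80, 95, 90],
    "server-3": [200, 180, 210],
}

def route() -> str:
    """Return the server with the lowest average historical response time."""
    return min(response_history, key=lambda s: mean(response_history[s]))

print(route())  # "server-2", the fastest on average so far

# After each response, record the new timing so the history stays current.
response_history["server-2"].append(85)
```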

Speaker 1:

So what's the biggest benefit of the least response time strategy? The biggest benefit, honestly, is that the overall performance of your application will be faster, because if the load balancer always chooses the server it thinks will respond the fastest, your request will always, or most likely, be processed by the quickest possible route, which makes your application quicker overall. On the other hand, because you're basing everything on historical data, that data could be biased and it could be wrong. Think about it: maybe one of your servers, for the last week or so, has been dealing with one of your biggest clients, and that client sends huge requests that take a long time to process. That server, which might be more beefed up and very quick compared to the other servers, has simply been handling a larger load lately, so it's going to look like a slower server. When something small comes in, the load balancer is going to look and say, well, server number one here is quicker than server number three, despite the fact that the third one might actually be faster. It all depends on your historical data for these servers.

Speaker 1:

And so the last strategy for load balancing algorithms I want to talk about is the resource-based strategy, and it's a little different than the others. For this one you'll actually deploy agents onto your servers, and these agents are basically little specialized jobs that run periodically and return information to your load balancer about your CPU, your memory, your storage, your capacity and everything like that. Basically, all the things that a load balancer might want to take into consideration before assigning a request. Before each request, the load balancer will send a query out to each and every one of your servers and say: server one, server two, server three, give me back all of your information about your state right now. Then it simply chooses the one it thinks will be best in terms of handling the load and getting it done the quickest. The benefit of this is that your request will always go to the best server, the one that has the most resources available for the size of your request. The downside is that you have to account for the extra time, right? It's going to take time for that load balancer to query every single server, so if you have a lot of servers, and maybe some of them are down, you're going to eat up time waiting for those responses to come back. So while it will find the best server for the load, it might not be the quickest in terms of actually processing everything at the user level. When it comes to choosing the perfect configuration for a load balancer, it can feel like a daunting task, but after this episode, I hope you feel more comfortable with all the tools at your disposal, because they end up being your first line of defense when it comes to scaling your application. I want to make sure I don't understate their importance here. Make no doubt about it, horizontal scaling would not exist without load balancing and everything that it does. Next episode we will be diving into DNS and all that comes along with handling networking and things like that.
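
To round out the algorithm discussion, here is a minimal sketch of the resource-based strategy described above, with a stand-in for the agents that would report CPU and memory back to the balancer (the fields and scoring are illustrative):

```python
# Each server runs a hypothetical agent that reports its current resource
# usage; the balancer polls every agent and picks the server with the most
# free capacity. The agent data is hard-coded here, where a real setup would
# fetch it over the network before every request.
def poll_agents() -> dict:
    """Stand-in for querying each server's agent; returns current usage stats."""
    return {
        "server-1": {"cpu_used": 0.80, "memory_used": 0.70},
        "server-2": {"cpu_used": 0.30, "memory_used": 0.40},
        "server-3": {"cpu_used": 0.55, "memory_used": 0.60},
    }

def route() -> str:
    """Pick the server with the lowest combined CPU and memory utilization."""
    stats = poll_agents()  # this round trip is the extra cost mentioned above
    return min(stats, key=lambda s: stats[s]["cpu_used"] + stats[s]["memory_used"])

print(route())  # "server-2", currently the least loaded
```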

Speaker 1:

If you've been enjoying these episodes, please share them with your colleagues or your friends. If you would like to suggest new topics or even be a guest on the podcast, feel free to drop me an email at LearnSystemDesignPod@gmail.com. Remember to include your name if you would like a shout out. And if you would like to support the podcast and help me pay my bills, please jump over to our Patreon and consider becoming a member. Or just send me ideas, because maybe right now it's not worth it for you, but if I put something more valuable up there, you might consider making a dollar-a-month donation. All podcasts are inspired by Crystal Rose. All music is written and performed by the wonderful Aimless Orbiter. You can check out more of his music at soundcloud.com slash Aimless Orbiter Music. And with all that being said, this has been Ben, and I'm scaling down. Thank you.
