Learn System Design
A bi-weekly podcast hosted by a senior engineer named Ben Kitchell that takes a deep dive into learning about technical system design by learning together. Each episode we will explore the inner workings of what makes these systems so complex and fascinating while building on our knowledge of how they came together.
All music written and performed by the mysterious Aimless Orbiter. You can find more info about him and his music at https://soundcloud.com/aimlessorbitermusic
Mastering System Design Interview: Unlocking API Design, Crafting Routes, and Real-Time Data Transfer Techniques
Unlock the secrets of API design and elevate your system design skills with our latest episode featuring me, Benny Kitchell. Explore the pivotal role APIs play in system design interviews and real-world development, where they act like the seamless communication between waiters, cooks, and customers in a restaurant. Learn how to craft APIs that are tailored to both internal and external developers by understanding their specific needs and objectives, ensuring a smooth and efficient user experience.
We also shine a light on the critical aspects of designing API routes. Understanding user needs and addressing core problems are the bedrock of effective API design. By focusing on functional and non-functional requirements, you'll be equipped to create API routes that meet real-world demands. Discover the importance of API versioning through our Spotify example, where future-proofing your design becomes crucial in maintaining user satisfaction and facilitating seamless updates.
Finally, we delve into the world of real-time data transfer, examining both synchronous and asynchronous communication methods. From the traditional request-response model to the innovative use of WebSockets for instantaneous data exchanges, we break down the strengths and limitations of each approach. Equip yourself with the knowledge to choose the best method for your client-server interactions, ensuring your system design is robust, flexible, and ready for any challenge.
Dedicated to the memory of Crystal Rose.
Email me at LearnSystemDesignPod@gmail.com
Join the free Discord
Consider supporting us on Patreon
Special thanks to Aimless Orbiter for the wonderful music.
Please consider giving us a rating on iTunes or wherever you listen to new episodes.
Hello everyone, welcome to episode number 10 of the Learn System Design podcast with your host, me, Benny Kitchell. This week we are going to continue down the primer of building a system and learning system design interviews. The episodes are geared more towards taking a system design interview but, again, you'll find at most tech companies, especially if you're going in as a senior candidate, that these apply to building out actual systems as well. If you haven't yet, I do recommend listening to the last two episodes, episodes eight and nine, before listening to this one, because it's all sort of one giant overview of system design interviews and the things to take into account when designing a system, those important steps, and this is the next step in that progression. Anyway, let's jump right into it.
Speaker 1:The main topic I want to cover today is API design, or API modeling. This subject can honestly be its own interview at some companies because it's vital to your day-to-day operations. It's not just, you know, can you do this thing. Where building an entire system from scratch usually isn't a common thing at a job, building out a new API for a service can be. Today I'm going to go into it as if you were given a full API design interview, and along the way I'll highlight the parts that matter for system design. All of this is really good knowledge to have going into either a system design or an API design interview, but I will call out the key parts so that, if you're just doing a system design interview, you don't spend too much time on the intricate details. The big question at the top is: what is API design really for? What is it actually supposed to accomplish as a whole? Well, the entire point is simply designing an efficient way for your services to communicate. To give you a quick refresher, APIs are just the interface over which the services communicate, and their protocols are how they communicate.
Speaker 1:If you think about a restaurant and boil it down to just like the cooks, the waiters and a customer, imagine the customer is a browser, right, and the cook is some sort of back-end service that does something and sends back food. It doesn't really matter. The waiter itself is sort of our API and how that waiter communicates might be different depending on the methods or the protocols that you use. And if you think about it in terms of an actual restaurant, perhaps it's better for the cooks if they just get a ticket that's written down and has everything that they need to make and just hand it to them or put on, you know, like a belt that they can spin around and take off. But, on the other hand, a waiter wouldn't ask the customer to just write down on a piece of paper what they want and then hand it back to them. That would feel silly, right. That wouldn't make sense. No, instead, they take your order and they might write it down themselves, or they might keep it in their memory and then write it down later on a piece of paper and give it to the cook or you know. Whatever the process is, that's best for them. But consistently, every customer is going to have the same experience. They're going to tell the waiter what they want, the waiter will go back and the waiter will communicate with the cooks in the exact same way, consistently, and that's sort of the important thing here.
Speaker 1:If we apply this to our scenario, it's the same thing, right? The browser requests some sort of data using some sort of protocol. The API takes the request, processes it in some way and then sends it to the cook, or the backend service. In the example above, they just write it down, and it works sort of like a message queue, right? If the cook is the service, then that belt that holds those tickets is your message queue in this case. Once the cook actually receives the ticket, it will process what's on the paper and then send back some sort of response or, in this case, the food.
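The restaurant analogy can be sketched in a few lines of Python. This is only an illustration under assumed names: `waiter`, `cook`, `tickets` and `plates` are made up for the example, and the standard library's `queue.Queue` stands in for a real message queue.

```python
import queue
import threading

# Hypothetical names: `tickets` is the belt (the message queue),
# `plates` carries finished food back out of the kitchen.
tickets = queue.Queue()
plates = queue.Queue()


def waiter(table, order):
    """The API: take the customer's request and put a ticket on the belt."""
    tickets.put({"table": table, "order": order})


def cook():
    """The backend service: pull tickets off the belt and send back food."""
    while True:
        ticket = tickets.get()
        if ticket is None:  # sentinel value: the kitchen is closing
            break
        plates.put({"table": ticket["table"], "food": f"cooked {ticket['order']}"})


def serve(orders):
    """Run one service: every customer talks to the waiter the same way."""
    worker = threading.Thread(target=cook)
    worker.start()
    for table, order in orders:
        waiter(table, order)
    tickets.put(None)
    worker.join()
    return [plates.get() for _ in orders]
```

Calling `serve([(1, "burger"), (2, "salad")])` returns the two plates in the order the tickets were placed, because a single cook drains a first-in, first-out queue.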
Speaker 1:I hope this analogy is working and makes sense, right? This is just one example of how an API might work. Not all API designs will have a message queue, but sometimes they're beneficial; it just depends on the design. There are many different design patterns when it comes to designing an API, and the one you choose matters, based on the data you're handling and, honestly, the ultimate goal of what your system is trying to do. Each one of these patterns has its own place. None of them is a one-size-fits-all solution, and the choice ultimately depends on two big questions that you should always ask yourself, whether you're taking a system design interview or an API design interview. Those two questions are simply: who is this API for, and what should this API do? I know these questions feel obvious, but trust me when I say it's very, very easy to start building and then realize halfway through that the user who needs to consume this API is going to have to do some extra work on their end to match your system or, even worse, they're not going to get the data that they want at all, or they're not going to get it in the format that they expect. That's a terrible experience, no matter who is using your API.
Speaker 1:So who is the API for? It could be that the API is for internal developers, the people you work with at your company. Maybe you know someone who works in a different pod in your company, or something like that. The routes might not even be exposed to the outside world, they might be completely private. But it could also be for an external developer, maybe a hobbyist at home, maybe a developer at a different company. It just depends, right and honestly. Maybe someone is using your API to build their cool new tool on top of your data and on top of your API, and in that case you might want to make it authenticated or rate limited right. Like you know, you look at companies like Reddit or Twitter that are having you sign in before you can see things now, because AI is just scooping up all that data for free and then using it to build, like their LLMs and things like that right, and so this is a case that a lot of people didn't think about before, but now these companies are having to think about do I rate limit? Do I need to authenticate in some way? That way, people aren't just scooping up this precious data that I have and using it for their own tool without me getting a slice of that pie, for better or for worse. That is the world we live in right now.
Speaker 1:So these considerations are what come up when you start thinking about who consumes this API. They'll influence how the data is structured, how the request should be consumed and how the data is protected. So what should my API do, then? In other words, what problem is my API solving? If the API is meant to provide analytics for how many people have listened to a specific song in the last day and your API route doesn't even return analytics data, it isn't solving that problem. Or say you have a play-song request: if it doesn't take into account the user that's playing it, maybe it's not giving the correct data. If one person listens to the same song a thousand times and listens to nothing else, is that weighted more or less? These are things that your company might decide on the backend, but they're a part of your API. It's part of what you're going to be returning to users and saying, this is a source of truth, this you can take to the bank as real analytical data. So it's vital to understand the problem you're solving and ensure you are building the API to reflect, again, who is using it and what they are using it for. Now, for those of you who are focused on the system design portion, you might remember the last two episodes.
Speaker 1:We talked about functional and non-functional requirements. Those are back again, and they are very, very important. You're building your entire system based on those things. Your API routes in a system design interview should be direct reflections of your functional requirements. If you're building a system like Spotify, one of your core functional requirements is going to be that users should be able to play a song, right? That seems very intuitive. Well, one of your API routes should be your core path URL, followed by /v1/songs/ID. That makes sense, right? If not, it's totally okay, because we're going to talk about it now.
Speaker 1:First, let's talk about your core API path. It's pretty easy to follow once I explain what I mean. Your core API path is just the URL for your API server, or maybe your API gateway or your load balancer, something like that. In reality, it's wherever the request goes to reach your API server. If you're at a company like Spotify, your route might be spotify.com or api.spotify.com or some combination of those things. Maybe go check out episode number seven on DNS to get more insight into how that really works. But at the end of the day, it's going to be some URL. My personal suggestion is api.company.com, or whatever your company URL is, because it's consistent and it's easy to implement for developers.
Speaker 1:So next in line is that /v1. So why v1, right? Probably pretty easy to guess. In general, v1 just indicates that this is the first version of your API. It could also be v2, v3 or v100, right? The v just means version, and the number signifies which version of the API you're using. The versions are important because they provide foresight and flexibility for future implementations of your API.
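To make the pieces of such a route concrete, here is a small sketch that splits a URL into core path, version, resource and ID. The host `api.example.com` and the function name `parse_route` are assumptions for illustration only; a real framework's router would do this for you.

```python
from urllib.parse import urlparse


def parse_route(url):
    """Split a URL like https://api.example.com/v1/songs/42 into its parts."""
    parsed = urlparse(url)
    # Drop empty segments produced by leading or trailing slashes.
    parts = [segment for segment in parsed.path.split("/") if segment]
    version, resource = parts[0], parts[1]
    resource_id = parts[2] if len(parts) > 2 else None
    return {
        "host": parsed.netloc,   # the core API path
        "version": version,      # which version of the API is being called
        "resource": resource,    # the resource collection
        "id": resource_id,       # the specific item, if any
    }
```

For example, `parse_route("https://api.example.com/v1/songs/42")` yields the host `api.example.com`, version `v1`, resource `songs` and ID `42`.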
Speaker 1:Let's keep with the Spotify example that we talked about earlier. Let's say you're just starting out building Spotify as a company and you've built this great starting point for your API. Your response is awesome. Maybe you implement a get albums route, and that route simply sends back information based on a specific album. And you think, well, some albums have different versions. Maybe they've been remastered or re-released, so they've come out in different years, right?
Speaker 1:And maybe your idea is to take that album name and just add the year at the end, right? So if it's Michael Jackson's Thriller album, it would come back as Thriller (1982). Then maybe a few years pass. You have a lot of internal developers parsing that string for data to use in analytics, and you have external developers parsing that string to maybe come up with the best songs of a certain decade. And then you realize, oh man, you know what would be easier? We could just pass the year in a new property, so all of these people wouldn't have to parse the year out of the album name. That seems like a weird thing to do, right? Hindsight's always 20/20. And right in that moment is where the problems will begin, because now you have people who are consuming your API and parsing that string, and they're going to start getting responses back without the year in the place they expect it to be.
Speaker 1:You can send out as many emails as you want. You can make as many announcements as you want. You can talk to as many people as you want. I guarantee you will never have everyone on the same page. It's just impossible when it comes to cross-company or even cross-developer communication, when you have too many people consuming something. And honestly, there are also going to be people who say, I love that the year is in that string, that makes it so much easier for me, I don't want a new property. People have their own opinions and their own feelings towards things. If you just change that response, you're effectively breaking the system of anyone who didn't upgrade in time, and even if that's one person, you're now creating a bad developer experience overall.
Speaker 1:Instead, if you put the version in the route, you can simply add a new route, /v2/albums followed by the ID, that responds with the year in a separate place. You can let the developers and other companies upgrade at their own pace, or not upgrade if they choose not to. If you didn't put that version in, you're forced to create another route with a new naming convention. So where you have /albums/ID, now you need a new route for this new concept. You might call it /new-albums, but then you can't use new albums for any albums that have come out in the last week or so, right? So now that has to be something different, and if you want to change the albums route again, then you might have something like /new-albums-final, right?
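Here is a minimal sketch of the Thriller scenario with both versions living side by side. The handler names, the `ROUTES` table and the in-memory `ALBUMS` data are all hypothetical; the point is only that the v1 response shape stays frozen while v2 adds the year as its own property.

```python
# Hypothetical in-memory data standing in for a real album store.
ALBUMS = {"1": {"title": "Thriller", "year": 1982}}


def get_album_v1(album_id):
    """v1 contract: the year is baked into the title string."""
    album = ALBUMS[album_id]
    return {"id": album_id, "title": f"{album['title']} ({album['year']})"}


def get_album_v2(album_id):
    """v2 contract: the year is a separate property."""
    album = ALBUMS[album_id]
    return {"id": album_id, "title": album["title"], "year": album["year"]}


# Both versions stay registered, so existing consumers keep working.
ROUTES = {
    ("v1", "albums"): get_album_v1,
    ("v2", "albums"): get_album_v2,
}


def handle(path):
    """Dispatch a path such as /v2/albums/1 to the matching handler."""
    version, resource, resource_id = path.strip("/").split("/")
    return ROUTES[(version, resource)](resource_id)
```

A consumer still parsing the string can keep calling `handle("/v1/albums/1")`, while anyone who wants the cleaner shape moves to `handle("/v2/albums/1")` whenever they are ready.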
Speaker 1:If you've ever worked with Photoshop or ever tried to save a draft while working on a high school paper, you know this loop of final, final-final, for-real-final, and it just keeps going. Whereas if you just add that version, you can have as many versions as you want and they're completely separate and still logical, because it's easier to understand /v1/albums and /v2/albums than /v3/new-albums-final or /new-albums-final-i-promise, right? The final part of our /v1/songs/ID, or albums, or what have you, is that ID at the end. I do want to make a note here, before moving on to the ID, about the resource that you're using, so songs, albums, users, analytics, these things. It's kind of an anti-pattern to not name them pluralized, and I might get emails about this, and that's fine.
Speaker 1:But in my opinion, when you do /v1/songs/ID, I'm saying we're using the first version, go get all the songs and find the one with this ID. Just like, if I want all of the songs, I might do /v1/songs, and then again I'm saying get me all the songs, right? That to me is important, because the idea of getting a song and having it return more than one thing at any point doesn't make sense, and neither does filtering through one thing to get one thing. So I wanted to make sure I talked about that. For me it's an anti-pattern, but it's not a hard "you failed because you didn't do this." I just wanted to explain why I say songs or albums or analytics or users. And then the last part, as we talked about, is the ID, and that's just a direct signifier of the resource that you want. If you include an ID, that ID should be unique, period. No other thing should share that ID, because every single unit of something in your system, no matter what, should have one single unique ID attached to identify it. We talked about this a little bit when we talked about databases and how we store things, and I will talk about it in even more detail in the next episode when we talk about DB modeling and how that works.
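The plural-collection idea and the unique ID rule can be sketched together. Everything here is an assumption for illustration: the in-memory `songs` dictionary, the function names, and the choice of random UUIDs as the ID scheme, which the episode does not specify.

```python
import uuid

# Hypothetical in-memory store: one entry per song, keyed by its unique ID.
songs = {}


def create_song(title):
    """Assign every new song exactly one unique ID (a random UUID here)."""
    song_id = str(uuid.uuid4())
    songs[song_id] = {"id": song_id, "title": title}
    return songs[song_id]


def list_songs():
    """GET /v1/songs: the plural collection returns all the songs."""
    return list(songs.values())


def get_song(song_id):
    """GET /v1/songs/ID: look in the collection for the one with this ID."""
    return songs.get(song_id)
```

The collection and the single item share the same plural noun, so `/v1/songs` and `/v1/songs/ID` read as "all the songs" and "the one song with this ID."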
Speaker 1:Now, when it comes to API design patterns, we said before that there are more than a few and there is no one-size-fits-all solution. Those patterns go by multiple different names, but I'm going to try to use the most common name for each one. If nothing else, they should be Googleable, and you can tie things together if you come across one that I haven't talked about here. They are simply request-response, sync and async, short polling, long polling, WebSockets, server-sent events and, of course, the pub-sub pattern. I want to talk about each one briefly: the ways that you would use them, and also the ways where they're not so great. Starting with request-response, it's currently the most common pattern and what I used for all of our examples above.
Speaker 1:It's most commonly used in web development, in particular for sending data between a browser and a server. It is only useful when the client needs something and the server has it. If something changes on the server and the client needs to know about it, there's no way for the server to tell the client that something has changed when using this pattern. It is inherently synchronous, meaning it expects the server to respond promptly with some data and it's not going to wait around: the client calls the server, the server sends back data. If the data changes frequently, or if the request initiates some sort of change that might take a long time (for example, it might set off calculations across a bunch of different services that could take hours or days), then we have more of an asynchronous setup, where the client sends the request, the server responds with an initial success saying, hey, yeah, we got your message, great, and then in the background it actually performs the tasks that take longer. The problem then becomes: when the server's done, when these changes are applied, and the server needs to tell the client something has changed, it doesn't inherently have a way to tell the client, hey, you need to call me back because things have changed.
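The asynchronous setup described above might look roughly like this. It is a sketch under assumptions: `submit`, `job_status` and `slow_calculation` are invented names, a thread pool stands in for the background services, and the 202 status mirrors the HTTP "Accepted" convention for work that has been received but not finished.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor

# Hypothetical background workers and a table of in-flight jobs.
executor = ThreadPoolExecutor(max_workers=2)
jobs = {}


def slow_calculation(n):
    """Stand-in for work that could take hours or days in a real system."""
    return sum(range(n))


def submit(n):
    """Acknowledge the request right away and do the work in the background."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = executor.submit(slow_calculation, n)
    return {"status": 202, "job_id": job_id}


def job_status(job_id):
    """The client has to ask; the server has no way to call it back."""
    future = jobs[job_id]
    if not future.done():
        return {"status": "pending"}
    return {"status": "done", "result": future.result()}
```

Note that `job_status` only answers when asked, which is exactly the gap the polling patterns try to fill.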
Speaker 1:And so the first solution to that problem we would look at is polling, and polling comes in two distinct flavors: there's short polling and there's long polling. Short polling will be familiar to anyone who's ever dealt with a small child. In short polling, the client keeps asking for new data and the server responds immediately, either with an empty response, meaning nope, nothing's changed here, or with the changes that happened. You can imagine short polling being useful for something like uploading a video to your YouTube channel. You know that percentage bar that says, hey, 25% uploaded, 26%, that sort of thing? That kind of progress indicator can be driven by short polling.
Speaker 1:And the main issue with short polling is probably already obvious. It means we send a ton of requests to our server over and over and over. Each time we are stuck waiting for that server to respond. So if that server gets overloaded and you're doing short polling, you're just overloading it more and you're still waiting for the response. So it has to take care of all those other responses before it even gets to you. So now you're just deadlocked.
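A short-polling client can be sketched as a simple loop. The `UploadServer` class below is a fake that replays a fixed list of progress values, and the names, interval and attempt limit are assumptions for illustration.

```python
import time


class UploadServer:
    """Fake server that reports upload progress; it answers immediately."""

    def __init__(self, steps):
        self._steps = iter(steps)
        self._last = 0

    def get_progress(self):
        # Advance to the next reported percentage, or repeat the last one.
        self._last = next(self._steps, self._last)
        return self._last


def short_poll(server, interval=0.01, max_attempts=100):
    """Keep asking "are we there yet?" until the upload reports 100%."""
    seen = []
    for _ in range(max_attempts):
        percent = server.get_progress()  # one full request per check
        seen.append(percent)
        if percent >= 100:
            break
        time.sleep(interval)
    return seen
```

Every pass through the loop is a separate request, which is why a short interval multiplied by many clients can pile load onto a server that is already struggling.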
Speaker 1:On the other hand, you have long polling. The client sends a request and the server waits until there's new data before sending a response. So compared to short polling, you have far fewer requests coming in, but it also introduces a bit of latency and you need some extra logic on the server to actually hold those connections. So short polling is: hey, is there an issue? No. Okay. Hey, is there an issue? No. Okay. Whereas long polling says, hey, is there an issue? And then the server waits and holds that connection, and as soon as there's an issue, it says, yep, there you go, there's the problem. And it might not be an issue; it might just be a user-created event or something like that. But that's the scenario.
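For contrast, here is a sketch of the server side of long polling, where the request is held open until something happens or a timeout expires. `LongPollServer` and its methods are hypothetical, and a blocking `queue.Queue` stands in for the held HTTP connection.

```python
import queue
import threading


class LongPollServer:
    """Fake server that holds each request until an event arrives."""

    def __init__(self):
        self._events = queue.Queue()

    def publish(self, event):
        """Something changed on the server side."""
        self._events.put(event)

    def wait_for_event(self, timeout):
        """Block (hold the connection) until there is news or we time out."""
        try:
            return self._events.get(timeout=timeout)
        except queue.Empty:
            return None  # nothing happened; the client would simply re-request
```

A client calls `wait_for_event` once and sits on that single request, instead of asking over and over. The extra logic on the server is the part that keeps the request parked until `publish` is called.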
Speaker 1:The benefit you get from long polling is that the response can be near real time and your client is already primed and ready for the response, right? It initiated the call, so of course it's ready for that response to come back. If, for some reason, the latency is too much or you just need a more real-time solution, you can always rely on WebSockets. And don't get it twisted, WebSockets are 100% real-time transactions. The entire concept is that a client calls a server and asks to upgrade the connection. The server agrees, and then they have a direct, persistent connection over which they can send each other data. The client can send data to the server, and the server can send data right back to the client in 100% real time, directly to each other.
Speaker 1:One scenario where I personally used WebSockets was a mobile game where we needed to know the position of other players, maybe the items that other players had used, things like that, and we needed to know it in real time. This was a multiplayer game, and we needed to reflect and show these changes on the client for all of the clients that were in the same game. And, of course, the one drawback of this is that the client might not be ready to accept all the data you're sending to it, because that connection is always on. You never know when the server is going to send a ton of data your way, and it doesn't care whether or not your client is ready for it.
Speaker 1:So imagine a lot of events just occurred in game. Maybe someone got a bunch of kills, they set up an AoE thing or something like that, and on your client side, you also set up a mass AoE and you just killed a bunch of things. All these big events that just happened are being sent to you from the server and are also happening on your client, and your client has to perform all of these things. That surge of data means your client could honestly max out its resources. You're going to get stuttering, you're going to get frame drops, everything like that, because all of these things are happening on your phone or your computer. Whatever the client is, it only has so many resources with which to process these things.
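The two-way nature of that connection can be illustrated without any networking. This sketch is not the WebSocket protocol itself: a pair of `asyncio.Queue` objects stands in for the open connection, and the game messages are invented for the example.

```python
import asyncio


async def server(inbox, outbox):
    """Push an update without being asked, then react to client messages."""
    await outbox.put("player2 moved to (3, 4)")
    while True:
        message = await inbox.get()
        if message == "close":
            break
        await outbox.put(f"ack: {message}")


async def client(inbox, outbox):
    """Receive the server's push, send our own event, and read the reply."""
    received = [await inbox.get()]       # server-initiated message
    await outbox.put("player1 used potion")
    received.append(await inbox.get())   # server's response to our message
    await outbox.put("close")
    return received


async def play():
    # Two queues model one full-duplex connection: one direction each.
    to_server, to_client = asyncio.Queue(), asyncio.Queue()
    server_task = asyncio.create_task(server(to_server, to_client))
    received = await client(to_client, to_server)
    await server_task
    return received
```

Running `asyncio.run(play())` shows both directions in use: the first message arrives without the client requesting it, which is what request-response and polling cannot do.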
Speaker 1:And I can almost hear you guys now crying out, but Benny, what if the client is just a random viewing machine? It doesn't need to interact with the user at all. I just want to display some data, and when the data changes, I just want to see that data change. The simple answer is, if the user doesn't have any input and it's just a vessel to display something, then in that case you might be better off using something like server-sent events.
Speaker 1:Server-sent events are 100% real-time, just like WebSockets, but instead of client and server sending data to each other, server-sent events are just the server updating the client. It's unidirectional: data always comes from the server to the client. The client never needs to tell the server anything, but that connection stays alive. These sorts of patterns are useful for things like news sites or stock prices, where new data comes in and updates the display without the user needing to interact at all. And, unlike WebSockets, server-sent events are handled a lot like polling, where the request is just long-lived. It's an ordinary HTTP request that stays alive, and the client can subscribe to what are called stream events, which should not be confused with PubSub events.
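On the wire, a server-sent event stream is plain text: optional `event:` lines, one or more `data:` lines, and a blank line that ends each event. The sketch below formats stock price updates that way; the function names, the `price` event name and the sample payload are assumptions for illustration.

```python
import json
from typing import Optional


def format_sse(data: dict, event: Optional[str] = None) -> str:
    """Format one message in the text/event-stream wire format."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    lines.append(f"data: {json.dumps(data)}")
    # A blank line marks the end of a single event.
    return "\n".join(lines) + "\n\n"


def price_stream(prices):
    """Yield one event per price update; data only flows server to client."""
    for symbol, price in prices:
        yield format_sse({"symbol": symbol, "price": price}, event="price")
```

A web framework would write each yielded chunk to the long-lived response, and the client would simply render whatever arrives.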
Speaker 1:Publisher and subscriber, or PubSub, events go through message brokers to help deliver notifications for specific events. They're similar to SSE above, but the client and server can both be a publisher and a subscriber. So where before the server always sent messages to the client, with PubSub the client can send messages to the server and the server can send messages to the client, but the server can also send messages to other services, and the client can trigger events that trigger different services as well, instead of it being a unidirectional relationship. If we go back to our mobile game example, where we used WebSockets to help our data flow through the system, the PubSub setup is better when no direct connection is needed and something just needs to consume an event.
Speaker 1:An example would be that maybe you added a new map and the game wants to let the users know, hey, this new map is out and you should check it out. The server, in this case, would be a publisher. It would publish the new-map-created event, and then the client, in this case a subscriber, would consume that event and trigger something like a push notification on your device. But I also want to lay out the fact that another service could also be a subscriber. You could have something like an email service that emails everyone that the new map is out, and with that one event you're triggering a push notification on a phone, an email or whatever you want to use to notify someone about something. And although these patterns can seem arbitrary to how your system actually works, in reality they directly influence the system you are building. If you know how your data flows through your system, then you know what your system should look like.
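The new-map example can be sketched with a tiny in-memory broker. The `Broker` class, the `map.created` topic name and the two handlers are hypothetical; a real system would use a dedicated message broker rather than a dictionary of callbacks.

```python
from collections import defaultdict


class Broker:
    """Minimal in-memory message broker: topics map to subscriber callbacks."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, payload):
        # Every subscriber of the topic gets the same event.
        for handler in self._subscribers[topic]:
            handler(payload)


def announce_new_map(name):
    """Publish one event and let two independent subscribers react to it."""
    sent = []
    broker = Broker()
    broker.subscribe("map.created", lambda e: sent.append(f"push: new map {e['name']}"))
    broker.subscribe("map.created", lambda e: sent.append(f"email: new map {e['name']}"))
    broker.publish("map.created", {"name": name})
    return sent
```

The publisher never needs to know who is listening, so adding a third notification channel later only means adding another `subscribe` call.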
Speaker 1:Imagine instituting that notification service without using PubSub. Yeah, you could do it with something like long polling, but it's less straightforward. It takes a lot more effort and resources on your client's machine. Imagine your phone constantly checking to see whether or not Fortnite has a new map or something like that. That seems insane, right? Instead, fleshing out your API design, understanding how the data flow happens, means all you have to do is piece together the tools you need and you can then speak intelligently towards the data in your system.
Speaker 1:I know I mentioned before that this episode would be covering API and DB design, but with the amount of information I wanted to squeeze in, it was just not possible to fit the database modeling in this episode. Next episode I will talk about DB modeling, how it works and how it goes hand in hand with your API design. I do want to make clear that API design and DB modeling go hand in hand, and either can be done before the other, because in reality they very much influence each other. Another quick call-out: I fixed Episode 1's audio. The music should no longer be heard over Episode 1, so you are welcome to go back and listen to that and hopefully be able to hear it better this time. Thank you to everyone who gave me that feedback. I know it's been a long time coming and I'm glad I was able to actually improve that. So, yeah, if you had trouble hearing me on episode one and you are still interested in that information, please go back and check that out.
Speaker 1:I do also want to thank everyone who reached out by email. Leah Whitelaw, you had some great feedback and excellent questions, and you actually helped influence some of the examples in this episode, mostly to do with Spotify, so thank you very much for that. I also got some amazing fan mail on the website and I really do appreciate it. I do want to call out that if you leave fan mail on the website with specific questions, it doesn't give me your name or your email or anything, so I don't really have a way to respond to you. So if you just want to leave some love, the website fan mail is great. If you do have questions, please reach out to me by email, or reach out on the Discord as well. Shout out Roman Toraz, gamesply, beerx for actually holding down the Discord. You guys are awesome.
Speaker 1:Also, big thanks to Tom for his thoughts on how to better design YouTube. We had a really good discussion on that, so thank you so much for that, Tom. And finally, the biggest thanks to everyone on our Patreon. Sincerely, such a big thank you. You make it a lot easier for me to work on this stuff. Leah Whitelaw, again, thank you.
Speaker 1:Lucas Anderson, Kalyan Desika, Sergey Bugarra, Anil Dugar, Jake Mooney, Charles Cazals, Eduardo Muth, Martinez, Waho Sores, Adrian Couric, Gabriel, just Gabriel, Jesper Klotzel, Nathaniel Sutton, Florian Breider and Klaviy Ness, thank you for being a part of the Patreon. You guys mean so much to me, and if I am pronouncing your name wrong, even in the slightest, please email me or send me a Patreon message or anything with a recording and I promise I will make it better. I'm doing my best, I promise. I did also post a poll over there, and it looks like the frontrunner for the very first interview design deep dive is going to be how to design a web crawler. After that, we'll probably work on the parking lot reservation interview. If you want to be a part of influencing what I cover in the next episodes, please go to our Patreon and check that out. I will also be dropping the first authentication video, just for Patreon, in the next month or so. I'm working on getting that fleshed out and researching that, and after that I'm actually going to talk about graph databases.
Speaker 1:Thank you, Leah, for that feedback. And if you, the listener, are enjoying these episodes, please share them with your colleagues or your friends. It's a very easy way to help get the word out. It helps more people listen, and it helps me get more attention on the podcast and maybe offset some of the costs of hosting and things like that. If you would like to suggest new topics, or even just have any questions, or want to be a guest on the podcast, feel free to drop me an email at learnsystemdesignpod@gmail.com. Remember to include your name if you'd like a shout out. If you'd like to support this podcast and help me pay my bills, jump over to Patreon and consider becoming a member. All podcasts are inspired by Crystal Rose. All music is written and performed by the wonderful Aimless Orbiter. You can check out more of his music at soundcloud.com/aimlessorbitermusic. And with all that being said, this has been and I'm scaling down. Bye.