Learn System Design

Mastering System Design Interview: Essential System Design Interview Principles and Techniques

Ben Kitchell Season 2 Episode 1


Can a simple delay really cost a company millions? We kick off season two of the Learn System Design podcast by exploring this and more. I'm Benny Kitchell, your host, and after a refreshing hiatus, I'm excited to bring you a fresh take on system design interviews and real-world applications. We start with the fundamentals of functional requirements using a relatable example: a music streaming app like Spotify. Discover how to align core functionalities such as song playback, playlist creation, and music recommendations with stakeholder expectations, setting the stage for effective system design.

This episode also delves into the intricacies of caching systems, the critical role of TTL (Time To Live), and the balancing act required by the CAP theorem. We address the importance of understanding both functional and non-functional requirements, emphasizing stakeholder input to ensure a robust design. Key concepts like latency, durability, and partition tolerance are unpacked, highlighting their impact on user experience and system stability. Tune in to gain valuable insights that will not only prepare you for system design interviews but also enhance your technical prowess in the field. Thank you for joining us on this journey; your support means the world!


https://www.cloudflare.com/learning/privacy/what-are-fair-information-practices-fipps/

https://mashable.com/article/myspace-data-loss


Dedicated to the memory of Crystal Rose.
Email me at LearnSystemDesignPod@gmail.com
Join the free Discord
Consider supporting us on Patreon
Special thanks to Aimless Orbiter for the wonderful music.
Please consider giving us a rating on iTunes or wherever you listen to new episodes.


Speaker 1:

Hello everyone, welcome to episode number eight and the first episode of season two of the Learn System Design podcast with your host, me, Benny Kitchell. I've been handling a lot of mental health, a lot of physical health, but I'm back on the grind and ready to pump out a bunch more episodes for you. For everyone who reached out, either through email, on the Discord or on the Patreon, I greatly appreciate you. If you supported me on Patreon, you should be receiving this podcast early. For everyone else, you'll be getting it a week later. For anyone who previously subscribed to my Patreon during the break, you'll also be getting two episodes for every dollar that you may have donated. So even if you only donated a single dollar months ago, I'll be backfilling and making it up to you and you'll still be getting episodes early, even if you stopped donating. Yeah, so hopefully that makes sense and yes, you will still get a call out at the end of the episode. You guys rock and I appreciate everyone so much for everything.

Speaker 1:

For this season, I want to take a new approach on not just how systems are built, but also personally tackle system design technical questions, the ones you get in interviews, and why I tackle them in a certain way: the steps I take, my thought process, the questions I like to ask, and sort of find what works for everyone. So hopefully this is beneficial for everyone. These first few episodes will sort of be a primer for the system design interview. Instead of diving straight into a technical, I want to talk about some of the questions I ask, some of the thought processes I have whenever working on an interview and what I expect when I conduct these sorts of interviews with people at different levels. I do want to be clear, though: it's not just for killing an interview, right? These concepts will help lay the groundwork for how to scale a system programmatically, even after you've landed the role. In the next couple of episodes for the start of the season, we're just going to be going through each step in the process, the questions you should ask, and the main categories these belong in. In this episode in particular, we will start with functional and non-functional requirements. Then in the following episodes we'll touch on capacity estimates, entity management, API design, data flow and then the big thing that everyone thinks about when they think about system design: those blueprint diagrams that you usually see. For that episode there may be a case where I post some pictures that you can look at and sort of follow along with, or maybe I'll put up a YouTube video showing the whole process and then you can listen back on the audio if it helps. Yeah, I want to make sure to help everyone in the best way. So if you have any feedback about how you'd like to see that, definitely let me know.

Speaker 1:

So when building a system, any system, whether for an interview or for a new product, the first thing you want to tackle is exactly what the system needs to do. I know that seems obvious, but when you're nervous or stressed it's easy to blow past this and realize halfway through that you don't even know what you're building. This process of understanding what the system should do is referred to as gathering the functional requirements. Functional requirements are usually a bare list of what the system needs to do and how it behaves in certain situations. In an interview, I like to limit myself to around three or four functional requirements that you'll usually go back to later throughout the interview, although in a more real-world product you could have even more than this. But even then I like to try and keep it to the minimum viable product, or MVP, and keep those at really no more than five.

Speaker 1:

Once the MVP is complete, you can add more features to the product based on asks from customers or internal polling or, you know, whatever you think might make the product itself better. For this instance, let's just keep it to three items on this list. These items should be the most important features, and they always start with a similar phrase. If you're working on a product that will be used by a user, that phrase will be something like "the user should be able to". If you're building something that is more on the back end, that a client would interact with, it would start with something like "the client should be able to".

Speaker 1:

If, for instance, we are building a music streaming app, something like Spotify, and we had to create that list of functional requirements, what are the big three things that come to mind when building that system? In reality, there are no wrong answers, but there are answers that the interviewer or the stakeholder, the person you're building the product for, has in mind, right? So you might say the user should be able to play a song, the user should be able to create a playlist, the user should be able to get recommendations on new music. And so, with this, we've effectively just defined the core parts of what Spotify does, without getting bogged down by all the extra things it does. From here, we can use those three functional requirements as the core of the product, the core of the system we are building, and focus on implementing only those three things. So that's all well and great, but why are we doing this? Yes, I mentioned that it helps us have specific features to focus on without getting bogged down. But what does it actually do? It actually allows us to understand what the stakeholder wants from the product.

Speaker 1:

Because let's imagine a world where someone comes to you and they just say, I want a pizza. You might think your first steps are, oh, prep the dough, put on the sauce, start sprinkling the cheese. And that's the mindset I want to sort of course-correct within you. Instead, I want to get you to a point that, when someone says I want a pizza, your first reaction should be: what kind of sauce, what kind of cheese, what kind of toppings, etc. Because when you're building something, anything, for someone, it's imperative to understand exactly what they want and how they want it. The real prep is not laying out the dough and the sauce, but instead it's ensuring that you know everything that they want on the pizza and that they get the greatest satisfaction from the food. And that exact same thing applies to building a new system. Knowing exactly what a stakeholder expects and what they want will save you headaches and problems in the future, because you don't want to be halfway done and realize that, oh, this system I'm building isn't going to actually work and do the things that I want it to do.

Speaker 1:

In system design, you want to ask what are called targeted questions. This lets the stakeholder, whether it's an interviewer or a boss, know that you're actually giving thought to what the system is and that you're building what they want and what they have in mind. In the event where maybe you have never used Spotify, this also allows you to get insight into what it is and what it should be doing. Some of the most popular targeted questions are "does the system need to do X?" and "what would happen in the event of X?", where X in these instances is some sort of behavior that the system performs when acted upon. Taking it right back to the Spotify example, one of the questions might be, does the system need to support playlists? This answer would give you a direct functional requirement that you can add to the list. Another might be, what would happen in the event that the user listens to a song? This would create a functional requirement for the recommendation engine that we talked about before. The user listens to a song. Yes, all well and great, but what happens on the backend? It goes to this algorithm that is a recommendation engine that is already built, but how would you serve that back to the user? Another quick example from the other side.

Speaker 1:

If you're not working on something that is user-facing, but something more that lives in the backend, let's say someone asks you to build a cache. Maybe you don't know what a cache is. You'd ask first, what does the system do at a high level? And you would understand it's just a simple key-value store that helps return things quickly, right? Instead of fetching from the database, you fetch from the cache, and this answer has already given you two functional requirements, right? It returns things and you're able to insert things. So the client should be able to insert items and the client should be able to read items.

Speaker 1:

A good tip here is just to think, how would this thing break, right? When does a cache become useless? Well, a cache becomes useless whenever it has old, outdated items, because it doesn't help if it returns things quickly if they're not the right things. So how do we fix that, right? You might ask, well, this breaks if we return outdated information, how would we handle this? Or it might be okay to ask yourself that question. A simple answer is, well, you have a TTL, a time to live, right, some sort of expiration on the items. And then there's your third functional requirement. A cache takes items in, it reads items and returns them, and it has expiration on items and removes them when they're done.
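To make those three cache requirements concrete, here is a minimal sketch in Python. The class name `TTLCache`, its method names, and the injectable `clock` parameter are all assumptions made for illustration; a production cache would also need things like size limits and thread safety.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """Tiny in-memory key-value cache where every item expires after a TTL."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._items: Dict[str, Tuple[Any, float]] = {}

    def insert(self, key: str, value: Any) -> None:
        # Requirement 1: the client should be able to insert items.
        self._items[key] = (value, self._clock() + self._ttl)

    def read(self, key: str) -> Optional[Any]:
        # Requirement 2: the client should be able to read items.
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Requirement 3: expired items are removed instead of served stale.
            del self._items[key]
            return None
        return value
```

Expiry here is lazy: an item is only removed when someone tries to read it after its TTL has passed. That keeps the sketch short, but a real cache might also sweep expired items in the background.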

Speaker 1:

After you finish choosing your functional requirements, give a quick check to the stakeholder and make sure these make sense, right? You can just ask them, hey, these three things, are they in the scope of what you're thinking or what we're aiming for? And if so, they'll give you a thumbs up, and if not, they might direct you towards some other requirements. Like, oh, I was expecting the Spotify player to be able to follow other users, so you'd maybe remove the recommendation engine and put that in, and then you're both on the same page about the system that's being built. From there you can start tackling the non-functional requirements.

Speaker 1:

If functional requirements are described as what the system should do, non-functional requirements describe how the system should be. While this distinction might not be clear at first, just give me a few minutes. I'll help you wrap your mind around it and even then, if it's still hard to understand, send me an email, send me a message on Discord, Twitter, wherever you can find me. Please just reach out. I want to make sure that this stuff really sinks in and makes sense.

Speaker 1:

Functional requirements, as talked about before, are how the system behaves when acted upon. They use example phrases such as "the user should be able to" or "the client should be able to", whereas non-functional requirements are determined by how the system operates, regardless of interaction. They are usually prefaced with the phrase "the system should be". Just like functional requirements, I want you to choose three to five relevant attributes to focus on with the system. If this is like a 45-minute interview, just stick with three. You're not going to get brownie points for doing four or five. You can always speak more to other things later in the interview, at the end, usually in the deep dive. However, functional requirements, as we talked about before, usually come from information gathering from the stakeholder. It's something that they know, that you want to get out of them using relevant questions. Non-functional requirements usually come from a list that I will be sharing with you now, that you can choose from based on the data that you're using.

Speaker 1:

Now, if you've listened to my older episodes, then you've heard me spread this gospel of knowing your data and how it's imperative to building a good system. Luckily for you, in this instance I am going to help you. I have these memorized and I want you to memorize them as well. You don't need to remember all the key details, but just remember the high-level idea of them and then from there you can practice mapping this to different sorts of data. So the first one, and arguably the most important, should sound very familiar to you. It is one of the biggest topics covered on our first episode, and that is the CAP theorem. I have had a few complaints that my first episode is a bit hard to hear, but this content is very important. I covered a lot of very important topics. So if anyone is having trouble hearing it or they just can't get it to sink in, shoot me a message and maybe I'll make some time to re-record that first episode and put it out for everyone.

Speaker 1:

But for now, as a quick refresher of what the CAP theorem is, it simply states that every distributed system can only satisfy two out of the three following concepts: consistency, availability and partition tolerance. But from the side of the CAP theorem, when it comes to systems and not databases, partition tolerance in this environment, in the current stage we are in in system design, is inescapable. It is a must at all times. So, luckily for you, when building a system, you can easily say, well, we're building something with microservices that scales horizontally, so we have to have partition tolerance. From here, you can look at your data and say, does it need to be consistent or does it need to be more available? To understand why partition tolerance is a given, no matter what, think again about Spotify, or how we talked about it in episode one with Instagram or something like that. If one piece of your app fails, your entire app should not fail. If your DMs are down, being able to post a picture should still work. If your playlist function is down, I should still be able to search and listen to music. And that's what partition tolerance is. It says these things are tolerant of other partitions failing and again, in the age we are in of designing systems, it is a must. So having to choose between consistency and availability is really the core of the CAP theorem when building a system.
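The consistency-versus-availability choice can be sketched with a toy model. Everything below is hypothetical (the `TwoReplicaStore` class, the `Unavailable` exception, the `"CP"`/`"AP"` mode strings), and the "partition" here is the textbook case of two replicas that temporarily cannot reach each other over the network. Real systems make this trade-off with quorums and replication protocols, not a boolean flag.

```python
class Unavailable(Exception):
    """Raised when the store refuses a request to protect consistency."""


class TwoReplicaStore:
    """Toy store with two replicas and a simulated network partition."""

    def __init__(self, mode: str):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replicas = [{}, {}]
        self.partitioned = False  # flip to True to simulate a network split

    def write(self, replica: int, key: str, value: str) -> None:
        if not self.partitioned:
            # Healthy network: update both copies.
            for data in self.replicas:
                data[key] = value
        elif self.mode == "CP":
            # Consistency first: refuse the write rather than let copies diverge.
            raise Unavailable("cannot reach the other replica")
        else:
            # Availability first: accept locally; the other copy is now stale.
            self.replicas[replica][key] = value

    def read(self, replica: int, key: str):
        return self.replicas[replica].get(key)
```

With `mode="AP"`, a write during the partition succeeds but the other replica keeps serving the old value. With `mode="CP"`, the same write raises `Unavailable` and both replicas keep agreeing. Which behavior is acceptable depends on the data, which is exactly the question to ask of your own system.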

Speaker 1:

The second category can actually be found in episode one, episode four, but really all over this podcast as a concept, and that is scalability. More specifically, you should define if there is anything special about your system that needs to be extra scalable at certain times or days, because every system should be able to scale, but this is more about handling burst scaling, special scaling, if you will. For instance, are you going to get a surge of users during the holidays? How do you handle a burst of traffic at a certain part of the day? By this point, you should know how to handle scalability. Your system should be able to take that into consideration already, right? But, as mentioned, this specific category is for deciding how to handle bursts of traffic at specific times or days. For instance, if you are building a retail system, Black Friday is a huge day. You get a very large burst of traffic, really no matter what kind of retailer you are. So being able to handle that and not go down on Black Friday is very important. This sort of category can be optional for systems that won't have this burst of traffic. So, in the event where you don't need this burst-of-traffic scalability, I would implore you to just think about whether the system is read-heavy or write-heavy and plan accordingly for that.

Speaker 1:

Third, we'll talk about environment constraints, and this is something that doesn't always come up, but when it does come up, it's vital. It can make or break your system at its core, and the main concept of this is to identify if your system will be impaired by its environment in any way. For instance, think about Google Maps. Imagine if you were driving through a dead zone on a road trip, lost service, and then your map stopped working. This would be an awful user experience, right? And so that's what you sort of want to take into consideration. Another thing you might ask yourself is about the hardware that's going to be running this. What if it has limited battery? What if it has limited memory? Is this going to be running on older hardware? Or maybe it'll be run in a country with limited bandwidth for internet? Be sure to call these things out as you think about them, because they can be the difference between an okay system and, quite literally, a game changer.

Speaker 1:

Next, we're going to talk about security. This is an easy one that people sort of brush over and don't really think about, but it's important to know how your data is handled and how it's being protected. You should always consider the data you're handling to be important and secure, no matter what it is, but sometimes there are actually laws in place about how the data is stored and who can see it. Personally identifiable information is one that you'll see throughout the US here. If you go to Europe, they have specific laws for who can access data, both internally and externally, and it's quite literally important to remember these things, because in the event where your data is not encrypted, or someone has access, even within your company, to certain data, if you get caught, you can actually be fined. You can be punished by the government and local laws. So understanding security around your data is also very important. To learn more about privacy, I'm actually going to include a link down in the show notes for the fair information practices article. Cloudflare wrote up a great article about it and they talk more about these fair information practices concepts that they employ on their end, and I think it's a good read and something good to take into consideration when security is an important part of your system. So, yeah, obviously there's always going to be something secure in your system, but if you're handling something like data for someone's health, right, that's extra secure, extra important, and that's where this category really comes in.

Speaker 1:

At number five here we're actually going to talk about one that, again, we've covered all over this podcast, and that's latency. The reason why we're going to talk about it again is because it's extremely important to the user experience of a system. Latency, which basically is just the lag between an action taken and the effect being applied, can cause not only a bad experience but, as we covered in episode one, it can literally cost your company millions of dollars. To refresh on this, in episode one we talked about how Amazon found that for every 100 milliseconds of latency in their system, they lost around 1% of their sales. A billion-dollar company. So it goes without saying, having a low-latency system should always be something you take into consideration. In some cases, it's even more important. If something needs to happen in real time, then there are certain tools that you should implement. Kafka is one that's great for real-time interaction, that sort of pub/sub system, as we talked about before. It can help with decreasing that latency between something happening and a user knowing that something happened.
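As a rough back-of-the-envelope check on that figure, here is a small Python sketch. It assumes the loss scales linearly with added latency, which is a simplification made for illustration and not something the Amazon figure guarantees; the function name and the one-billion-dollar sales number are hypothetical.

```python
def estimated_sales_loss(annual_sales: float, added_latency_ms: float,
                         loss_percent_per_100ms: float = 1.0) -> float:
    """Linear extrapolation of the '1% of sales per 100 ms' rule of thumb."""
    # Divide by 100 to turn the percentage into a fraction, and by 100 again
    # because the rule is stated per 100 ms of added latency.
    return annual_sales * loss_percent_per_100ms * added_latency_ms / (100 * 100)


# Hypothetical company with $1 billion in annual sales.
loss_100ms = estimated_sales_loss(1_000_000_000, 100)
loss_250ms = estimated_sales_loss(1_000_000_000, 250)
```

Under these assumptions, 100 ms of extra latency works out to about $10 million a year and 250 ms to about $25 million, which is where the "millions of dollars" comes from.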

Speaker 1:

Next, we're going to talk about durability. This is one of those categories that doesn't always apply, but it's important to remember and consider. Durability is as simple as, how important is it that your data isn't lost, right? We talked about it a bit in episode two in relation to databases, but the same core concepts apply to building a system as well. If you're running something like Twitter and you lose your data, it's a headache for the customers, but it's, overall, not the end of the world. It's not a humongous concern, right? In fact, the once-juggernaut social media website Myspace, which some of you might be too young to remember, was a giant social media platform. Back in 2019, it lost over 12 years of people's photos and music in a bad server migration. If this happened at something like Bank of America or Capital One, it would quite literally be the end of their business and possibly have economic implications, yet Myspace is still running to this day. They have six million daily users from around the globe. And again, if a bank lost 12 years of data, it would crater a financial sector, and so that's why things like durability are good to take into consideration.

Speaker 1:

Right, how important is it for you to have replication in your system? How important is it that you have this hosted on multiple servers in different regions on the cloud, right? These sorts of things that have continuous backup. Last but not least, I want to circle back to partition tolerance. While we talked about how it is a necessity when building a scalable system, we should make sure to think about how to actually implement the partition tolerance. As we discovered in episode two, redundancy, failover and recovery mechanisms are the backbone of partition tolerance. Having read replicas and standby servers that are scaled horizontally can help protect you from taking your system down as a whole, but really, in the end, it depends on the system and the data. I want you to understand that, in the last seven episodes, I wasn't just talking about these theoretical concepts. I want you to understand how we apply them, where we apply them, and you'll sort of see how these words like scalability, partition tolerance and latency keep coming up because of how important they are.
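A minimal sketch of the failover idea, assuming each replica is represented by a plain Python callable that either returns a value or raises `ConnectionError` when that server is down. The function name `read_with_failover` and that calling convention are hypothetical, chosen only to keep the example self-contained.

```python
from typing import Callable, Optional, Sequence


def read_with_failover(key: str,
                       replicas: Sequence[Callable[[str], str]]) -> str:
    """Try each replica in order and return the first successful read."""
    last_error: Optional[Exception] = None
    for fetch in replicas:
        try:
            return fetch(key)
        except ConnectionError as error:
            # This replica is down; remember why and fall through to the next.
            last_error = error
    raise ConnectionError("all replicas failed") from last_error
```

The caller only sees a failure when every replica is unreachable, which is the point of keeping standby servers around. A real implementation would typically add timeouts, health checks and some way to avoid hammering a replica that is known to be down.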

Speaker 1:

In the end, you won't ever take all eight of these things into consideration in an interview. It would take way too much time, right? You should take a specific three that make sense with the system you're building and the data being handled. If you're working on a system outside of an interview, then definitely take the time to address all eight of these, take them into heavy consideration and make sure that you're planning for all eight. If all eight of them apply, have an entire plan for why you are doing something, and if you aren't doing it, have a good reason why you aren't. And while they may seem like just a bunch of jargon, both functional and non-functional requirements will be the foundation you keep coming back to when building a system. You can consider them the source of truth in an interview or when building a product. They answer those high-level questions from stakeholders like, why are you doing this, why are you using this technology? And you can simply point to, hey, bullet point two says that the user should be able to listen to a song and get a recommendation from that. That's why I'm using this Kafka instance as a pub/sub, to send the event that this song was played so our recommendation engine can consume it and create recommendations based on this song.
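To picture that "song played" flow, here is a small in-memory publish/subscribe sketch in Python. It is a stand-in for a broker like Kafka and does not use Kafka's API; the `EventBus` and `RecommendationEngine` classes, the `song_played` topic name, and the lookup table of similar songs are all hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class EventBus:
    """Minimal synchronous pub/sub: handlers are called as events are published."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        # Deliver the event to every handler registered for this topic.
        for handler in self._subscribers.get(topic, []):
            handler(event)


class RecommendationEngine:
    """Consumes 'song played' events and records suggestions per user."""

    def __init__(self, similar_songs: Dict[str, List[str]]) -> None:
        self._similar = similar_songs  # hypothetical precomputed lookup table
        self.recommendations: Dict[str, List[str]] = defaultdict(list)

    def on_song_played(self, event: dict) -> None:
        suggestions = self._similar.get(event["song_id"], [])
        self.recommendations[event["user_id"]].extend(suggestions)
```

The playback code only publishes an event and never calls the recommendation engine directly, so either side can change or scale without the other knowing. Unlike this sketch, a real broker delivers events asynchronously and stores them durably.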

Speaker 1:

Once again, I genuinely appreciate everyone who's reached out to me during this quiet period while I dealt with everything. No one was rude or mean and, honestly, everyone has been incredibly supportive, so thank you so much. Special thank you to Charles Cazals from France (I'm sorry if I'm pronouncing any of these names wrong) and Ginny Luke on her daily commutes for their emails and kind words. Special thank you to Taurus on Discord for helping me get everything back together and keeping me in check. Huge shout out and thank you to everyone again from Patreon: Charles Cazals, again thank you for both the email and the Patreon stuff, Eduardo Muth Martinez, Wajo Soares, Adrian Keurig, Gabriel, just Gabriel, Jesper Klotzel, Nathan Sutton, Florian Breider and again, Klavian S, for joining the Patreon and supporting.

Speaker 1:

And if you, too, want to hear your name butchered by me, you know, feel free to join our Patreon. And anyone whose name I did butcher, please send me an audio message, email me, send it to me on Discord. I promise I will work on getting better at these. You're all amazing and wonderful, genuinely. If you've listened to this podcast and shared it with anyone, or even just enjoyed it, thank you from the bottom of my heart, sincerely. If you would like to suggest new topics or even be a guest on the podcast, feel free to drop me an email at LearnSystemDesignPod@gmail.com. Remember to include your name if you'd like a shout out. If you would like to support the podcast and help me pay my bills, please jump over to Patreon. All podcasts are inspired by Crystal Rose, all music written and performed by the wonderful Aimless Orbiter. You can check out more of his music at soundcloud.com/aimlessorbitermusic. And, with all that being said, this is Benny and I'm scaling down.
