Table of contents
- Want to build scalable apps and ace your system design interview? Master horizontal scaling and load balancing to handle more users and ensure fault tolerance.
- The internet is like a postal system for data, breaking it into packets, sending them, and reassembling them reliably using TCP, while DNS translates domain names to IP addresses for seamless browsing.
- Software engineering is all about finding new and complicated ways to store and move data around.
Want to build scalable apps and ace your system design interview? Master horizontal scaling and load balancing to handle more users and ensure fault tolerance.
Do you want to level up from a Junior Dev so that you can build scalable apps or maybe get a 100K pay bump by passing your system design interview round? Well, you're going to need NeetCode's 20 essential system design concepts, which include networking, API patterns, databases, and more, all of which are covered more extensively on neetcode.io.
To start, let's say you have a single server accepting and fulfilling requests for your application. But now we're getting more users, so we need to scale. The easiest thing to do would be to add more resources, like RAM, or to upgrade the CPU. This is known as vertical scaling. It's pretty easy but very limited. A better approach is to add replicas so that each server handles a subset of requests. This is known as horizontal scaling. It's more powerful because we can scale almost infinitely, and we don't even need high-end machines. It also adds redundancy and fault tolerance: if one of our servers goes down, the other servers can continue to fulfill requests, which eliminates our previous single point of failure. The downside is that this approach is much more complicated.
First of all, how do we ensure that one server won't get overloaded while the others sit idle? For that, we'll need a load balancer, which is a kind of reverse proxy: a server that directs incoming requests to the appropriate backend. We can use an algorithm like round robin, which balances traffic by cycling through our pool of servers, or another approach like hashing the incoming request ID. Either way, the goal is to even out the amount of traffic each server receives. If our servers are located all around the world, we can even use a load balancer to route a request to the nearest location.
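Those two routing strategies can be sketched in a few lines of Python. The server names and request IDs below are placeholders invented for the example, not part of any real load balancer API:

```python
from itertools import cycle

# Illustrative pool of backend servers (the names are placeholders).
servers = ["server-a", "server-b", "server-c"]
pool = cycle(servers)

def route_round_robin() -> str:
    # Round robin: hand each request to the next server in the cycle.
    return next(pool)

def route_by_hash(request_id: str) -> str:
    # Hashing the request ID pins a given request to one server.
    return servers[hash(request_id) % len(servers)]

assigned = [route_round_robin() for _ in range(4)]
print(assigned)  # ['server-a', 'server-b', 'server-c', 'server-a']
```

Round robin spreads load evenly when requests are uniform; hashing is useful when repeated requests for the same ID should hit the same server (for example, to reuse a cache).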
This brings us to content delivery networks (CDNs). If you're just serving static files like images, videos, and sometimes even HTML, CSS, and JavaScript files, you can use a CDN. It's a network of servers located all around the world. CDNs don't really run any application logic; they work by taking files hosted on your server, aka the origin server, and copying them onto the CDN servers, either on a push or a pull basis. CDNs are just one technique for caching, which is all about creating copies of data so that it can be refetched faster in the future. Making network requests can be expensive, so our browsers will sometimes cache data onto disk. But reading from disk can still be slow, so our computers copy frequently used data into memory. And because even memory reads have a cost, the CPU keeps a small subset of that data in its L1, L2, and L3 caches.
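The caching idea is the same at every layer: keep a copy close by so the next read is cheap. As a minimal sketch, Python's standard-library `lru_cache` can stand in for any of these cache layers (the asset path and the simulated latency are invented for this example):

```python
import time
from functools import lru_cache

# lru_cache stands in for any cache layer (CDN, browser, OS page cache).
@lru_cache(maxsize=128)
def fetch_asset(path: str) -> bytes:
    time.sleep(0.01)  # pretend this is a slow trip to the origin server
    return f"contents of {path}".encode()

fetch_asset("/logo.png")  # slow: goes to the "origin"
fetch_asset("/logo.png")  # fast: served from the in-memory copy
print(fetch_asset.cache_info())  # 1 hit, 1 miss
```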
But how do computers communicate with each other in the first place? Every computer is assigned an IP address, which uniquely identifies a device on a network. To round it all out, we have the poorly named TCP/IP, or the Internet Protocol Suite, since it actually includes UDP as well. Focusing on TCP for a second: there has to be some set of rules, aka protocols, that decides how we send data over the Internet, just like how in real life we have a system that decides how we send mail to each other. When we send data like files, it's broken down into individual packets and sent over the Internet. Each packet is numbered, so that when the packets arrive at the destination they can be reassembled in the original order, even if they arrive out of order. If some packets go missing, TCP ensures that they'll be resent. This is what makes it a reliable protocol, and why many other protocols like HTTP and WebSockets are built on top of TCP.
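That split-number-reassemble flow can be mimicked in miniature. This is a toy model, not real TCP (real packets carry headers, checksums, and acknowledgements), but it shows why sequence numbers make out-of-order delivery harmless:

```python
import random

def to_packets(data: bytes, size: int = 4):
    # Number each chunk so the receiver can put them back in order.
    return [(seq, data[i:i + size])
            for seq, i in enumerate(range(0, len(data), size))]

def reassemble(packets):
    # Packets may arrive in any order; sort by sequence number.
    return b"".join(chunk for _, chunk in sorted(packets))

packets = to_packets(b"hello, world")
random.shuffle(packets)  # simulate out-of-order arrival
print(reassemble(packets))  # b'hello, world'
```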
But when you type neetcode.io into your search bar, how does your computer know which IP address it belongs to?
The internet is like a postal system for data, breaking it into packets, sending them, and reassembling them reliably using TCP, while DNS translates domain names to IP addresses for seamless browsing.
When you type neetcode.io into your search bar, your computer needs to know which IP address it belongs to. For this, we have the Domain Name System (DNS), a largely decentralized service that translates a domain into its IP address. When you buy a domain from a DNS registrar, you can create a DNS A record (A stands for address) and enter the IP address of your server. When you search for the domain, your computer makes a DNS query, which uses the A record mapping to retrieve the address. Your operating system then caches the result so it doesn't have to make a DNS query every single time.
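You can trigger this lookup yourself with Python's `socket.gethostbyname`, which asks the OS resolver for a hostname's IPv4 address. The example resolves "localhost" so it runs without network access; resolving a real domain would look like `socket.gethostbyname("neetcode.io")`:

```python
import socket

# gethostbyname performs the same kind of lookup a browser does before
# connecting: hostname in, IPv4 address out. "localhost" is resolved
# locally, so no actual DNS query leaves the machine here.
ip = socket.gethostbyname("localhost")
print(ip)  # 127.0.0.1
```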
We usually use HTTP to view websites because TCP is too low-level; we don't want to worry about individual data packets. HTTP is an application-level protocol that developers use on a day-to-day basis. It follows the client-server model, where a client initiates a request that includes two parts: the request headers (similar to a shipping label) and the request body (the package contents). To see this in action, you can open the Network tab in your browser's dev tools and click subscribe. The response likewise includes headers and a body.
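Written out as raw text, the shipping-label analogy is easy to see: headers on top, a blank line, then the body. The host and JSON payload below are made up for illustration:

```python
# A minimal HTTP request as raw text. The headers are the "shipping
# label", the body the "package contents"; a blank line separates them.
request = (
    "POST /subscribe HTTP/1.1\r\n"       # request line
    "Host: example.com\r\n"              # headers start here
    "Content-Type: application/json\r\n"
    "Content-Length: 23\r\n"
    "\r\n"                               # blank line: end of headers
    '{"channel": "NeetCode"}'            # body
)
headers, _, body = request.partition("\r\n\r\n")
print(body)  # {"channel": "NeetCode"}
```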
Even with HTTP, there are multiple API patterns we could follow. The most popular one is REST, which standardizes HTTP APIs, making them stateless and consistent. For example, a successful request should include a 200 code in its response, a bad request from the client returns a 400-level code, and a server issue results in a 500-level code. Another API pattern is GraphQL, introduced by Facebook in 2015. Instead of making a separate request for every resource on your server, as with REST, GraphQL lets you make a single request (a query) and choose exactly which resources to fetch. This means you can fetch multiple resources in one request without over-fetching data you don't need.
Another API pattern is gRPC, an RPC framework released by Google in 2016 as an improvement over REST APIs. It's mainly used for server-to-server communication, but there's also gRPC-Web, which allows gRPC to be used from a browser and has been growing quickly over the last few years. The performance boost comes from protocol buffers: compared to JSON, which REST APIs typically use, protocol buffers serialize data into a binary format, making it more storage-efficient and faster to send over a network. The trade-off is that JSON is more human-readable, since it's plain text.
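The size difference is easy to demonstrate. The sketch below uses Python's `struct` module only as a stand-in for protocol buffers (real protobuf uses a schema-driven wire format, not fixed-width fields), but it shows why a binary encoding of the same three fields is smaller than the JSON text:

```python
import json
import struct

# The same three fields as human-readable JSON versus packed binary.
point = {"x": 10, "y": 20, "z": 30}
as_json = json.dumps(point).encode()         # 27 bytes of text
as_binary = struct.pack("<iii", 10, 20, 30)  # 12 bytes: three int32s
print(len(as_json), len(as_binary))  # 27 12
```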
Software engineering is all about finding new and complicated ways to store and move data around.
Another application-layer protocol is WebSockets. To understand the main problem it solves, take chat apps, for example. Usually, when someone sends you a message, you receive it immediately. If we implemented this with plain HTTP, we would have to use polling, periodically making a request to check whether a new message is available. But unlike HTTP/1, WebSockets support bi-directional communication: when you get a new message, it's immediately pushed to your device, and whenever you send a message, it's immediately pushed to the receiver's device.
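Here's a toy model of polling, with a thread-safe queue standing in for the chat server and each `get_nowait()` call standing in for a separate HTTP request (everything here is invented for illustration, not a real protocol library). A WebSocket would instead keep one connection open and push the message as soon as it exists:

```python
import queue
import threading
import time

inbox = queue.Queue()  # stands in for the chat server

def sender():
    time.sleep(0.05)   # the other person types for a moment...
    inbox.put("hey!")  # ...then their message reaches the server

threading.Thread(target=sender).start()

while True:            # poll until a message shows up
    try:
        message = inbox.get_nowait()  # each check = one wasted request
        break
    except queue.Empty:
        time.sleep(0.01)

print(message)  # hey!
```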
For actually storing data, we have SQL, or relational database management systems, like MySQL and Postgres. But why use a database when we could just store everything in text files on disk? Well, databases let us store data more efficiently using data structures like B-trees, and retrieve it quickly with SQL queries, since the data is organized into rows and tables. Relational database management systems are usually ACID-compliant. Durability means data is stored on disk, so even if a machine restarts, the data will still be there. Isolation means concurrent transactions won't interfere with each other. Atomicity means every transaction is all or nothing. Lastly, consistency means foreign key and other constraints will always be enforced.
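Atomicity is the easiest of these to see in code. The sketch below uses Python's built-in SQLite driver with an in-memory database and an invented accounts table; a transfer that fails halfway is rolled back entirely, so money is never half-moved:

```python
import sqlite3

# In-memory SQLite database, so nothing persists beyond this script.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INT)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()

try:
    with conn:  # opens a transaction; any exception triggers a rollback
        conn.execute("UPDATE accounts SET balance = balance - 50 "
                     "WHERE name = 'alice'")
        raise RuntimeError("crash before crediting bob")
except RuntimeError:
    pass

# The debit was undone along with the failed transfer: all or nothing.
balance = conn.execute("SELECT balance FROM accounts "
                       "WHERE name = 'alice'").fetchone()[0]
print(balance)  # 100
```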
Consistency is worth highlighting, because it contributed to the creation of NoSQL, or non-relational, databases. Enforcing consistency makes databases harder to scale, so NoSQL databases drop this constraint, along with the whole idea of relations. There are many types of NoSQL databases, but popular ones include key-value stores, document stores, and graph databases.
Going back to consistency for a second: if we don't have to enforce foreign key constraints, we can break up our database and scale it horizontally across different machines. This technique is called sharding. But how do we decide which portion of the data goes on which machine? For that, we usually have a shard key: given a table of people, our shard key could be the ID of the person. Sharding can get complicated, so a simpler approach is replication. If we want to scale our database reads, we can make read-only copies of our database. This is called leader-follower replication: every write goes to the leader, which propagates it to the followers, while reads can go to either the leader or a follower. There's also leader-leader replication, where every replica can be used for reads or writes, but this can result in inconsistent data, so it's best used where replicas can operate mostly independently, such as one replica per region of the world. Keeping replicas in sync is complex, which is where the CAP theorem comes in. It weighs the trade-offs of replicated designs, stating that during a network partition, database designers can only choose to favor either data consistency or data availability. What makes it confusing is that consistency here means something different than in ACID. It's a somewhat controversial theorem, which is why the more complete PACELC theorem was created.
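A minimal shard-key sketch, assuming hash-based sharding (the shard count and ID format are made up; real systems often use consistent hashing so that adding a shard doesn't remap every key):

```python
import hashlib

NUM_SHARDS = 4  # illustrative shard count

def shard_for(person_id: str) -> int:
    # Hash the shard key (the person's ID) to pick a machine. A stable
    # hash guarantees the same ID always lands on the same shard.
    digest = hashlib.sha256(person_id.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

print(shard_for("user-42"))  # same shard every time for this ID
```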
Lastly, we have message queues. They're kind of like databases: they have durable storage, and they can be replicated for redundancy or sharded for scalability, and they have many use cases. If our system is receiving more data than it can process, a message queue lets that data be persisted until we're able to process it. In doing so, we also get the added benefit that different parts of our app become decoupled. If you've learned anything by now, it's that software engineering is all about finding new and complicated ways to store and move data around.
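The producer/consumer decoupling can be sketched in-process with Python's standard-library queue. This is only a stand-in for a real message queue (which adds durable, replicated storage on top); the sentinel shutdown and the "orders" are invented for the example:

```python
import queue
import threading

q = queue.Queue()  # stands in for the message queue
results = []

def consumer():
    while True:
        item = q.get()
        if item is None:              # sentinel: shut down
            break
        results.append(item.upper())  # "process" the message

worker = threading.Thread(target=consumer)
worker.start()

for msg in ["order-1", "order-2", "order-3"]:
    q.put(msg)                        # producer moves on immediately
q.put(None)
worker.join()                         # wait for the backlog to drain
print(results)  # ['ORDER-1', 'ORDER-2', 'ORDER-3']
```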
If you want to learn more, you can check out my System Design for Beginners course on neetcode.io as well as my System Design Interview course. Thank you for supporting my work, and I'll see you soon.