Frontend and Backend Timeouts

Timeouts are essential for managing resources and preventing overload; knowing when to stop waiting can save your system from chaos.

I want to talk about timeouts in the backend. The concept of a timeout is this: a party is waiting on some sort of resource or action to be completed, it sets a limit on how long it is willing to wait, and when that time elapses, it terminates the wait. Now, terminating the wait does not necessarily mean canceling the action, because those are two orthogonal things. You can give up waiting while the processing continues to happen, or you can also submit another request to actually cancel the processing that is still ongoing.

In this video, I want to discuss five popular types of timeouts. Let's jump into it. There are many types of timeouts, and as you go up or down the OSI layers, you will be confronted by different ones. The reason for putting timeouts in place is to free up resources like memory or CPU, or to mitigate denial of service attacks. Timeouts are critical here because you can stop waiting: if there is an attack trying to drain all your resources, you can quit a little earlier so you don't keep hogging resources, whether processing or just waiting.

Timeouts can also be used to detect slow-running operations. For example, if you have service level agreements (SLAs) in place that say a particular REST endpoint or GraphQL call should take at most three seconds, you can put a timeout at, say, five seconds. This way, you avoid runaway processes and can capture long-running operations for diagnostic purposes.
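
To make that concrete, here is a minimal sketch of the idea in TypeScript, using the illustrative 3-second SLA and 5-second budget from above; `handle` is a made-up placeholder for the real endpoint logic, not a real API.

```typescript
// Minimal sketch: a timeout set above the SLA, used to flag runaway requests
// for diagnostics. SLA_MS, BUDGET_MS, and handle() are illustrative assumptions.
const SLA_MS = 3_000;
const BUDGET_MS = 5_000;

async function handle(): Promise<string> {
  return "ok"; // placeholder for the real REST/GraphQL work
}

async function handleWithDiagnostics(): Promise<string> {
  const started = Date.now();
  // Fires only if the request is still running after the budget elapses.
  const watchdog = setTimeout(() => {
    console.warn(`request still running after ${BUDGET_MS} ms; capturing it for diagnostics`);
  }, BUDGET_MS);
  try {
    const result = await handle();
    const took = Date.now() - started;
    if (took > SLA_MS) {
      console.warn(`request took ${took} ms, above the ${SLA_MS} ms SLA`);
    }
    return result;
  } finally {
    clearTimeout(watchdog);
  }
}
```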

Now, let's talk about the five different timeouts. The first one is connection timeout. When a client wants to connect to a server, it needs to establish a connection, which is often done using a protocol called TCP (Transmission Control Protocol). When we colloquially say "connection," we mean both this TCP connection and often the TLS session that is set up on top of this TCP connection. You can have multiple TLS sessions that go back and forth on the same TCP connection for security reasons, but essentially, we are talking about this one logical connection that includes encryption.

Often, people break down the timings for these connections. When I am attempting to establish a connection and, for some reason, it is taking a long time (the three-way handshake is slow, or the server is struggling to complete the TLS handshake as part of connection establishment), we put a connection timeout in place. This is usually a client-side timeout that says, "Well, I tried to connect to this backend server, but I couldn't connect within this amount of time, so I will stop trying, or maybe I will retry later."
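
Here is a minimal sketch of a client-side connection timeout over plain TCP in Node.js; the host, port, and the 3-second value are illustrative assumptions, not recommendations.

```typescript
import * as net from "node:net";

const CONNECT_TIMEOUT_MS = 3_000;

// Try to establish a TCP connection; give up if the handshake takes too long.
const socket = net.connect({ host: "example.com", port: 443 });

const connectTimer = setTimeout(() => {
  // Stop waiting and free the local resources. Nothing is "canceled" on the
  // server side; we simply give up on the connection attempt.
  socket.destroy(new Error("connection timeout"));
}, CONNECT_TIMEOUT_MS);

socket.once("connect", () => {
  clearTimeout(connectTimer);
  console.log("connected");
  socket.end();
});

socket.once("error", (err) => {
  clearTimeout(connectTimer);
  console.error("connection failed:", err.message);
});
```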

It's critical to understand the terms client and backend here because, as you will see by the end of the video, the client does not always mean a browser. The client could also be a server or another backend, especially in a proxy configuration where a client talks to a reverse proxy, such as a CDN (Content Delivery Network), and the reverse proxy talks to an origin. When the CDN or reverse proxy talks to that origin backend, it is now the client. It is very important to understand these timeouts because everyone is a frontend and everyone is a backend at the same time; there is always a backend API being called somehow at the end of the day.

=> 00:06:08

Understanding the importance of timeouts in backend processing is crucial; they protect your system from slow requests and potential attacks while ensuring efficient communication between client and server.

To begin, we establish a connection, and it is essential to set a timeout on this initial connection. This is critical because you do not want to keep waiting indefinitely for the connection to be established. Following this, there is the request read timeout, which is specific to the backend. At this point, the client has a fully established connection and can write the request. We are discussing various types of requests here, such as a GET request, a POST request, or even requests using protocols other than HTTP.

The concept of a request is well defined within the protocol; for example, an HTTP/1.1 request starts with a method (like GET) followed by a path, and the request head ends with a blank line (a double CRLF). Although HTTP/2 and HTTP/3 frame things differently, the logical separation of one request from the next remains critical to understand.
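
For reference, this is what one complete HTTP/1.1 request looks like on the wire; the sketch uses plain TCP (no TLS) so the bytes stay readable, and example.com is just an illustrative host.

```typescript
import * as net from "node:net";

// One complete HTTP/1.1 request: method and path first, then headers,
// then the blank line (\r\n\r\n) that ends the request head (no body for a GET).
const rawRequest =
  "GET / HTTP/1.1\r\n" +
  "Host: example.com\r\n" +
  "Connection: close\r\n" +
  "\r\n";

const socket = net.connect({ host: "example.com", port: 80 }, () => {
  socket.write(rawRequest);
});

socket.on("data", (chunk) => process.stdout.write(chunk));
socket.on("error", (err) => console.error(err.message));
```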

When the backend receives the request, it begins to read it. This process involves receiving a stream of bytes over the connection, which are often encrypted. The backend decrypts these bytes to obtain plain text and looks for the start of the request. As it reads, it may encounter the beginning of another request, which it will set aside to focus on the first request. This reading process can sometimes be slow, especially if the client is transmitting a large file or a significant body as part of a POST request.

To mitigate this, it is important to implement a timeout on how long the backend will wait to read a request. Certain attacks, such as slow-request attacks (Slowloris is the classic example), exploit this by deliberately trickling the request in to hog backend resources: the backend keeps reading and holding state for the request, which can lead to resource exhaustion. Therefore, it is advisable to set a read timeout, such as five seconds or even less, depending on the typical size of requests. For instance, if most requests are small, around 1 kilobyte, a timeout of one or two seconds is reasonable.
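
Node's built-in HTTP server exposes exactly this kind of knob. Here is a minimal sketch; the 2- and 5-second values are the illustrative numbers from above, not recommended defaults.

```typescript
import * as http from "node:http";

const server = http.createServer((req, res) => {
  res.end("ok\n");
});

// How long the client gets to send the complete request headers.
server.headersTimeout = 2_000;
// How long the client gets to send the entire request (headers + body).
// A slow-request attack that trickles bytes in will be cut off here.
server.requestTimeout = 5_000;

server.listen(8080);
```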

Once the backend has received the request, it enters the processing phase. There are various patterns for executing requests. One approach is to execute the request synchronously immediately upon receipt. This means that the process or thread reading the request starts executing it right away. While this is a common behavior, it is not the only one. Another approach is to place the request in a queue for better management. This allows for more efficient processing, as requests can be handled in an organized manner, potentially using multiple queues or pools for different types of tasks.

In summary, understanding and implementing appropriate timeouts and execution patterns are essential for effective backend processing and resource management.

=> 00:10:49

Managing requests efficiently is key to a smooth backend process; prioritize, queue, and control execution to avoid overload and ensure timely responses.

As mentioned, the alternative is to put the request on a queue for better management. This allows for a more organized processing system: you place the request in a queue, and a pool of processors picks it up from the queue and starts executing it. While this does add a bit of overhead, it makes execution more manageable and prevents it from spiraling out of control. By reading the queue, we can also measure how long each request has waited. This leads us to the concept of the wait timeout, which is how long a request is allowed to wait for a processor (not the CPU, but a worker thread or process) to pick it up and start executing it.

The wait time could be 1 second, 10 seconds, or even 1 minute, so this method can add a bit more latency for the client. For long-running requests, it is generally better to place them in a queue; for short-running requests, a synchronous approach may be preferable, where processing starts immediately upon receiving the request.

Additionally, one can implement heuristics or logic to prioritize certain requests over others. This means that some requests can be deemed more important, allowing for algorithms to kick in and manage the prioritization effectively. However, it is essential to inform the client if a request has timed out due to waiting too long.
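
Here is a minimal, framework-free sketch of that wait timeout: jobs record when they were enqueued, and a worker rejects anything that has been sitting in the queue past the budget instead of silently dropping it. The 10-second value and the shape of `Job` are illustrative assumptions.

```typescript
interface Job {
  enqueuedAt: number;                // when the request entered the queue
  run: () => Promise<void>;          // the actual work
  reject: (reason: Error) => void;   // how we tell the client it timed out
}

const WAIT_TIMEOUT_MS = 10_000;
const queue: Job[] = [];

function enqueue(run: () => Promise<void>, reject: (reason: Error) => void): void {
  queue.push({ enqueuedAt: Date.now(), run, reject });
}

async function worker(): Promise<void> {
  while (true) {
    const job = queue.shift();
    if (!job) {
      await new Promise((r) => setTimeout(r, 50)); // idle; poll again shortly
      continue;
    }
    const waited = Date.now() - job.enqueuedAt;
    if (waited > WAIT_TIMEOUT_MS) {
      // Inform the client that it timed out waiting, then move on.
      job.reject(new Error(`waited ${waited} ms in the queue; giving up`));
      continue;
    }
    await job.run();
  }
}

void worker(); // a real pool would start several of these workers
```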

The next timeout to consider is the usage timeout, also known as the processing timeout. This is typically what people think of when they hear the term "timeout." It refers to how long a request can hog processing power on the backend before it is terminated altogether. For instance, if a request is calculating prime numbers up to a billion or performing a large read from the database, it may take an excessive amount of time.

The appropriate usage timeout often depends on the type of request. If a request is expected to take a long time, different usage timeouts can be set based on the request type. This is where sophisticated backend applications can be developed to support such functionalities. It is crucial to inform the client if their request has been terminated due to exceeding the usage timeout.
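
A minimal sketch of such a per-type usage timeout follows, assuming made-up request kinds ("lookup" and "report") and budgets. Note that losing the race does not by itself stop the underlying work, which is exactly the caveat discussed later.

```typescript
// Illustrative budgets per request type; the names and numbers are assumptions.
const usageBudgets: Record<string, number> = {
  lookup: 2_000,   // expected to be quick
  report: 30_000,  // known long-running work gets a bigger budget
};

async function withUsageTimeout<T>(kind: string, work: Promise<T>): Promise<T> {
  const budget = usageBudgets[kind] ?? 5_000;
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${kind} exceeded its ${budget} ms usage budget`)),
      budget,
    );
  });
  try {
    // Whichever settles first wins; the losing promise is NOT canceled by this.
    return await Promise.race([work, deadline]);
  } finally {
    if (timer !== undefined) clearTimeout(timer);
  }
}
```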

Currently, there is no official HTTP status code for a processing (usage) timeout. The closest is 408 Request Timeout, which really corresponds to the read timeout: the server gave up waiting for the client to finish sending the request, so the blame is placed on the client. But as far as I know, there is no dedicated status for a request whose actual processing exceeded its limit.

=> 00:15:32

Don't assume closing a connection means all requests are canceled; you need a solid logic in place to manage that cleanup.

The final timeout to consider is another client timeout, known as the response timeout. This occurs when I send you a request and I'm waiting for a response: how long should I wait before I give up? One minute? Two minutes? This typically applies to synchronous requests. We have to be careful here because, with server-sent events, the browser does not impose any response timeout on the client side. You can still abort if you want: on the client side, you can use an abort signal with fetch, for example, to abort the request.
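
Here is a minimal sketch of that client-side response timeout using fetch with an AbortController; the 5-second value is an illustrative assumption, and aborting only stops the waiting on our side.

```typescript
const RESPONSE_TIMEOUT_MS = 5_000;

async function getWithResponseTimeout(url: string): Promise<string> {
  const controller = new AbortController();
  // Stop waiting after the budget; the backend may well keep processing.
  const timer = setTimeout(() => controller.abort(), RESPONSE_TIMEOUT_MS);
  try {
    const res = await fetch(url, { signal: controller.signal });
    return await res.text();
  } finally {
    clearTimeout(timer);
  }
}

// Usage: getWithResponseTimeout("https://example.com/api").catch(console.error);
```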

This topic can be quite controversial. When you abort a request, it does not mean that the backend has stopped processing it; you have to build the necessary infrastructure to support that. Just because you aborted the request does not mean the server knows you aborted it; it simply means you gave up waiting. So the server continues to process the request, which can have good or bad consequences.

To address this properly, you really need to send another request to cancel the first one. Some people might suggest closing the connection as a way to stop everything from running. However, by default the server has no association between the connection and the requests running on its behalf. When you close the connection, the server does receive a notification, a FIN in TCP, indicating that you have given up and closed the connection. But does that mean all requests associated with that connection should also be terminated? Maybe, but it depends on the kind of system you have. In most cases, if the client closes the connection, you should terminate all of its in-flight requests.

The issue arises because people often assume this happens automatically. Just because you close the connection, or even close the browser (which technically closes the underlying connection), does not mean the server will automatically cancel all ongoing requests. When you close the connection, the server receives the FIN, but it is the kernel that closes the connection; the kernel must notify the application, and the application must have logic in place to recognize that the connection it was using has been closed.

This means the application must associate every connection with a list of its requests, and that bookkeeping should be event-driven: if the connection closes, it triggers cancellation of the associated requests. But what does it mean to cancel a request? The request might be in the middle of a database query or waiting on something. So you need logic in your processing loop that says, "Oh, by the way, never mind, just cancel."
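
Here is a minimal sketch of that event-driven bookkeeping on a Node HTTP server: each request gets an AbortController, and if the underlying socket closes before we respond, the controller aborts so downstream work can bail out. `doWork` is a made-up placeholder, and the socket close handling is deliberately simplified.

```typescript
import * as http from "node:http";

// Placeholder for the real handler; anything cancellable (fetch calls, DB
// drivers that support it, your own loops) should accept and honor the signal.
async function doWork(signal: AbortSignal): Promise<string> {
  await new Promise((resolve) => setTimeout(resolve, 2_000));
  if (signal.aborted) throw new Error("client went away");
  return "done\n";
}

const server = http.createServer(async (req, res) => {
  const controller = new AbortController();
  // Event-driven cleanup: if the connection dies, cancel the associated work.
  req.socket.once("close", () => controller.abort());

  try {
    const body = await doWork(controller.signal);
    if (!res.writableEnded) res.end(body);
  } catch {
    // The client is gone or the work was canceled; tear down without a response.
    res.destroy();
  }
});

server.listen(8080);
```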

An easier, albeit a little heavy-handed, way to handle this is to design your processing pool such that one thread executes one request at a time. In this way, you can simply kill the thread when necessary.

=> 00:20:03

Simplicity in design often leads to better performance; overcomplicating your logic can create more problems than it solves.

To continue the example: if a connection from a client has just died, any request being processed for that client should just be killed. But where is it? It could be in thread one, thread two, or thread three. You see, this bookkeeping has to exist on the backend: "Oh, this request is being executed for this client, and this client is gone." There is no point in continuing to execute these requests because there is no client to send the response to.

So, how do you do that? Some people might say, "Oh, I'm just going to kill the thread." That works, but it is heavy-handed: now you have to spin up a new thread, and that takes time, and spinning up a new process is even more expensive. Meanwhile, queued requests have to wait, which increases the wait time for a worker to become available.

It’s not straightforward at all; it’s beautiful how all these decisions can be made. You can go with the simplest possible decisions, and absolutely, you will be right. It's all about simplicity at the end of the day. However, keeping all of this in mind is a good practice. You can say, "Okay, this is what I know; I know about these things. It's okay; I don't think it's a problem for my use case," and then you move on. You don’t have to code for all these cases.

The final thing I want to talk about is what happens if you have a proxy, a CDN, or reverse proxies in the middle. In this scenario, all of these timeouts still apply, just at every hop. For instance, if you send a request to a CDN, which is a reverse proxy, you talk to the CDN directly: you have a connection timeout when connecting to the CDN, and the CDN has a read timeout while it reads your request.

The origin backend is happy so far; you're not touching it. Then the CDN might have a wait timeout, or maybe not, and it will have a processing timeout. Here the CDN turns around and acts like a client to the actual backend. Now what kicks in? The connection timeout, because the CDN is trying to connect to the origin, and then it sends the request and waits for the response. If it doesn't get a response, you might encounter a gateway timeout (504).

A gateway timeout means, "Hey, I tried to reach the backend, but either the connection failed, or I connected and sent the request successfully and it was received, but I never got a response for, say, five minutes." In other words, the proxy's response timeout against the origin has fired.

Now, the proxy has a choice: it can either retry or let the client know immediately. There's no right or wrong here; you can do whatever fits your system. Some CDNs will say, "You know what? Let's just put a retry in place." That's another configuration: how many times do you want to retry? Three, four, five, or just one? Then the proxy either comes back and says, "I can't reach that guy," or, "Actually, this time I got a response."
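
A minimal sketch of that kind of proxy-side policy is below, assuming a made-up origin URL: each attempt gets its own response timeout via AbortSignal.timeout, and the number of attempts is capped. None of these values are defaults of any real CDN, and the caveat about retries discussed next still applies.

```typescript
const MAX_ATTEMPTS = 3;
const PER_ATTEMPT_TIMEOUT_MS = 5_000;

async function forwardToOrigin(url: string): Promise<Response> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
    try {
      // Each attempt has its own response timeout.
      return await fetch(url, { signal: AbortSignal.timeout(PER_ATTEMPT_TIMEOUT_MS) });
    } catch (err) {
      lastError = err; // connection failed or the response timed out; maybe retry
    }
  }
  // Out of attempts: the proxy would surface a 504 Gateway Timeout to the client.
  throw lastError;
}
```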

The worst thing you can do, though, is retry carelessly. That's very dangerous, and we've seen it with the Amazon outage in 2020 during the pandemic. Their microservices talk to each other, and when one service got overloaded, its clients timed out because it couldn't respond in time, and those clients were configured to retry. But the first request was never canceled and kept processing, so now the service is also processing the retried request, because the client incorrectly assumed the original one had failed.

Imagine a whole fleet of clients all retrying the same thing. The moment the server comes back up, it just collapses again under the load. This is very similar to the thundering herd problem.

All of these timeouts apply to proxies as well, plus the extra gateway timeout. Of course, there are many other timeouts you can configure based on your use case. Essentially, the goal of timeouts is to free up resources earlier so you can do something more meaningful with them, and they also help you detect denial of service attacks or spot areas of your app where someone might exhaust your processing power.

That’s what I wanted to talk about today. See you in the next one. Goodbye!