FaaS Is A Promising Architecture For Backends

Say you’re starting a startup today. How should you build your backend?

You should start by considering the highest level of abstraction: backend as a service (BaaS), which in practice means Firebase. Say, after considering it, you find that it’s too high a level of abstraction for you, and you need to go one level lower.

That’s Functions as a Service. Let’s examine the different aspects of this architecture:

⦿ FaaS makes sense when you have a wide and shallow architecture, as opposed to a narrow and deep one. That is, you have a bunch of HTTP endpoints with relatively self-contained logic. For example, a startup I worked with needed to email users when they deleted the app. This was done via a Function that didn’t interact at all with the rest of the app’s logic (a sketch of such a Function follows below). Many backends are like this: a bunch of HTTP endpoints whose logic is closer to unrelated than related. If your backend is like this, FaaS is a good fit. Or, put differently, if your product is a CRUD app. In fact, I heard on a podcast that FaaS is the right architecture for such backends.

Conversely, if your product is like Google search, with only a small interface, a single HTTP endpoint (/search), and a huge amount of backend complexity (crawlers, ranking, abuse prevention, connections to dozens of external data sources, from weather to stock markets to everything else you can think of), then FaaS may not be the right architecture for your backend.
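To make the “self-contained endpoint” idea concrete, here’s a minimal sketch of such a Function in TypeScript. The event shape, the handler signature, and the sendGoodbyeEmail helper are illustrative stand-ins rather than any particular platform’s API:

```ts
// A self-contained Function: emails a user when they delete the app.
// It touches nothing else in the backend.

type AppDeletedEvent = {
  userId: string;
  email: string;
};

// Stand-in for whatever email provider you'd actually call (SES, SendGrid, ...).
async function sendGoodbyeEmail(to: string): Promise<void> {
  console.log(`Sending goodbye email to ${to}`);
}

// The whole endpoint: no shared state, no imports from the rest of the app.
export async function handler(event: AppDeletedEvent): Promise<{ statusCode: number }> {
  await sendGoodbyeEmail(event.email);
  return { statusCode: 200 };
}
```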

⦿ FaaS abstracts away some things to watch out for with other architectures. For example, a Function instance handles only one request at a time, so you don’t need to synchronise access between requests running concurrently in the same process.

A long-running VM might have an in-memory cache, which opens up the possibility of the caches on multiple servers losing synchronisation. For example, imagine that a server receives a request to edit a user’s profile. The server updates the database and its in-memory cache. After editing, the user makes a request to read his profile, but the request happens to be routed to another server, whose cache is stale. The stale profile is shown to the user, making it seem like the edit didn’t work. The user will feel your system is broken, reducing his confidence in it. This problem of a stale cache is unlikely to happen with Functions, since they’re designed to be spun up on demand, execute, and terminate, so you don’t keep caches in the first place.
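To make that failure mode concrete, here’s a minimal sketch simulating two long-running server instances that share a database but each keep their own in-memory cache. The class and data are made up purely for illustration:

```ts
// Two simulated long-running servers, each with its own in-memory cache.
class ServerInstance {
  private cache = new Map<string, string>();

  constructor(private db: Map<string, string>) {}

  readProfile(userId: string): string | undefined {
    // Serve from the local cache if present, otherwise fall back to the DB.
    if (!this.cache.has(userId)) {
      const profile = this.db.get(userId);
      if (profile !== undefined) this.cache.set(userId, profile);
    }
    return this.cache.get(userId);
  }

  editProfile(userId: string, profile: string): void {
    // Updates the DB and *this instance's* cache only.
    this.db.set(userId, profile);
    this.cache.set(userId, profile);
  }
}

const db = new Map([["u1", "old bio"]]);
const serverA = new ServerInstance(db);
const serverB = new ServerInstance(db);

serverB.readProfile("u1");               // B caches "old bio"
serverA.editProfile("u1", "new bio");    // the edit is routed to A
console.log(serverB.readProfile("u1"));  // "old bio": B's cache is stale
```

With Functions, there’s no long-lived instance holding that per-server cache, so a read goes to the database (or to a shared cache) and sees the edit.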

⦿ Functions autoscale with traffic. A VM doesn’t — you need to set up and configure autoscaling, and make sure that running multiple instances of your backend doesn’t cause bugs (see the previous point). Functions autoscale out of the box, letting you focus on your business logic, as you should.

⦿ Provisioning a Function is easier than provisioning a VM. A VM typically needs more resources (CPU and memory) as it processes more requests, so it’s hard to give one answer to the question of how much it needs: that depends on how many requests it’s receiving, and on the mix of requests, since some requests are more demanding than others. An individual Function, on the other hand, handles one request at a time, and only one type of request, so its resource requirements are easier to understand and model.
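For example, if you happen to deploy with the AWS CDK (just one of several options), each Function can be sized on its own; the names, numbers and file paths below are only for illustration:

```ts
// Sketch: sizing each Function individually with the AWS CDK (TypeScript).
import { Stack, StackProps } from "aws-cdk-lib";
import * as lambda from "aws-cdk-lib/aws-lambda";
import { Construct } from "constructs";

export class BackendStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // A lightweight endpoint: a small memory allocation is enough.
    new lambda.Function(this, "GoodbyeEmailFn", {
      runtime: lambda.Runtime.NODEJS_18_X,
      handler: "goodbyeEmail.handler",
      code: lambda.Code.fromAsset("dist"),
      memorySize: 128,
    });

    // A heavier endpoint (say, image resizing) sized separately.
    new lambda.Function(this, "ResizeImageFn", {
      runtime: lambda.Runtime.NODEJS_18_X,
      handler: "resizeImage.handler",
      code: lambda.Code.fromAsset("dist"),
      memorySize: 1024,
    });
  }
}
```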

⦿ FaaS has better worst-case behavior under load. An overloaded VM slows down, increasing latency and causing timeouts. Even when no error occurs, your users will lose patience if a response takes 40 seconds. Functions don’t slow down under load — they just spin up more instances, while maintaining the same latency. This is exactly the kind of behavior you want.

If the load gets even higher, a VM will crash, thus serving 0 requests successfully.

On the other hand, say you have a Function provisioned for 1,000 simultaneous calls, but you encounter a load of 10,000. Then 9,000 requests will error out, but you’ll still successfully serve the 1,000 you’re provisioned for, which is better than the 0 a VM would serve.

⦿ Functions, being lightweight and ephemeral, can be distributed closer to users for lower latency, like a CDN, but for code rather than static assets. Rather than choosing one datacenter, which means high latency for users far from it, you want low latency for all your users, because every second of extra latency amounts to losing some of them. In addition to the user-visible benefit of lower latency, there’s also reduced administration overhead: you shouldn’t have to choose which datacenter your code runs in, any more than you have to choose which power station to draw power from when you turn on your AC.

⦿ FaaS also scales to a more complex codebase than a monolith would, because an FaaS codebase is already split into independent Functions. You don’t need to understand the overall code to work on one Function. Reused code can be extracted into libraries, with each library imported into whichever Functions need it. Such a pattern is much more understandable than having all the code linked together into one monolith.
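A sketch of that library pattern; the file names and the validation helper are made up for illustration:

```ts
// lib/validate.ts: reused code extracted into a small library.
export function isValidEmail(email: string): boolean {
  return /^[^@\s]+@[^@\s]+\.[^@\s]+$/.test(email);
}
```

```ts
// functions/signup.ts: one Function importing only the library it needs.
import { isValidEmail } from "../lib/validate";

export async function handler(event: { email: string }) {
  if (!isValidEmail(event.email)) {
    return { statusCode: 400, body: "invalid email" };
  }
  // ... create the account ...
  return { statusCode: 200 };
}
```

A Function that doesn’t validate emails never imports the library, so its code stays small and independent.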

⦿ FaaS is a better programming model than microservices, because a single HTTP request doesn’t need to bounce between multiple services, introducing multiple points of failure, latency, request routing, load balancing, and retries. Each of these adds complexity. For example, retries sound safe, but if each of three layers of your system makes up to three attempts, a single request can hit the lowest layer 3 × 3 × 3 = 27 times! That can be enough to bring down that lowest layer, like the database. All this fractal complexity is avoided in FaaS because each request essentially executes like a monolith. Different requests execute on different machines, but since each request executes on one machine, the above complexity is avoided.
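A toy sketch of that worst-case retry arithmetic, assuming every attempt at every layer times out and gets retried (the layer names are illustrative):

```ts
// Worst case: every attempt fails, so every layer uses all of its attempts.
let databaseCalls = 0;

// Make up to `attempts` attempts of a call that keeps failing.
function attemptWithRetries(attempts: number, call: () => void): void {
  for (let i = 0; i < attempts; i++) {
    call();
  }
}

attemptWithRetries(3, () =>           // e.g. the API gateway
  attemptWithRetries(3, () =>         // e.g. a middle-tier service
    attemptWithRetries(3, () => {     // e.g. the data-access layer
      databaseCalls++;                // each attempt hits the database
    })
  )
);

console.log(databaseCalls); // 27
```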

⦿ Different Functions can be coded in different languages. For example, if you’ve built your backend in JS and then want to adopt machine learning for some new functionality, you may find it convenient to use Python. At that point, you can do so, changing only the endpoints that need ML rather than all of them. You could argue that you should have chosen Python to begin with, but you can’t foresee the future, so a flexible abstraction like Functions helps.

If you want to explore a new language, you can try it out with only one Function, knowing that if it fails, you can easily rewrite it in your older language. By contrast, trying out a new language in a traditional backend is riskier — you need to rewrite your entire backend, and you’re screwed if you chose wrong. As you invariably will, from time to time.

As a third example, imagine you built your Functions in Go, after which your frontend engineer needs to work on the backend to implement some more endpoints. He doesn’t know Go, and wonders whether learning a new language is the best use of his time, as opposed to using the language he knows — JavaScript. FaaS gives you more flexibility in that decision.

⦿ Unfortunately there’s a fly in the ointment, and that’s cold start. To combat cold start, I wish FaaS platforms like AWS Lambda or Google Cloud Functions let us configure unused functions to hang around for an hour rather than 10 minutes [1].

Once the cold start problem is solved, I think FaaS has huge potential as the architecture of choice for most backends that can’t use BaaS, streamlining backend engineering.


[1] An hour is long enough to virtually eliminate cold starts, while still benefitting from lower costs due to lower usage at night.