Languages Should Offer Servers as a Fundamental Abstraction
I was interviewing an engineering manager, and he described how his team had split an unwieldy, tightly coupled Java monolith into microservices to get loose coupling. This made me wonder: are microservices the only way to reduce coupling?
If you are a programming language designer, what can you do to reduce coupling?
In fact, the design of languages, and of the software built in them, is all about imposing restrictions that constrain systems to work in well-defined rather than ad-hoc ways. For example, functional programming languages like Haskell restrict modification of objects to prevent certain bugs. As a second example, languages like Elixir, where each HTTP request is typically handled in a separate lightweight process, prevent bugs in one process, like a crash or excessive CPU use, from affecting others. As a third example, encapsulation prevents users of a class from modifying its internals. All programming language innovation, and all coding guidelines, are about restrictions.¹
Restrictions free us. My apartment building restricts me from putting my things in common areas, but that frees us from having to deal with other people’s junk.
With this in mind, if you’re a programming language designer, how can you impose restrictions to break tight coupling?
The key insight is that loose coupling requires you to break up your program into components that communicate in restricted ways. It does not require those components to be on different machines communicating via a network.
To implement this insight, languages can support a “server” abstraction. You declare a server like you’d declare a class. Here’s one that renders documents:
server DocumentRenderer {
    // Returns a document ID that you use in subsequent requests.
    string upload_document(byte[] document)

    // Renders the given page as a PNG.
    byte[] render_page(string document_id, int page_number)
}
You upload a document to the server, and then ask it to render a specific page.
The key difference between a class and a server is that when you invoke a function on a server, you can't pass references to objects, and the server can't return references either. Data crosses the boundary only by value, much like bytes sent and received over a TCP connection: raw bytes, or anything that can be serialised into them.
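To make that boundary concrete, here's a minimal sketch of how you could fake it in today's Java. ServerBoundary, its copy() helper and the renderer's internals are invented for illustration, not an existing framework; the point is simply that only serialisable data crosses the boundary, and it crosses as a copy.

import java.io.*;
import java.util.*;

// Hypothetical in-process "server" boundary: every call crosses a copy
// boundary, so caller and callee never share object references.
final class ServerBoundary {
    // Deep-copies a value by round-tripping it through Java serialisation,
    // mimicking "only serialisable data crosses the boundary".
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T copy(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bytes);
            out.writeObject(value);
            out.flush();
            ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()));
            return (T) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalArgumentException("value is not serialisable", e);
        }
    }
}

// The DocumentRenderer from above, written as an ordinary class but only
// ever reached through the copy boundary, so callers never hold references
// to its internals and it never holds references to theirs.
class DocumentRenderer {
    private final Map<String, byte[]> documents = new HashMap<>();

    String uploadDocument(byte[] document) {
        String id = UUID.randomUUID().toString();
        // Store a private copy; the caller can't mutate it after the call.
        documents.put(id, ServerBoundary.copy(document));
        return id;
    }

    byte[] renderPage(String documentId, int pageNumber) {
        // Actual rendering elided; return a copy so the caller never sees
        // this server's internal buffer.
        return ServerBoundary.copy(documents.get(documentId));
    }
}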
The entire program consists of a bunch of servers. Code that’s not explicitly in a server is in a default “main” server.
When you invoke a server, you can specify a deadline. If the server doesn’t return by then, the framework throws a DeadlineExceededException for you.
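On today's JVM you could approximate this with a timed wait on a Future. DeadlineExceededException and callWithDeadline below are hypothetical names for what the framework would do on your behalf; this is a sketch, not an existing API.

import java.util.concurrent.*;

// Hypothetical exception the framework throws when a deadline passes.
class DeadlineExceededException extends RuntimeException {
    DeadlineExceededException(String message) { super(message); }
}

class DeadlineDemo {
    private static final ExecutorService pool = Executors.newCachedThreadPool();

    // What the framework might do under the hood: run the server call,
    // wait no longer than the deadline, and surface a timeout as
    // DeadlineExceededException.
    static <T> T callWithDeadline(Callable<T> serverCall, long deadlineMillis) {
        Future<T> pending = pool.submit(serverCall);
        try {
            return pending.get(deadlineMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            pending.cancel(true);
            throw new DeadlineExceededException(
                    "server did not respond within " + deadlineMillis + " ms");
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for renderer.render_page(document_id, 1), with a 500 ms deadline.
        byte[] page = callWithDeadline(() -> new byte[0], 500);
        System.out.println("rendered " + page.length + " bytes");
        pool.shutdown();
    }
}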
You can invoke a server using either a synchronous or an asynchronous API. In traditional languages, an API can be called in both styles only if its author provides both implementations. In this model, the implementor writes whichever style they prefer, or both, and the framework bridges to the other: if the implementation is synchronous and the call is asynchronous, the framework runs it on a worker thread; if the implementation is asynchronous and the call is synchronous, the framework waits on the returned Promise. Either way, every server can be invoked both synchronously and asynchronously.
When you invoke a server asynchronously, you get back a Promise. You can add listeners to it, check whether it has finished, or combine promises using all or any to create a compound promise.
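Here's a rough sketch of both bridging directions, with Java's CompletableFuture standing in for the Promise. renderPageSync and renderPageAsync are made-up stand-ins for a server implemented in each style.

import java.util.concurrent.*;

class BridgingDemo {
    private static final ExecutorService workers = Executors.newCachedThreadPool();

    // A server whose author chose to implement it synchronously.
    static byte[] renderPageSync(String documentId, int pageNumber) {
        return new byte[0]; // rendering elided
    }

    // A server whose author chose to implement it asynchronously.
    static CompletableFuture<byte[]> renderPageAsync(String documentId, int pageNumber) {
        return CompletableFuture.completedFuture(new byte[0]); // rendering elided
    }

    public static void main(String[] args) {
        // Sync implementation, async invocation: the framework brings in a worker thread.
        CompletableFuture<byte[]> promise =
                CompletableFuture.supplyAsync(() -> renderPageSync("doc-1", 1), workers);

        // Async implementation, sync invocation: the framework waits on the promise.
        byte[] page = renderPageAsync("doc-1", 2).join();

        // Compound promises: wait for all of them, or for whichever finishes first.
        CompletableFuture<Void> all =
                CompletableFuture.allOf(promise, renderPageAsync("doc-1", 3));
        CompletableFuture<Object> any =
                CompletableFuture.anyOf(promise, renderPageAsync("doc-1", 4));

        all.join();
        any.join();
        System.out.println("page 2 rendered as " + page.length + " bytes");
        workers.shutdown();
    }
}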
What are the benefits of this model?
First, imagine a function f() invokes g() and passes an object as an argument, by reference. f() has no way of knowing whether g() will modify that object, and unintended modifications cause bugs. Even if you audit every line of code in g(), g() can invoke another function h() which modifies the object. You can't trace through thousands of functions to verify that the object is never mutated, and even if you could, someone could later change any function in the call chain to mutate it, breaking your design. You can't build a stable, reliable system on such foundations. Servers enforce pass by value, which rules this scenario out.
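To see the failure mode concretely, here it is in ordinary Java, with f(), g() and h() as above (the list and its contents are invented for the example):

import java.util.*;

// The aliasing problem described above: f() passes a list by reference,
// g() hands it to h(), and h() quietly mutates it.
class AliasingBug {
    static void f() {
        List<String> pages = new ArrayList<>(List.of("page-1", "page-2"));
        g(pages);
        // f() still believes it has two pages; it now has none.
        System.out.println(pages.size()); // prints 0
    }

    static void g(List<String> pages) {
        h(pages); // g() looks harmless, but...
    }

    static void h(List<String> pages) {
        pages.clear(); // ...a few calls deep, someone mutates f()'s data.
    }

    public static void main(String[] args) {
        f();
    }
}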
Second, in addition to f() breaking because g() modified the object, g() might also break because f() modified the object while g() was executing, whether from a callback or from another thread. Servers snapshot their arguments before execution, isolating f() from g() and g() from f().
Third, you control who can access an object: only code within the same server. Other servers can access only data they've been explicitly given, not whatever they can reach by following references from other objects. This makes it easy to reason about who is supposed to access which data. Without that boundary, every object ends up with access to every other object, and you can't enforce any kind of layering.
Fourth, you don’t need to worry about thread safety. You can invoke a server from any thread, which you can’t with a class. The server is isolated from any threading decisions on the client. The client is also isolated from any threading decisions on the server, which is free to use whatever threading it wants — a single thread that invokes synchronous APIs, a single thread that invokes asynchronous APIs, or multiple threads. All you need to know is that the server will eventually respond to your request.
Fifth, you can sandbox a server, limiting its memory use. You can't do this in a tightly coupled system: everything holds references to everything else, so there's no clean boundary around an individual piece of code to sandbox.
Sixth, organising your program as servers within a process makes it easier to distribute them in the future across machines, if you need to, because your code is already structured that way.
Why not microservices?
If we’re going to have in-process servers, why not go all the way to microservices?
First, sizing: whenever you introduce a microservice, you need to decide how many instances to run. If you have a microservice A and you extract some of its functionality into a separate microservice B, how many instances of B should you run? Just one? As many as there are of A? Something in between?
Second, service lookup: in the above example, A needs to have some way to look up the instance of B to connect to. This can be done using DNS. At Google, we had a variation called BNS. And there are other ways. Whichever you choose, it’s cognitive load.
Third, routing: you need a routing algorithm to decide which instance of B a given instance of A talks to. Get it wrong and you end up with too many connections, growing as M×N, wasting significant resources on both ends. This is a problem Google actually faced.
Fourth, load balancing: do you put a load balancer between A and B that tells A which instance of B to talk to, rather than hardcoding that knowledge into A?
Fifth, error handling: When you have microservices, A needs to worry about what to do if it can’t reach B.
Sixth, retry: errors can be retried, but you have to be careful not to retry at multiple layers of the stack. If each of four layers tries three times, one request can turn into 3^4 = 81 attempts at the bottom, enough to bring down your lowest layer. Again, this is not academic; it happened at Google.
Seventh, overhead: in-process servers can be implemented with far less overhead than a separate microservice running in a VM on another machine, with a separate OS and a network in between. A process in Elixir takes only 9 KB! A V8 isolate starts in only 5 ms! This saves cost, and it enables new execution models, such as servers that are started on demand and shut down when no longer needed, just as objects are garbage-collected.
In summary, programming languages should offer servers as a first-class abstraction, to enable loose coupling without the complexity of microservices.
Otherwise, you might as well use any language: they're all Turing-complete, and anything that can be built in one can be built in any other. What sets a language apart is the restrictions and abstractions it offers.