We Should Have Hypervisor-Based Desktop and Mobile OSs
Cyberattacks cost in excess of a trillion dollars a year, more than the GDP of most countries in the world. Even companies as mighty as Google have been successfully attacked. In this context, we need to think about how to improve security. Specifically, in this blog post, I’ll focus on desktop (and mobile) OS security, not server security, or security of other systems like browsers.
Operating systems sandbox code from different vendors so that a security problem[1] in one doesn't affect others:
But there’s a better way to sandbox: hypervisors. They’re more secure than OSs. Cloud providers don’t trust the OS to separate different customers. In fact, hypervisor-based security is the foundation of the cloud. If hypervisor security breaks down, the cloud breaks down. Hypervisors are that important.
Why are hypervisors more secure than OSs? Because Xen and KVM contain around 2 million lines of code, while the Linux kernel contains 30 million. That's 15 times as many opportunities for security vulnerabilities to exist.
Let’s bring this architecture to the desktop:
The lowest layer is the hypervisor, which runs a bunch of VMs. The first one is the System VM, which runs only Microsoft[2] code, with no third-party code. Apps run in their own VMs, so the hypervisor protects the System VM from them. This is a stronger level of security than relying on the OS to protect itself from apps.
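Here's a toy Python model of that layering, to make the trust boundaries concrete. It is only a sketch: the class and method names are made up, and treating "Microsoft" as the sole first-party vendor is an assumption taken from the paragraph above. Real hypervisors enforce memory isolation with hardware-assisted memory virtualization, not with function calls like these.

```python
from dataclasses import dataclass, field

# Assumption: the one vendor whose code may run in the System VM.
FIRST_PARTY_VENDOR = "Microsoft"


@dataclass
class VM:
    name: str
    vendor: str
    memory: dict = field(default_factory=dict)  # private to this VM


class Hypervisor:
    """Toy model: each VM's memory is private and every access goes through here."""

    def __init__(self):
        self.vms = {}

    def create_vm(self, name, vendor, is_system=False):
        # The System VM refuses third-party code outright.
        if is_system and vendor != FIRST_PARTY_VENDOR:
            raise PermissionError("the System VM runs only first-party code")
        self.vms[name] = VM(name, vendor)
        return self.vms[name]

    def write(self, caller, target, key, value):
        # A VM may write only to its own memory; cross-VM writes are refused.
        if caller != target:
            raise PermissionError(f"{caller} may not write to {target}")
        self.vms[target].memory[key] = value


if __name__ == "__main__":
    hv = Hypervisor()
    hv.create_vm("system", FIRST_PARTY_VENDOR, is_system=True)
    hv.create_vm("bigbasket", "BigBasket")
    try:
        hv.write("bigbasket", "system", "config", "tampered")
    except PermissionError as err:
        print("blocked:", err)  # blocked: bigbasket may not write to system
```

A compromised app VM can corrupt only its own memory. Reaching the System VM would take a hypervisor bug, not just an OS bug.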
Interfaces will be exposed only where they're genuinely needed. Hardware access is one case, such as an app using the microphone. Privacy is another, such as an app requesting location. System interactions are a third, such as one app providing a synced folder like Google Drive to other apps. UX needs them too: when you rotate an iPhone to landscape, the app content animates to the new orientation along with the status bar at the top, which requires coordination between the OS and the app. Finally, some interfaces conserve battery, such as a hardware-accelerated video encoder.
Every API in the OS should have a justifiable reason to exist: why can't the same thing be done just as well in application code? Unless removing it would hurt functionality, performance, battery life, UX or privacy, it should be removed.
Other than these, the interfaces to the OS will be minimal, unlike legacy OSs, which provide thousands upon thousands of APIs. A big API surface area is insecure, since a security bug can exist in any of those APIs. And if one of them is found to be insecure, it can't be removed or fixed in a backward-incompatible manner, because that would break apps. Beyond security, a big API surface also makes backward compatibility harder. For all these reasons, we want as few APIs as possible between the app VMs and the System VM.
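One way to enforce the "every API needs a justification" rule is a registry that refuses any API lacking one. The sketch below is hypothetical. The `Reason` values mirror the five criteria listed above, and the API names (`microphone.capture` and so on) are purely illustrative, not real OS interfaces.

```python
from enum import Enum


class Reason(Enum):
    """The only grounds on which an OS API may exist, per the criteria above."""
    FUNCTIONALITY = "functionality"
    PERFORMANCE = "performance"
    BATTERY = "battery"
    UX = "ux"
    PRIVACY = "privacy"


class SystemApiRegistry:
    """Hypothetical registry of the APIs the System VM exposes to app VMs."""

    def __init__(self):
        self._apis = {}

    def register(self, name, reasons):
        # An API with no justification belongs in application code instead.
        if not reasons:
            raise ValueError(f"{name}: no justification, leave it to app code")
        self._apis[name] = frozenset(reasons)

    def is_exposed(self, name):
        return name in self._apis

    def surface_area(self):
        # Fewer entries means fewer places for a security bug to hide.
        return len(self._apis)


# Illustrative registrations matching the examples in the text.
registry = SystemApiRegistry()
registry.register("microphone.capture", {Reason.FUNCTIONALITY})
registry.register("location.request", {Reason.PRIVACY})
registry.register("orientation.animate", {Reason.UX})
registry.register("video.encode", {Reason.BATTERY, Reason.PERFORMANCE})
```

Something like a screen-navigation framework would fail `register` because none of the reasons apply, so it stays in app code.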
The OS won’t provide convenience functions or app frameworks. For example, iOS provides UIViewControllers, each of which represents one screen of your app. The home screen of BigBasket is represented by one:
If you tap a product, you’ll get a second screen, represented by a different UIViewController:
But this is an app-level concern, not an OS-level concern, and putting it in the OS violates separation of concerns, also called modularity. The OS should no more provide these abstractions than a database should provide a music player. When a component does too many things, the potential for security vulnerabilities increases. The OS becomes bloated, and updating it becomes hard. When an app uses APIs introduced in a new version of the OS, it can't run on older versions. For example, SwiftUI does not support older versions of macOS and iOS. This makes no sense. A UI framework has no logical connection to the OS, since it's purely an app-level concern. Contrast that with Continuity Camera, which lets a Mac use an iPhone as a webcam; that genuinely requires OS-level integration. When you violate separation of concerns, all these problems arise. That's why the OS should provide a minimal set of APIs.
In fact, the APIs between the app VMs and the System VM will pass data back and forth in a structured format like JSON[3], so they will be language-agnostic. Legacy OSs, by contrast, have a preferred language. If you're writing a macOS app, you'll use Swift or Objective-C, since those are the languages the macOS APIs are exposed in. Otherwise, you'll need a translation layer, which is a huge amount of effort, may not expose all functionality, introduces an impedance mismatch, hurts performance, and is a maintenance burden. This is why most Mac apps are written in Swift or Objective-C. With our hypervisor-based OS, since the interaction happens via JSON, each side can use whichever implementation language it wants, as with a network server. The developer of each server chooses whichever language works best for their use case, like C++ for MySQL, Ruby on Rails for a CRUD backend, and Go for high-performance servers. Similarly, when apps run in VMs, each app developer can choose whichever language works best for their use case.
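Here's a minimal Python sketch of such a JSON boundary. The message shape (`method` and `params` fields), the method name `location.request`, and the stub handler are all assumptions for illustration. The point is that the app side only has to emit JSON, so it could be written in any language. The System VM side treats every message as untrusted input.

```python
import json


def request_location(params):
    # Hypothetical System VM handler; a real one would ask the user first.
    return {"granted": False}


# Only methods listed here are reachable from app VMs.
HANDLERS = {"location.request": request_location}


def encode_request(method, params):
    """App VM side: any language with a JSON library can produce this."""
    return json.dumps({"method": method, "params": params})


def handle_message(raw):
    """System VM side: parse untrusted input, check the method, dispatch."""
    try:
        msg = json.loads(raw)
    except json.JSONDecodeError:
        return json.dumps({"error": "malformed message"})
    if not isinstance(msg, dict):
        return json.dumps({"error": "malformed message"})
    method = msg.get("method")
    if not isinstance(method, str) or method not in HANDLERS:
        return json.dumps({"error": "unknown method"})
    return json.dumps({"result": HANDLERS[method](msg.get("params", {}))})
```

A more efficient encoding could replace `json` without changing this shape, since only the serialization calls would differ.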
App VMs, being VMs, need guest OSs. Each app developer can choose whichever guest OS works best for their use case. For example, an Electron app might run on Chromium OS instead of Windows, since the former is lightweight[4]. Why take on the overhead of the Windows API if you're not going to invoke it?[5]
As we consider this VM-based future, we also need to think about backward compatibility. What happens to the tons of legacy apps, which aren’t built for this future? The answer is a legacy VM, which is the rightmost box:
The legacy VM will run all of today's apps. It will basically be a separate installation of Windows[6]. Legacy apps won't benefit from the strong security of the hypervisor, but they won't be able to hurt the OS itself (running in the System VM) or apps running in their own VMs. Legacy apps will be able to hurt only other legacy apps.
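That containment property can be sketched in a few lines of Python. This is an illustrative model with made-up app names: VM-native apps each get their own VM, while all legacy apps share one. A compromise then spreads only to apps in the same VM. The System VM never appears in the placement, so no app shares it.

```python
LEGACY_VM = "legacy"


def assign_vms(apps):
    """Place apps: VM-native apps get their own VM, legacy apps share one."""
    return {
        name: LEGACY_VM if is_legacy else f"app:{name}"
        for name, is_legacy in apps.items()
    }


def blast_radius(compromised, placement):
    """Apps a compromised app can reach: only those in the same VM."""
    vm = placement[compromised]
    return {app for app, app_vm in placement.items() if app_vm == vm}


if __name__ == "__main__":
    placement = assign_vms(
        {"photo_editor": False, "old_game": True, "old_utility": True}
    )
    print(sorted(blast_radius("old_game", placement)))  # ['old_game', 'old_utility']
```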
Also, not all apps will be suitable for running in a VM. For example, as I write this, my Mac is running 400 processes that each use less than 10 MB of memory. The overhead of a VM would be prohibitive for these, so their developers can choose to run them in the legacy VM.
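Some back-of-the-envelope arithmetic in Python shows why. The 100 MB per-VM overhead below is a made-up placeholder, not a measurement. The result scales linearly with that figure, so substitute a real number for a given hypervisor and guest OS.

```python
def total_memory_mb(process_mb, count, per_vm_overhead_mb=0):
    """Memory for `count` processes, each optionally wrapped in its own VM."""
    return count * (process_mb + per_vm_overhead_mb)


# 400 processes at the 10 MB upper bound mentioned in the text.
as_processes = total_memory_mb(10, 400)
# HYPOTHETICAL overhead per VM (guest kernel and so on); replace with a measured figure.
as_vms = total_memory_mb(10, 400, per_vm_overhead_mb=100)

print(as_processes, as_vms, as_vms / as_processes)  # 4000 44000 11.0
```

Under that assumption, wrapping each small process in its own VM would need 11 times the memory, which is why such processes are better left in the shared legacy VM.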
In summary, desktop and mobile OSs should run on top of a hypervisor for increased security.
[1] Accidental or intentional, like malware.
[2] Or Apple, if we're talking about macOS. Or Google, if we're talking about Android.
[3] Or a more efficient format like binary JSON or Google's protobufs.
[4] A Chromebook with 4 GB of memory performs similarly to a Windows laptop with 8 GB.
[5] Over time, frameworks like Electron will evolve to supply a VM along with a browser, to make developers' lives easier.
[6] But with plumbing to the System VM, so that (for example) you have one notification panel instead of two.