How to Build Responsive Touch Interfaces by Terry Crowley
This is a summary of Rethinking MVC in Excel, Word and PowerPoint after 25 Years by Terry Crowley, who led Office for a decade.
Touch interfaces require extremely high responsiveness. For example, if you have a Word document:
And you’re trying to drag the lizard around, it needs to feel as if it’s sticking to your finger. This means that the app has only 16ms to receive your touch events, figure out whether you’re trying to drag the lizard or the text, move the lizard, and repaint the screen. If it can’t do that consistently, the illusion will break, and the app will be perceived as an unresponsive, lower-quality app.
So, how do we ensure responsiveness?
• Disk: Don’t block on the network or the disk in the critical path. People generally think the network is slow and the disk is fast. The disk is fast most of the time, but not always, so it’s not consistent enough to rely on for a responsive interface. You should invoke the disk asynchronously and process the data when it arrives, and keep processing user input and running animations till then. In the critical path, you can rely only on building blocks that are consistently fast.
• Consistent state: When an app invokes a server, its local state should still be consistent and the user should be able to interact with it. In Google Maps, when you panned the map, it used to show a blank area that would be filled in asynchronously from the server. If you zoomed, it would zoom optically (by scaling the map image) and load data from the server in the background. Missing data was modeled as a valid application state that you could interact with and modify further, for example by zooming in to the blank area. Contrast this with the maps apps that came before Google Maps, like Microsoft’s, which ran off a CD: when you panned, the app would read map data from the CD and freeze till the data was available. While it waited, the app was not in a consistent state and could not allow further changes. Google Maps, on the other hand, treated some data not being available as a first-class state and decoupled that from the UI: the UI was always responsive regardless of whether 100%, 50% or 0% of the map was loaded. By modeling the missing data as a first-class state, it was more responsive running over the Internet than Microsoft’s predecessor was running from a CD!
• Views are more expensive than models: It’s far more expensive to create a view object for an already existing model object than it is to create a model object and load the data from the disk or network. This is why UI frameworks create views only for objects actually visible on screen. For example, on iOS, if you have a list with 1000 items (say your contacts), and only 10 are visible on screen, the number of views created will be closer to 10 than to 1000. This is all the more important since the number of model objects can be unbounded, while the number of views is limited to what’s visible on screen at once.
• 1:1 mapping: In the MVC paradigm, have each view require data from only one model object. This eliminates another layer of mapping, like a view model. For example, in the Windows mail app, each mail thread is an element in the inbox. Tune the client-server protocol to make at most one network call for each view item. Ideally, batch them. If a mail thread has four mails (one mail and three replies), you don’t want to make four calls to the server, since that’s just unnecessary latency. It also amounts to implementing functionality on the client that should be on the server. Tune the cache so that you’re doing at most one I/O to render one view.
• Animations: The human visual system is exceedingly sensitive to animations. Once you start an animation, you should keep going at 60 FPS no matter what. Core Animation on iOS animates in a render server to guarantee 60 FPS even if your main thread is blocked.
• Algorithms proportional to edit size: When users use an app, they create a mental model, which is a simplified, imaginary version of how it works. Based on this mental model, they form a judgment of which operations should be fast and which can be expected to be slow. This mental model says that editing one cell in a huge spreadsheet should be fast, while editing all of them (say, changing the color) can be slow. To avoid being perceived as unresponsive and second-tier, your algorithms need to be proportional to the edit size, not the document size. For example, in Excel, when you edit a cell, formulae could cause millions of cells to change in the worst case. Excel identifies only the cells that are visible on screen (a bounded number) and recalculates them. This is an example of an incremental approach.
• Algorithms that produce a bounded output given unbounded input: Excel has a visualiser that takes cells as input and returns a geometry consisting of rectangles, lines, other shapes and text that constitute a graph. This geometry is then used in multiple ways: It can be rendered. It can be themed. When you hover over one of the elements in the graph, it can respond to your hover. You can select one of the elements. Excel can animate from one graph to another. There are accessibility hooks.
The user might want to generate a graph based on a huge data set, like a million rows. So the input to the visualiser has to be unbounded in size. But the output will always be bounded, since you can’t read a graph with thousands of elements.
When you have an unbounded data set (millions of cells in a spreadsheet, hundreds of pages in a Word document), the component that converts it into a bounded data set is always a key element of your design. If you consciously identify such a component, you can then invest in automated testing to ensure, for example, that its memory consumption stays bounded when processing an unbounded data set. This is something that you should think about for any component that processes unbounded input. Automating the check ensures that the property is maintained over the years, even as engineers who may not understand the reasoning above change the code. The design of components that process a bounded set of data can be simplified. For example, Excel clones the geometry and hands off each copy to a separate thread for faster performance. It can do this only because the geometry is bounded: you can’t clone millions of cells in a spreadsheet. By separating out components that process bounded input from those that process unbounded input, you can apply the appropriate design techniques to each.
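The idea of a component that turns unbounded input into bounded output can be sketched in Python. This is only an illustration, not how Excel does it: the names visualise, Bar and MAX_BARS, the cap of 200, and the choice of averaging each bucket are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Sequence

MAX_BARS = 200  # hypothetical cap on how many elements a reader can take in


@dataclass(frozen=True)
class Bar:
    x: float       # left edge, as a fraction of the chart width
    width: float   # as a fraction of the chart width
    height: float  # mean of the input values that fell into this bar


def visualise(values: Sequence[float], max_bars: int = MAX_BARS) -> list[Bar]:
    """Reduce any number of input values to at most max_bars bars."""
    n = len(values)
    if n == 0:
        return []
    count = min(n, max_bars)
    bars = []
    for i in range(count):
        # Bucket i covers the half-open index range [start, end).
        start = i * n // count
        end = (i + 1) * n // count
        total = 0.0
        for j in range(start, end):
            total += values[j]
        bars.append(Bar(x=i / count, width=1 / count,
                        height=total / (end - start)))
    return bars


# A million rows in, at most MAX_BARS elements out.
geometry = visualise(range(1_000_000))
assert len(geometry) <= MAX_BARS
```

The last two lines are the kind of check that could live in an automated test. Because the result is small and immutable, copying it or sharing it with other threads is cheap, which is much harder to say about the input.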
• Faster hardware can make your app slower: When hard disks increased in capacity but not in random IOPS [1], app developers who took advantage of the increased space found their apps to be slower. If your app stores 20 MB of data on a hard disk that can read at 20 MB/s, it loads in 1s. If the hard disk size doubles, and so your app’s data expands to 40 MB, it now takes 2s to load.
Software developers need to remember that different components of hardware progress at different rates. Latency, for example, hardly decreases. Faster hardware may make your app slower.
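To make the first two points in the list above concrete (don’t block on slow I/O, and treat not-yet-loaded data as a valid state), here is a minimal Python sketch using asyncio. Everything in it is hypothetical: TileMap, slow_fetch and the "blank" placeholder are invented for illustration, and the sleep merely stands in for a disk or network read.

```python
import asyncio
from dataclasses import dataclass, field


async def slow_fetch(pos):
    # Stand-in for a disk or network read with unpredictable latency.
    await asyncio.sleep(0.05)
    return f"tile{pos}"


@dataclass
class TileMap:
    """Model in which 'not loaded yet' is an ordinary, valid state."""
    tiles: dict = field(default_factory=dict)
    pending: set = field(default_factory=set)

    def render(self, visible):
        # Never waits: a missing tile is drawn as a blank placeholder.
        return [self.tiles.get(pos, "blank") for pos in visible]

    def request(self, pos):
        """Start loading a tile in the background, at most once."""
        if pos in self.tiles or pos in self.pending:
            return None
        self.pending.add(pos)
        return asyncio.create_task(self._load(pos))

    async def _load(self, pos):
        data = await slow_fetch(pos)
        self.tiles[pos] = data
        self.pending.discard(pos)


async def demo():
    tile_map = TileMap()
    visible = [(0, 0), (0, 1)]
    tasks = [tile_map.request(pos) for pos in visible]
    before = tile_map.render(visible)  # returns at once, all blank
    await asyncio.gather(*tasks)       # the data arrives later
    after = tile_map.render(visible)
    return before, after


before, after = asyncio.run(demo())
print(before)  # ['blank', 'blank']
print(after)   # ['tile(0, 0)', 'tile(0, 1)']
```

The point of the design is that render never has a reason to wait: it works the same whether 0%, 50% or 100% of the tiles have arrived.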
The Compositor Design Pattern
• Recognise that you can’t guarantee the performance of an entire, complex app like Word or Excel that might have loaded a huge data set like a document hundreds of pages long, or a spreadsheet with millions of cells. Instead, the only thing you can do is define a subset of functionality whose performance you guarantee. In this example, this is the compositor: it composites multiple layers like text and graphics onto the screen. We’ll make the compositor run at 60 FPS.
• The compositor needs a dedicated thread [2].
• The compositor needs to be given the single responsibility of keeping the UI updated at 60 FPS. If it’s tasked with too much, it can no longer guarantee responsiveness. Whatever information is needed for the single responsibility should be given to it ahead of time, so that as the user interacts, the compositor is able to make decisions locally without blocking on other threads in the critical path. In this case, the information needed is the outlines of all the objects in the document, so that the compositor can hit test. If the user tries to zoom a graphic, the compositor optically scales it by stretching the bitmap and, in parallel, asks the main app on another thread to render a higher-resolution version, which is swapped in when it’s available. This is necessary since the rendering can take longer than 16ms. The compositor shouldn’t do this rendering itself, since its responsibility is to composite layers together, not to render each of them.
• The compositor thread shouldn’t block on other threads, such as by locking. Inter-thread communication, both in and out, should be via an asynchronous queue. This is an example of a design that makes blocking impossible. Don’t use a bunch of locks and try to be super careful about acquiring and releasing each lock: you’ll never get this right, and your app will be buggy and crashy. Instead, make blocking impossible by design.
• This means that some information is duplicated in the compositor, specifically the outlines of objects. While duplication is a code smell, it’s the only way to avoid locks.
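The pattern described in this list can be sketched in Python as follows. This is an illustration under assumptions, not Office’s implementation: Compositor, Outline and the message names ("outlines", "touch", "render") are invented for the example. Python’s queue.Queue does take a short internal lock, so the sketch shows the structure (a dedicated thread, a private copy of the outlines, messages in and out) rather than a truly lock-free queue.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Outline:
    name: str
    x: float
    y: float
    w: float
    h: float

    def contains(self, px, py):
        return (self.x <= px < self.x + self.w
                and self.y <= py < self.y + self.h)


class Compositor:
    def __init__(self):
        self.inbox = queue.Queue()   # messages from the app and input threads
        self.outbox = queue.Queue()  # requests to the main app thread
        self.outlines = []           # the compositor's own copy, never shared

    def frame(self):
        """One frame of work: drain the inbox without ever waiting on it."""
        while True:
            try:
                kind, payload = self.inbox.get_nowait()
            except queue.Empty:
                return
            if kind == "outlines":
                self.outlines = list(payload)
            elif kind == "touch":
                # Hit test locally, using only the duplicated outlines.
                hit = next((o for o in self.outlines
                            if o.contains(*payload)), None)
                if hit is not None:
                    # Ask the app for a fresh render; the old bitmap
                    # stays on screen until the result comes back.
                    self.outbox.put(("render", hit.name))

    def run(self, stop, frame_seconds=1 / 60):
        while not stop.is_set():
            self.frame()
            stop.wait(frame_seconds)  # pace the loop at roughly 60 FPS
        self.frame()  # handle anything that arrived before the stop


def demo():
    compositor = Compositor()
    stop = threading.Event()
    thread = threading.Thread(target=compositor.run, args=(stop,))
    thread.start()
    compositor.inbox.put(("outlines", [Outline("lizard", 10, 10, 50, 30)]))
    compositor.inbox.put(("touch", (20, 20)))    # lands on the lizard
    compositor.inbox.put(("touch", (500, 500)))  # lands on nothing
    stop.set()
    thread.join()
    requests = []
    while not compositor.outbox.empty():
        requests.append(compositor.outbox.get_nowait())
    return requests


print(demo())  # [('render', 'lizard')]
```

Note that the compositor never reads the document itself. It only sees the copy of the outlines it was sent, which is the duplication the last bullet accepts as the price of avoiding locks.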
[1] Which includes the effect of latency.
[2] If it shares a thread with other activities, those other activities might block the thread when the compositor needs it.