Summary of Complexity and Strategy by Terry Crowley
These are the insights I got from Complexity and Strategy, by Terry Crowley, who led Office for a decade:
Cost Scales Non-Linearly with Functionality
People imagine that the cost (both financial and time) to build software increases linearly with functionality:
Unfortunately, it’s more like this:
Building twice the functionality costs three times as much, not twice as much.
One reason is that features interact. Imagine your app supports tables, and users can set a background color for a column:
Now, suppose you decide to implement a new feature: merged cells. Does a merged cell take the color of the first or the second column it spans? You can choose the first:
This looks odd, and breaks the rule that if you set a background color for column B, all cells in B will have the background color. Alternatively, the merged cell can have the color of the second column:
This looks odd, too — a cell in column A now has acquired a background color despite neither the column nor the cell explicitly having a background color set on it. Alternatively, you could ignore the merge for the purpose of setting a background color, and set the background color separately for the constituent cells in the merged cell:
This is confusing, because it looks like the cell has become un-merged.
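The three options above can be sketched in code. This is only an illustration, not how any real spreadsheet implements it: the function names, the `column_colors` mapping and the idea of returning one color per painted segment are all assumptions I made up for the example.

```python
from typing import Dict, List, Optional

# Hypothetical state: column B has a background color, column A has none.
column_colors: Dict[str, Optional[str]] = {"A": None, "B": "yellow"}

def color_from_first_column(merged: List[str],
                            colors: Dict[str, Optional[str]]) -> List[Optional[str]]:
    """Option 1: the whole merged cell takes the first column's color."""
    return [colors[merged[0]]]

def color_from_last_column(merged: List[str],
                           colors: Dict[str, Optional[str]]) -> List[Optional[str]]:
    """Option 2: the whole merged cell takes the last column's color."""
    return [colors[merged[-1]]]

def color_per_constituent(merged: List[str],
                          colors: Dict[str, Optional[str]]) -> List[Optional[str]]:
    """Option 3: ignore the merge and paint each constituent cell separately."""
    return [colors[col] for col in merged]

merged_cell = ["A", "B"]  # a cell merged across columns A and B
for policy in (color_from_first_column, color_from_last_column, color_per_constituent):
    print(policy.__name__, policy(merged_cell, column_colors))
```

Each function is trivial on its own. The difficulty is that someone has to pick one, and every choice breaks a rule the user could rely on before merged cells existed.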
As you can see, there’s no obvious solution to this problem. The two features, merged cells and column background color, interacted with each other. You can’t eliminate this problem by architecting your system better, because the complexity is inherent in the specification. It’s essential complexity. If you assume that each of the two features has a cost of 1, then implementing both in your program incurs a cost of 3: one for each feature, plus one for deciding and implementing how they interact. Each new feature you add to a complex program like Excel can interact with any of the existing features.
This is why building a complex program like Excel incurs a huge cost, which is much more than number of features x cost of each feature.
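To make the arithmetic concrete, here is a rough sketch of how the cost grows. It assumes the worst case, where every pair of features interacts and each interaction costs as much as a feature. The function name `total_cost` and both default costs are mine, not the article's, and real programs also have interactions among three or more features, which this ignores.

```python
from math import comb

def total_cost(num_features: int,
               feature_cost: float = 1.0,
               interaction_cost: float = 1.0) -> float:
    """Cost of the features themselves plus one cost per pair of features.

    Assumption: every pair of features interacts (worst case), and only
    pairwise interactions are counted.
    """
    pairs = comb(num_features, 2)  # n * (n - 1) / 2 possible pairs
    return num_features * feature_cost + pairs * interaction_cost

for n in (1, 2, 4, 8, 16):
    print(f"{n:>2} features: naive cost {n:>2}, cost with interactions {total_cost(n):.0f}")
```

With two features this gives the cost of 3 from the example above. Under the same assumptions, 16 features cost 136 rather than 16, which is the gap between the two graphs.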
• This sets up a huge moat for Microsoft: any organisation that wants to build its own Excel will need a decade and a billion dollars.
• The worst feature you can add is one that’s not used or valued — its cost/benefit ratio is infinite. Continuous delivery prevents the team from adding such features, by providing quick feedback.
• If you want your team to remain productive, don’t add features that only 20% or fewer of your users will use. A team of three developers was able to build a multimedia document editor that included word processing, spreadsheets, images, graphics, email and real-time conferencing. A small team can build a lot quickly if it takes an opinionated stance and says no to long-tail features.
• Sometimes, refusing to use a powerful building block is better in the long run. When building OneNote, the team considered whether to reuse Word’s powerful editing surface, which would have given them a lot of functionality on day 1 at lower cost:
It seems like the best of both worlds: high functionality at low cost!
However, rebuilding an editor component let OneNote store multiple notes in one file, rather than being bound by Word’s one-file-per-document model. OneNote notes can contain different objects from Word documents. OneNote was able to implement sharing in a way that made sense for OneNote, rather than being bound by what made sense for Word. Over time, this decision to not reuse Word’s editor component let them build more functionality with lower cost:
The OneNote team never regretted their initial decision.
So, when you make architectural decisions, you can’t just decide based on the tradeoffs today (the first graph). You have to foresee how they play out over time (the second graph).
• People sometimes evaluate a new technology by building a sample app and conclude that it’s much more productive than the old technology. The productivity comes from a smaller codebase, not from the new technology. Then they go ahead and rebuild the main product in the new technology, only to find that its claimed benefits did not materialise. This is because they confused the starting point of the line with the slope of the line.
• In a massive codebase like Excel that lives for decades and generates billions of dollars of profit every year, third-party frameworks are not a big deal. They fail to evolve in the way Excel needs them to, and the framework is a fraction of the amount of code in Excel. So frameworks tend to get absorbed into Excel. When someone says “Here’s a free framework that will help you”, it’s not like free beer, which you simply enjoy, but like a free puppy that ends up demanding a lot from you. The mantra, therefore, is, “If you ship a third-party framework, you own it”. At Excel’s scale, the team can rebuild any framework in a way that’s optimal for them, rather than depend on a third-party framework author, who needs to make decisions that are right for the majority of their users.
• If an upstart wants to attack an incumbent, the most effective attack to mount is an asymmetric business model attack enabled by new technology. For example, the web as a platform enabled Google Docs. If Google tried to compete head-on with Office in features, they’d lose. So they competed successfully on other dimensions like being able to sign up in seconds without requiring installation or payment, being able to use it from any device, sharing, autosave and revision history… Uninformed journalists wrote breathless articles saying that Google Docs innovates faster than Office, but that was only because Google Docs has a fraction of Office’s features. Smaller apps always move faster, irrespective of technology like web vs native. As Google tried to build more, they became bogged down in the same complexity graph. And the drawbacks of the web resulted in offline not working even today. The initial architectural decisions Google took resulted in Google Docs evolving exactly as Microsoft architects predicted.
When Microsoft responded to Google by building editing web apps, they had to decide whether to continue to use the Office file formats, which would add enormous complexity to the task of porting them to the web. The cost is multiplicative (number of features x number of platforms), not additive (number of features + number of platforms). This cost could have been avoided if Microsoft used a different file format for their web-based editors. But the file formats are a critical moat for Office. Competitors like OpenOffice tried for decades to be compatible, with mediocre results. Microsoft didn’t want to create this incompatibility problem for themselves. And if Microsoft’s web-based editor did not have perfect compatibility with traditional Office, there was no reason for customers to use Microsoft’s web-based editors instead of Google Docs or some other competitor. So, Microsoft decided to pay the enormous price of retrofitting collaboration to the features in the Office XML format. The complexity is the moat.
• The multiplicative cost (number of features x number of platforms) is exactly why Google doesn’t build desktop native apps. The complexity would slow them down.
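The difference between additive and multiplicative cost can be seen by listing the work items. This is a toy sketch: the feature and platform names are placeholders I picked, and it assumes every feature has to be built and tested separately on every platform.

```python
from itertools import product
from typing import List, Tuple

# Hypothetical feature and platform lists, for illustration only.
features = ["tables", "merged cells", "comments"]
platforms = ["Windows", "Mac", "web"]

def work_items(features: List[str], platforms: List[str]) -> List[Tuple[str, str]]:
    """Every (feature, platform) combination that must be built and tested."""
    return list(product(features, platforms))

additive_guess = len(features) + len(platforms)            # the optimistic estimate
multiplicative_cost = len(work_items(features, platforms))  # the actual work items

print(f"additive guess: {additive_guess}, actual work items: {multiplicative_cost}")
```

With three features and three platforms the gap is small (6 versus 9). With hundreds of features, adding one more platform adds hundreds of work items, not one.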
• Bill Gates wanted a core engine that had the union of features in all apps: HTML rendering, document editing, spreadsheets, everything. Then each app, like a web browser or Word or Excel, would just be a UI on top of this core engine. But Bill was wrong, because such a component would have the complexity of all these products in one engine, which is beyond any company’s ability to build, since an HTML renderer, Excel and Word are each extremely challenging to build, even for the best teams in the industry. This “do it all” component would collapse under its own weight, like a 2 km-high skyscraper. Better to build multiple buildings, each of a reasonable height.
• If you get all the above right, you’ll have a high-functioning engineering team. Unfortunately, such a team doesn’t look like a breakthrough to outsiders. It will merely get a lot done :)