Summary of The Software Architect Elevator
by Gregor Hohpe.
Architect
The word architect derives from the Greek architekton, which means a master builder of houses and other buildings. Note that it’s “builder”, not “designer”. Drawing pretty pictures isn’t enough. Architects need to see the consequences of their decisions, so that they get feedback and can improve. Otherwise, it’s authority without responsibility — authority to make decisions without the responsibility for their consequences. This doesn’t work.
Architects
… support the business strategy.
… connect the dots, to avoid a situation of having components that are well thought-out and well-built but don’t add up to the outcomes needed.
… bring tradeoffs to the team’s attention.
… avoid overcomplicating things.
Both architects and developers handle requirements. But they do it differently: Developers handle functional requirements like what features the app should have, how they should work, the UX, etc. Architects handle non-functional requirements like scalability, reliability, maintainability and security. These are called -ilities. Architects also have to ferret out unstated requirements. These can come from context. For example, banking software needs to meet certain regulatory requirements, and non-technical people may not think it necessary to tell developers. Requirements can also arise from dependencies. You may have decided to use certain core banking software, which means the rest of your system needs to be compatible with that. You can’t build a system without making assumptions. They just need to be explicit, and making them explicit is one of the architect’s most valuable contributions.
Complex architecture feels organic. Architects should use a soft touch. Imposing top-down dictatorial “governance” does more harm than good.
The Architect Elevator
A company is like a building with many floors, from the engine room at the bottom (engineering) to the penthouse (CXOs). These floors are populated by different functions, which are tasked with different goals. They look at the world differently, creating a situation like the blind men and the elephant. When these floors talk to each other, they speak different languages, so they can’t understand each other. They have different fears. An architect should ride the elevator up and down and talk to people on different floors, to unify them into one company rather than letting each floor go in its own direction.
When the elevator isn’t used enough, it causes many problems. One is the leadership being under the impression that the “digital transformation” is proceeding nicely, while the folks in the engine room are doing something that doesn’t add up to produce digital transformation. Such organizations are like the Leaning Tower of Pisa — the penthouse and the engine room are not vertically aligned, so you can’t ride an elevator to go up and down.
The architect must speak to different floors in different ways. For example, he might tell the engineers in the engine room to “use RDS Multi-AZ with two standbys” and inform management that he “made a decision to reduce data loss and increase availability and performance for an extra cost of $1000/month.” Management often isn’t told the consequences of technical decisions. When they ask engineers, they get highly technical answers about the transaction log and AZ failover, which they can’t make head or tail of. Nor are they interested in such details; if they were, they’d have become engineers themselves. A variant of unintelligible messages is biased messages: as information travels up through the floors, it becomes a game of telephone, where the message is distorted at each stage by people injecting their favorite messages and project proposals irrespective of technical merit, and hiding bad news. So management often makes decisions without technical knowledge. If you were ever handed a dumb decision and wondered why management made it, this is why. An architect who can prevent that will go far.
Some people ride the elevator only in one direction — up. They say things like “I used to be technical”. These people often have an inflated view of their (now obsolete) technical skills and can end up eating caviar while the engine room is flooded. We need to ride the elevator down to understand the consequences our decisions have produced, and thus obtain feedback to improve ourselves.
Some businesspeople ride the elevator down merely to pick up buzzwords to sell as their own ideas in the penthouse: “We should do what I say because Kubernetes operators optimize machine learning workloads on ARM servers in the delta space.” Try to get them genuinely interested in what’s happening in the engine room or, if you can’t do that, ignore them.
As business speeds up, companies have to adopt new technology faster, and respond to changing market conditions faster. Companies with the above dysfunctions will be outcompeted. They need a working elevator to be nimble enough for today’s world.
Designing for Change
Legacy IT people fear change, as embodied by mantras like “never touch a running system”. Change is permitted only when packaged into a project, which is limited by various controls, including technical approval, budgetary approval, exhaustive requirements and planning documentation, and further gates before deployment. In other words, the assumption is constancy, with change consisting of intermittent “projects”. This made sense earlier, when Windows NT 4 was upgraded after four years.
In today’s agile world, teams are expected to launch at least weekly. Since change happens every day, bureaucracy that throttles change throttles progress. Support change instead.
Deferring change into yearly releases increases risk, because each release bundles so many changes that there are tons of things that can go wrong. Instead, break the work into small chunks and tick them off the list one at a time. Consider deploying continuously. If it hurts, do it more often.
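The risk argument can be made concrete with a little arithmetic. The sketch below assumes, purely for illustration, that every change independently has the same small chance of causing a problem. The 1% rate, the release sizes and the function name are all assumptions, not figures from the book.

```python
def chance_of_trouble(changes_per_release: int, failure_rate: float = 0.01) -> float:
    """Probability that at least one change in a release causes a problem.

    Assumes each change fails independently with probability `failure_rate`
    (an illustrative assumption, not a measured value).
    """
    if changes_per_release < 0:
        raise ValueError("changes_per_release must be non-negative")
    if not 0.0 <= failure_rate <= 1.0:
        raise ValueError("failure_rate must be between 0 and 1")
    return 1.0 - (1.0 - failure_rate) ** changes_per_release


if __name__ == "__main__":
    # A yearly release bundling 250 changes vs. a weekly release with 5.
    for label, n in [("yearly", 250), ("weekly", 5)]:
        print(f"{label}: {n} changes, {chance_of_trouble(n):.0%} chance of trouble")
```

Under these assumptions the large release is very likely to hit at least one problem, and when it does there are 250 suspects to investigate instead of 5.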
Speed is impeded by:
Unnecessary dependencies.
Bureaucratic processes, like taking weeks to provision infrastructure.
Lack of automation, like deploying manually by SSH’ing to each server and copying the .war. Doing it manually takes so much time and risks so many mistakes that you don’t do it often.
Not maintaining a minimum level of quality: you can’t continuously deploy if the code is full of bugs.
Fear.
Rigid architecture. Ideally, architecture should be tunable. For example, with RDS, you can declaratively enable a standby to reduce data loss and increase availability. If you also want to reduce latency, you can enable a second standby. Architecture should have knobs like this that you can turn to get more or less of something.
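The idea of an architectural knob can be sketched in a few lines of Python. This is a toy model, not the RDS API: the class name, the properties and the cost figure are hypothetical, and the qualities attached to each setting simply mirror the example in the text.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DatabaseSetup:
    """Hypothetical declarative description of a database deployment."""

    standbys: int = 0             # the knob: 0, 1 or 2 standby instances
    instance_cost: float = 500.0  # assumed monthly cost per instance

    def __post_init__(self) -> None:
        if not 0 <= self.standbys <= 2:
            raise ValueError("standbys must be 0, 1 or 2")

    @property
    def survives_instance_failure(self) -> bool:
        # Per the text: the first standby buys durability and availability.
        return self.standbys >= 1

    @property
    def lower_latency(self) -> bool:
        # Per the text: a second standby is what also reduces latency.
        return self.standbys >= 2

    @property
    def monthly_cost(self) -> float:
        # One primary plus one instance per standby.
        return (1 + self.standbys) * self.instance_cost


# Turning the knob is a one-field change to the declaration.
basic = DatabaseSetup()
resilient = replace(basic, standbys=1)
fast = replace(basic, standbys=2)
```

Each setting makes the tradeoff visible: `fast` gains both qualities but costs three times as much as `basic` under the assumed pricing.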
System Behavior
A team is a system, and principles from systems thinking apply to it. Before we improve a system, we need to understand what’s happening. Non-technical people sometimes have trouble inferring what’s going on inside the system from its external behavior. For example, the external behavior might be buggy software, but the cause may be too many tasks forced onto engineers each week. Until you understand the causes, you’ll treat symptoms ineffectually.
Organizational systems settle into a steady state over time, whether or not that state works. If you’re hired to fix a company that’s not working, the system will resist the change and try to revert to its previous state. It’s like pushing a car out of a ditch: the car keeps rolling back until you get it over the hump.
Another example of not understanding system behavior is non-technical people not scheduling any time for architecture or tech debt and then wondering why delivery isn’t going well. This is like not using engine oil and then wondering why your car broke down.
Architecture Review
If you’re conducting an architecture review and you see an architecture diagram, identify what decisions it embodies. For example, when you’re building a house, a highly sloped roof is designed to let snow slide off. In other words, the house is being designed for a cold climate. That’s an explicit decision. By contrast, a house designed for Bangalore does not need to deal with snow, so leaving out the steep roof there is also a decision. On the other hand, saying that the house has a window is not a decision, because every house has a window. So, when conducting an architecture review, identify the non-obvious decisions the design embodies. Meaningful decisions have upsides and downsides. Architecture is not good or bad in itself, but good or bad for a particular purpose, just as a house is not good or bad; it’s good or bad for Iceland.
As the Oracle in The Matrix says, “You didn’t come here to make the choice, you’ve already made it. You’re here to try to understand why you made it.” An architecture review is a meeting with the Oracle, where we understand why we’re doing what we’re doing.
Don’t look at approval meetings as a nuisance; look at them as an opportunity to gather feedback.
Architects should periodically play with technology. Playing knows no fear and no judgment. Many people are worried, “If I play with low-code and it doesn’t work out, will I look bad?” Ignore the social norms and the pressure to always be productive and appear productive. Just as all work and no play makes Jack a dull boy, lack of experimentation dulls architects’ minds.
Migrating from on-premises to the cloud doesn’t save cost; it increases flexibility and productivity.
We architect our systems well to reduce failures or, in other words, to increase the Mean Time Between Failures (MTBF). We should also reduce the Mean Time To Recovery (MTTR) through version control, automation and monitoring. Reducing MTTR is a game-changer.
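One common way to relate the two metrics is the steady-state availability formula, MTBF / (MTBF + MTTR). The text does not state this formula, so the sketch below is an added illustration, and the hour figures in it are made up.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the fraction of time the system is up."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Illustrative numbers only.
baseline = availability(mtbf_hours=1000, mttr_hours=10)
fewer_failures = availability(mtbf_hours=2000, mttr_hours=10)  # MTBF doubled
faster_recovery = availability(mtbf_hours=1000, mttr_hours=1)  # MTTR cut tenfold
```

With these numbers, cutting recovery time from ten hours to one improves availability more than doubling the time between failures does.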
Make your infrastructure immutable, by recreating it from a recipe, rather than applying changes to existing servers.
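A minimal sketch of the idea, using plain Python objects in place of real servers. The `Recipe` and `Server` types, their fields and the image and package names are hypothetical; a real setup would use an infrastructure-as-code tool rather than this toy model.

```python
from dataclasses import dataclass
from itertools import count
from typing import List, Tuple


@dataclass(frozen=True)
class Recipe:
    """Hypothetical recipe: everything needed to build a server from scratch."""
    base_image: str
    packages: Tuple[str, ...]
    app_version: str


@dataclass(frozen=True)
class Server:
    """A server is frozen once built; it can be replaced but not modified."""
    server_id: int
    recipe: Recipe


_next_id = count(1)


def build(recipe: Recipe) -> Server:
    """Create a brand-new server from the recipe."""
    return Server(server_id=next(_next_id), recipe=recipe)


def roll_out(fleet: List[Server], recipe: Recipe) -> List[Server]:
    """Replace every server with a fresh one built from `recipe`."""
    return [build(recipe) for _ in fleet]


v1 = Recipe("linux-base", ("jre",), app_version="1.0")
v2 = Recipe("linux-base", ("jre",), app_version="1.1")

old_fleet = [build(v1) for _ in range(3)]
new_fleet = roll_out(old_fleet, v2)  # upgrade = rebuild, never patch in place
```

Because every server is built from the recipe, the recipe is the single source of truth, and an upgrade never leaves a server in a half-patched state.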
As software eats the world, there will be two kinds of people: those who tell the machines what to do, and those who are told by machines what to do.
High velocity requires:
Fast coding, which requires a codebase with tech debt under control.
Confidence in code changes, built through practices like code reviews, automated tests and frequent releases.
Reliable deployment. This means deployment should be automated, not done by manually SSH’ing to each server and copying files.
Scalability to handle more users.
Monitoring to spot issues early.
Digital companies prioritize time to market, and legacy companies prioritize cost-cutting.
If you found this interesting, and want to continue exploring this topic, read Bringing about Organizational Change or Organizational Red Flags.