Summary of The Mythical Man-Month
I have been hearing about the book The Mythical Man-Month for a long time, long before I had the maturity to understand it.
I finally read it, and I’m amazed by how much of what we think we learnt today was already learnt in the previous generation. They were lacking in technology, but not in skill or ability to successfully deliver more sophisticated projects than most of us do today.
Here’s a summary of the book:
Chapter 1: The Tar Pit
- Large programming projects suffer management problems different in kind from small ones.
- An animal caught in a tar pit can’t escape. It may be strong and try for a long time, but eventually the tar pit wins. Large projects also tend to get caught in a tar pit.
- There isn't one cause for large projects to enter the tar pit.
- We read about two entrepreneurs building a better product than a big team. Why, then, haven't all big teams been replaced by two-person teams?
- It's easy for a developer to build a program that runs on his system with the input data he has, has a specialised algorithm that works only for his use case, and without documentation, thorough testing or maintenance. But making a product requires taking care of all the above, which is at least 3x the work.
- Instead of a single program, if you want to build a system of programs that work together, generate outputs that other programs can consume as input, and are thoroughly tested in all possible combinations with other programs, it costs at least 3x as much as a single program.
- Combining the two, a system sold as a product costs at least 9x as much as a single program.
- What are the joys of programming as a craft? What delights may a programmer expect? a) The joy of building things b) The joy of building things useful to others c) The joy of making complex interlocking puzzles d) The joy of working with a medium that has no limitations, unlike other fields like civil engineering where the materials have limited strength. e) Building things that move and work, unlike a poet.
- What are the challenges of programming as a craft? a) It requires perfection, which few other human endeavours demand. b) The programmer’s objectives, resources and information are not under his control, but under management’s. c) Dependence on programs that are closed-source, poorly-designed, -implemented or -tested. d) The creative and fun activity is balanced by hours of tedious work like bug finding.
Chapter 2: The Mythical Man-Month
- More software projects have gone awry for lack of calendar time than for all other causes combined.
- Why? a) Estimation techniques are under-developed, especially in assuming all will go well. b) People and time are not interchangeable. c) Software managers are not stubborn in pushing back against management. d) Schedule progress is poorly monitored, and techniques proven in other fields are not adopted in software. e) When a project is delayed, more people are added, which is like dousing a fire with petrol.
- Let’s look at these one by one. We tend to be optimistic about our ideas, but their incompleteness and inconsistencies become clear only during implementation.
- The more tasks in a project, the greater the chance that one will slip, delaying the entire project. Big projects are guaranteed to be late.
- Doubling head count halves time only if there’s no overhead. This overhead comes in two varieties: a) Training in technology, goals, and how the team is being run. This can’t be partitioned; it increases linearly with headcount b) Communication, which is quadratic, or worse considering multi-person meetings.
- Because of all this, adding people causes the schedule to shorten at a lower rate than the ideal 1/N, and actually lengthen beyond a point.
- To schedule a software task, allocate time to a) Planning: 1/3 b) Coding: 1/6 c) Debugging: 1/2, split evenly between component testing and system testing.
- The parts of a task that are easy to estimate end up taking less time.
- Cutting testing time because of schedule pressure results in unplanned delays instead of planned delays. This is worse.
- If an omelette is promised in two minutes but doesn’t arrive, turning up the heat will produce an omelette half burned and half raw.
- Making unreasonable schedules to meet management’s desired date is more common in software than other engineering fields because we have no hard data, only hunches.
- Software engineers should share more data about productivity, bug frequency, estimating rules, and so on.
- Managers should stiffen their backbones and defend their hunches, since they’re better than wishful thinking on the part of management.
- When a project is delayed, trim the task and try to stick to the timeline.
- Adding manpower to a late project makes it later.
- You can take a given schedule and lengthen it by reducing manpower, but not the other way around.
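The overhead argument in this chapter can be made concrete with a toy model. This sketch is not from the book: it assumes perfectly partitionable work, a fixed training period per person, and a fixed amount of time each person loses for every teammate they must coordinate with. The function names and all the numbers are illustrative assumptions.

```python
def communication_paths(people: int) -> int:
    """Number of distinct pairs in a team of `people`: n(n-1)/2."""
    return people * (people - 1) // 2


def calendar_months(people: int, work: float = 100.0,
                    training: float = 1.0, pairwise_cost: float = 1.0) -> float:
    """Toy calendar-time estimate in months (assumed model, not the book's).

    work          -- person-months of perfectly partitionable work
    training      -- months each person spends getting up to speed
    pairwise_cost -- months each person loses per teammate to coordination,
                     so total communication effort grows with the pair count
    """
    return work / people + training + pairwise_cost * (people - 1)


for n in (1, 2, 5, 10, 20, 40):
    print(n, communication_paths(n), round(calendar_months(n), 1))
```

With these made-up parameters the schedule is shortest at around ten people and gets longer again as more are added, which is the shape the chapter describes.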
Chapter 3: The Surgical Team
- High performers are 10x more productive than low.
- A small team is one that has only one layer of management (max 10 people).
- Big teams are unproductive and build software without conceptual integrity compared to small teams. But big teams are necessary for big projects. Otherwise, an OS can take a decade, which is too slow.
- All programs and data should be team property, not individual property.
- A team should be organised as a surgical team with defined roles like surgeon, nurse and anaesthesiologist, not as a hog-butchering team where everyone takes a stab at the hog. Roles could be architect, advisor, tester, documenter, toolmaker, admin, etc.
- Not everyone on the team is equal.
- This structure reduces communication overhead: each person needs to be informed only of things relevant to his work, not everything.
- A test found this to work phenomenally well.
Chapter 4: Aristocracy, Democracy, and System Design
- Many cathedrals are built over generations in different styles based on changing fashions and by builders bringing in their personal style and trying to “improve” what had been done before. The resulting mishmash proclaims the pride of the builders as much as the glory of God.
- On the other hand, the great Reims cathedral stirs joy in the beholder. This comes from unity of design: 8 generations of builders sacrificed their personal preferences to respect the design already in place. The result proclaims the glory of God and his power to salvage fallen men from their pride.
- Programs are worse than cathedrals: they degenerate quicker than centuries.
- Conceptual integrity is the most important consideration in system design. Don’t have many good but uncoordinated ideas. Sacrifice anomalous improvements and features.
- Adding functionality is a gain only if it outweighs the cost of searching for, learning and remembering the features (the loss of simplicity).
- These two need to be considered together. It doesn’t make sense to say “My system is better because it has more functionality” or “My system is better because it’s simpler to use”. The right question to ask is “How simple is the program for a given level of functionality?”
- A language can be too simple to the point where programs become needlessly complex to work around the limitations of the language.
- To ensure conceptual integrity, have an architect who isn’t burdened by implementation.
- Have an advisor who’s available to discuss, brainstorm, propose alternatives, validate, etc. The advisor is slightly junior to the architect, and the architect is the one who’s ultimately responsible.
- Architecture is everything visible to users: UI, manuals, inputs, outputs, options etc.
- Splitting the architect role among too many people waters down conceptual integrity.
- Trying to rush implementation without architecture can backfire, such as taking a year more in order to save 3 months of idle time.
Chapter 5: The Second-System Effect
- The architect and the head of the implementation team need to work together.
- The architect can propose a design and ask the implementor for an estimate.
- If it’s too high, the architect can either water down the design or challenge the estimate. The latter is emotion-generating, since the architect is questioning the implementor in his area. In such conversations, the architect should remember that he’s only suggesting, and the implementor’s word is final when it comes to estimates.
- The architect should be ready with a suggested implementation, but accept any other that meets the requirements.
- The architect should disagree in private.
- The implementor may suggest changes to the architecture and the architect should be open to them: small implementation changes may reduce cost a lot.
- The first system one architects tends to be sparse, because the architect knows he doesn’t know a lot. But the second system tends to be elaborate, causing the project to fail, or overrun time and cost, or deliver something that users use only half of.
- Elaborate architectures may also amount to refining something that has become obsolete – the last and finest of the dinosaurs.
- Over-optimisation for the wrong thing results in an architecture that makes it hard to optimise for the right thing, which isn’t known in advance.
Chapter 6: Passing the Word
- When defining a system, it’s important to not define everything. Leave room for implementation flexibility or differences between systems in a family.
- We need meetings where people propose problems or changes (that have been circulated ahead of time in writing) and the group brainstorms them. Focus on creativity, rather than decision.
- For these meetings to be useful, the same people need to attend them week on week so that they have context. And if a consensus doesn’t emerge, the chief architect decides.
- To have a real spec, you need multiple implementations.
- Every dev team needs a test team, which is independent, and a proxy for the customer, who is a harsh tester.
Chapter 7: Why Did the Tower of Babel Fail?
- The Tower of Babel had a clear goal, no time or raw materials or manpower constraint, and adequate technology. It failed because when people could no longer understand each other, coordination ground to a halt.
- Huge projects need to maintain structured documentation.
- You can’t share every piece of information with every engineer — that’s too much.
- Engineers need to understand interfaces offered by other teams, and implementations only as and when needed, such as when the abstraction leaks.
- The org structure should be a tree because no man can serve two masters.
- But the communication structure shouldn’t be restricted to a tree.
- Organisations must be designed around the people available, rather than fitting them into a pure theory org.
- Big teams need a separate tech lead and manager for two reasons: one, each is a full time job, and two, it’s hard to find one person good at both.
- The manager must proclaim the tech lead’s authority and back it up when challenged.
- The manager must respect the tech lead’s tech skills, align on tech strategy, and hash out differences in private before the issues become pressing.
Chapter 8: Calling the Shot
- Don’t estimate a task by estimating only the coding part and then scaling it up to include design and debugging. This will scale up the errors, too.
- Data for building isolated small programs is not applicable to programming systems products.
- The effort is proportional to the program size ^ 1.5.
- Huge programs take so long to build (the graph shoots up so much) that the project is left incomplete.
- Engineers must remember that only half the time is available for coding and debugging. The rest goes into other work, meetings, bureaucracy, sickness and personal time.
- When all this was accounted for, the estimates matched reality.
- Both the programming rate and debugging rate are lower than estimated.
- Productivity is constant in terms of lines of code. If you use a higher level language that lets each line do more, you’ll be more productive.
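The size-to-effort relationship in this chapter means effort grows faster than program size. Here is a minimal sketch, assuming effort = k × size^1.5 for some constant k that cancels out when comparing two sizes, and assuming the "half the time" rule applies uniformly. The function names are my own.

```python
def relative_effort(size_ratio: float, exponent: float = 1.5) -> float:
    """How many times more effort a program `size_ratio` times larger needs,
    assuming effort is proportional to size ** exponent."""
    return size_ratio ** exponent


def calendar_days(effort_days: float, available_fraction: float = 0.5) -> float:
    """Calendar days needed when only `available_fraction` of each day
    actually goes into coding and debugging."""
    return effort_days / available_fraction


print(round(relative_effort(2), 2))   # effort multiplier for doubling the size
print(round(relative_effort(10), 1))  # effort multiplier for 10x the size
print(calendar_days(10))              # 10 days of real work, half-time availability
```

Under these assumptions, doubling the size costs roughly 2.8 times the effort, and a program ten times larger costs roughly 32 times the effort.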
Chapter 9: 10 Pounds in a 5 Pound Sack
- Size has a cost.
- The builder of a programming systems product should set goals in program size, control the size, and devise size reduction techniques.
- Like any cost, size is not a problem; unnecessary size is.
- A project manager should save himself an emergency reserve of size to be allocated if needed.
- When a constraint is suddenly removed due to technical progress, like instant access with disks vs slow access with tapes, builders overuse it, resulting in a slower system than before.
- A performance simulator is critical for a huge project to flag critical performance problems that would otherwise force a redesign at the end.
- Throughout implementation, architects must maintain vigilance and effective communication so that the intended design is built, and that engineers don’t see themselves as contestants focused on local optimisation at the expense of global.
- You can trade off space for time over an amazingly wide range.
- Representation of data is more critical than algorithms: show me your flowcharts and hide your data structures, and I’ll continue to be mystified. Show me your data structures, and I won’t usually need your flowcharts; they’ll be obvious.
- An incredibly small interpreter was built, for example, by realising that most of its code had common patterns which could be extracted out into an interpreter for the interpreter.
- You can build a plug-in system to add functionality. But if you’re going to need it anyway, it’s cheaper to build it into the system.
- You can build a scalable system, but scalable only within limits. Scaling a tiny system to a huge size will produce worse results than building a huge system from day one.
- Managers should get their people trained, including spreading knowledge in the team. If one engineer learnt to make a time-space trade off, this knowledge should be shared.
- Breakthroughs are made by designing a fundamentally novel system, not taking an existing system and iterating on it.
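As a small illustration of the space-for-time trade-off mentioned in this chapter, the sketch below counts the set bits in an integer two ways: one bit at a time with no extra memory, and one byte at a time using a precomputed 256-entry table. The example and its names are my own, not the book's, and it assumes non-negative integers.

```python
def popcount_slow(value: int) -> int:
    """Count set bits one at a time: no extra memory, more loop iterations."""
    count = 0
    while value:
        count += value & 1
        value >>= 1
    return count


# Spend 256 table entries of space once, up front...
BYTE_POPCOUNT = [popcount_slow(byte) for byte in range(256)]


def popcount_table(value: int) -> int:
    """...to count set bits a whole byte per iteration afterwards."""
    count = 0
    while value:
        count += BYTE_POPCOUNT[value & 0xFF]
        value >>= 8
    return count
```

A larger table (say, 65,536 entries covering 16 bits at a time) would push the trade further in the same direction.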
Chapter 10: The Documentary Hypothesis
- Projects can have tons of documentation, but only a few documents are key.
- Engineers promoted to managers think this documentation and these processes are a distraction from the real work.
- Documentation serves multiple purposes: as checklists, as places to track and share status, and to drive decisions.
- A budget forces tech and policy decisions that would otherwise be avoided. Don’t think of it as an annoying constraint.
- Marketers and management can keep flip-flopping, and engineering managers need to serve as a giant flywheel with inertia to smooth out these wild swings.
- The org chart becomes intertwined with interface specification. If you have three sub-teams, you have three subsystems with interfaces between them.
- Your first system design won’t be right. To change it, the org should change, too.
- Only when you write decisions down do you see gaps, inconsistencies and hundreds of micro-decisions.
- Written communication is necessary. Managers are amazed to see decisions they took for common knowledge totally unknown by someone.
- Reviewing documents periodically tells you where you are and if you need to make changes.
- 80% of an executive’s time is spent on communication: hearing, reporting, teaching, exhorting, counseling, encouraging.
- For a team to succeed, there needs to be a crisp direction that everyone has understood, and only written communication can do that.
Chapter 11: Plan to Throw One Away
- You need to try something. If it fails, admit it frankly and try another. But, above all, try something.
- Chemical engineers know that you can’t directly scale up a laboratory process to a factory. You need an intermediate step of a pilot plant. If the goal is to desalinate a million liters of water a day, first do 10,000.
- Software engineers haven’t learnt this: they work on a schedule that demands the delivery to customers of the first thing built. But the first version is often too big, too slow, and too awkward to use. You need to discard and redesign either the whole system or some components, smarting but smarter. It happens in every big project.
- The management question is not whether to build and throw away a version. You’ll do that anyway. The question is whether to do it in a planned manner, or an unplanned one where your timelines and reputation are affected.
- Welcome change as a way of life, not an untoward annoyance.
- The programmer’s job is to satisfy users’ needs, not build a tangible product. Both users’ needs and their perception of them will change over the course of the project.
- Tangible products restrict expectations. No one will expect their car to work in a river. But the intangibility of software and that it’s changeable expose builders to continual changes in requirements.
- While we should entertain only requests above a threshold, some changes in objectives are inevitable, along with changes in development strategy and technique. Be prepared instead of assuming they won’t happen.
- Versions should have a freeze date after which changes go into the next version. If not, nothing will ship.
- One approach is to treat all plans, milestones and schedules as tentative, but that is too open-ended.
- People won’t document if you have a culture where people criticise their documented decisions, since this is perceived as a threatening culture.
- Designing an organisation for change is harder than designing a system for change.
- People should be assigned to tasks that broaden them, so that the whole force is technically flexible.
- Big projects should keep 2-3 top programmers flexible to help whichever sub-team is having the most problems.
- Companies often have dual ladders of ICs and managers, and pay the same. But perks like administrative assistants and office sizes should also match. Switching from the IC to the manager ladder should not be accompanied by a raise, and it should be announced as a reassignment, not a promotion. Transfers in the other direction should come with a raise to overcompensate for cultural norms.
- Managers should periodically be sent to technical refresher training, and senior ICs to management training.
- When talents permit, both sides should be kept technically and emotionally ready to fill in for the other, such as managers coding and ICs managing a team.
- Senior people should not be deprived of the creative joy of hands-on work. They shouldn’t be made to feel it’s demeaning.
- When organisational changes are necessary, assign a whole surgical team to a new area.
- Maintenance costs at least 40% of the development cost.
- Surprisingly, maintenance cost is strongly affected by the number of users, since more users will find more bugs.
- When users start using a new version, they’ll find a lot of bugs, which will decrease over time, but increase after a while. This is believed to be due to users arriving at a plateau of sophistication, at which point they fully exercise the software. If you draw a graph of the number of bugs found against time since release date, it will look like a U whose right part is lower than its left.
- A bug fix has a substantial chance (the book says 20 to 50 percent) of introducing a new bug.
- This happens partly because of non-local effects, partly because the repairer is a different person from the implementer, and partly because the repairer is often junior.
- Bug fixing requires more testing than any other kind of programming.
- Ideally after every bugfix, we’ll run all the accumulated test cases from earlier, but that’s very costly.
- To reduce maintenance costs, we should design software to eliminate or at least illuminate side effects.
- Repairs tend to erode the structure of the system, increasing its entropy. Over time, more and more effort goes into fixing flaws introduced by earlier fixes.
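To see why fixes that introduce new bugs make maintenance expensive, here is a back-of-the-envelope sketch. It assumes each fix independently introduces at most one new bug with a fixed probability, which is my simplification rather than a model from the book. The expected number of fixes needed to clear one original bug is then the geometric series 1 + p + p^2 + ... = 1 / (1 - p).

```python
def expected_fixes(p_new_bug: float) -> float:
    """Expected number of fixes needed to fully clear one original bug,
    assuming every fix independently introduces one new bug with
    probability `p_new_bug` (a simplifying assumption)."""
    if not 0 <= p_new_bug < 1:
        raise ValueError("p_new_bug must be in [0, 1)")
    return 1 / (1 - p_new_bug)


# Two illustrative probabilities, not measured data.
for p in (0.25, 0.5):
    print(p, round(expected_fixes(p), 2))
```

At a probability of one half, every original bug costs two fixes on average, before counting the extra testing each fix needs.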
Chapter 12: Sharp Tools
- Managers should set aside resources to build tools used by the rest of the team.
- Engineers need to iterate without being delayed between iterations.
- To build a new machine, we need a simulator which is dependable. It can fail in some ways, as long as it’s predictable.
- When hardware is first built, it’s unreliable, and trying to get software to work on unreliable hardware is hard. Software engineers are discouraged from looking for bugs in their code because they can’t tell whether a failure is an intermittent hardware bug.
- Every engineer should get an individual playpen where they can do whatever they want. Once a feature is working in their personal playpen, they can integrate it into a common area, after which only the integration manager can authorise changes. Then it’s released to the team as a whole, after which only critical bugs can be fixed. This level of control and formal progression between stages is important for both code and documentation changes.
Chapter 13: The Whole and the Parts
- A tested set of components doesn’t automatically assemble into a dependable system.
- Harmful and subtle bugs arise from mismatched assumptions made by authors of different components.
- Conceptual integrity helps reduce bugs.
- Modularity helps reduce bugs by clarifying what each module is supposed to do, so that bugs are apparent. And by letting you test each module independently and earlier than when the entire system is ready.
- Sometimes engineers don’t test modules independently, instead preferring to assemble untested modules together into a system and test against each other. That way, they want to avoid the cost of building testing harnesses for each module. But these harnesses save more than they cost.
- One kind of test code is a harness that talks to a module under test by presenting an interface the module expects.
- Another form of test code is building a dummy component, which returns fake data that adheres to the contract. For example, a sort function that returns fake data in ascending order.
- Another form of testing is a dummy file that ensures that the system is processing files correctly.
- Another form of test code is generators for input data and validators for output data.
- It’s not unreasonable for such test code to be half the product code.
- For testing to work, people should be prevented from making uncontrolled changes to the code being tested.
- If you’re testing, and found a bug, sometimes you put in a temporary fix to allow the testing to proceed. That’s fine as long as you track it and eventually replace it with a permanent fix.
- If you need to add multiple components to a system, add one at a time, thoroughly testing each time. Impatience and optimism tempt us into taking shortcuts that ultimately produce more work.
- If a component builder delivers a major new version of a component, say a rewritten version, test it as thoroughly as you’d test a new component.
- When multiple teams are working on different components, each team makes changes to their component that destabilise the system as a whole. Quantise those changes so that the instability happens at discrete points when new versions are integrated, rather than all the time. Short but intense instability is better than constant but mild instability.
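The kinds of test code listed in this chapter (a dummy component, an input generator, an output validator, and a harness) can be sketched together in a few lines. This is my own illustration built around the sort example. All the names are hypothetical, and Python's built-in `sorted` stands in for the real component.

```python
import random
from collections import Counter
from typing import Callable, List

SortFn = Callable[[List[int]], List[int]]


def dummy_sort(items: List[int]) -> List[int]:
    """Dummy component: ignores its input and returns canned ascending data."""
    return [1, 2, 3]


def median_of(items: List[int], sort_fn: SortFn) -> int:
    """Hypothetical consumer that depends on a sort component."""
    ordered = sort_fn(items)
    return ordered[len(ordered) // 2]


def generate_input(size: int, seed: int) -> List[int]:
    """Input generator: reproducible pseudo-random test data."""
    rng = random.Random(seed)
    return [rng.randint(-1000, 1000) for _ in range(size)]


def is_valid_sort(original: List[int], result: List[int]) -> bool:
    """Output validator: result is ascending and has the same items as the input."""
    ascending = all(a <= b for a, b in zip(result, result[1:]))
    return ascending and Counter(original) == Counter(result)


def run_harness(sort_fn: SortFn, cases: int = 50) -> bool:
    """Harness: feed generated inputs to a sort component and validate outputs."""
    for size in range(cases):
        data = generate_input(size, seed=size)
        if not is_valid_sort(data, sort_fn(list(data))):
            return False
    return True
```

`median_of([9, 8, 7], dummy_sort)` lets you exercise the consumer before a real sort exists, while `run_harness(sorted)` checks a real implementation. The dummy fails the harness, as it should, because it only fakes the contract.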
Chapter 14: Hatching a Catastrophe
- When a project slips disastrously, it’s usually not a single catastrophe; it’s a bunch of small ones. Think termites, not tornadoes: key people being sick, family issues, snow, power failure delaying work, emergency meetings with customers…
- Calamities are actually easier to handle: respond with major force, reorg, or invent new approaches. The whole team rises to the occasion.
- Estimation requires experience.
- Milestones must be concrete and measurable like “Passes all test cases” or “Specifications signed by both architects and implementers”. Not “90% done” (projects are 90% done for months) or “planning complete” (you can say that planning is complete whenever you want to!)
- Vague milestones incent people to lie, and it means that the boss understands something different from what’s told.
- It is important to understand which items are on the critical path, what the sequence of dependencies is, and how much an item that’s not on the critical path can slip before it joins the critical path.
- Sports managers recognise hustle, the characteristic of great players to run faster than necessary, move sooner than necessary, and try harder than necessary. This is necessary for software teams, too, to build up a reserve that can be deployed when needed.
- If a manager of managers reacts negatively to bad news, like pre-empting the line manager’s job of handling the problem or panicking, he won’t hear of problems. Second-level managers should distinguish between being asked to help and being given a status update. Label each meeting as one or the other and stick to what’s expected based on the meeting type. If needed, call a second meeting to switch from status update to problem solving mode.
- Second-level managers must never solve problems the first-level can.
- Distinguish between a date estimated before the start of the project and the current best estimate based on what we know today now that the project is mid-way through. They’re two different things, and tracking the difference helps keep the project on track.
- The big boss of a big project may need an assistant who periodically goes around asking each line manager if an update to the estimate is needed and if so, what the updated date is. This assistant has no authority to tell people what to do, just collect information. But this is still invaluable.
- There are some people whose role is inherently that of an irritant to others, but people who do this job unobtrusively and diplomatically are widely respected, not just tolerated.
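The critical-path point in this chapter can be illustrated with a small slack calculation. This is a sketch under assumptions: the task names, durations and dependencies are invented, and the dependency graph is assumed to have no cycles. A task with zero slack is on the critical path. A task with positive slack can slip by that much before it delays the project.

```python
from typing import Dict, List


def slack_per_task(durations: Dict[str, int],
                   depends_on: Dict[str, List[str]]) -> Dict[str, int]:
    """Return how far each task can slip without delaying the project end."""
    # Order tasks so that every task comes after its dependencies.
    order: List[str] = []
    seen = set()

    def visit(task: str) -> None:
        if task in seen:
            return
        seen.add(task)
        for dep in depends_on.get(task, []):
            visit(dep)
        order.append(task)

    for task in durations:
        visit(task)

    # Forward pass: earliest finish of each task.
    earliest_finish: Dict[str, int] = {}
    for task in order:
        start = max((earliest_finish[d] for d in depends_on.get(task, [])),
                    default=0)
        earliest_finish[task] = start + durations[task]
    project_end = max(earliest_finish.values())

    # Backward pass: latest finish that still meets the project end.
    latest_finish = {task: project_end for task in durations}
    for task in reversed(order):
        latest_start = latest_finish[task] - durations[task]
        for dep in depends_on.get(task, []):
            latest_finish[dep] = min(latest_finish[dep], latest_start)

    return {task: latest_finish[task] - earliest_finish[task]
            for task in durations}


# Hypothetical project, durations in weeks.
durations = {"design": 3, "code": 5, "docs": 2, "test": 4, "release": 1}
depends_on = {"code": ["design"], "docs": ["design"],
              "test": ["code"], "release": ["test", "docs"]}
print(slack_per_task(durations, depends_on))
```

In this invented project only `docs` has slack. Everything else sits on the critical path, so any slip there moves the release date.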
Chapter 15: The Other Face
- If the user of a program is remote from the author in either time or space, documentation is important. Haven’t we all cursed the author of a poorly documented program?
- Show new programmers how to document. Don’t just tell them to do it, because they won’t know how, and you’ll mistake it for lack of interest or discipline.
- Documentation needs to start with an overview, like the purpose of the program, inputs and outputs. A lot of documentation has insufficient overview.
- There are three types of users: casual users, users who depend on the program and people who’ll modify the program.
- They need different documentation.
- Users who need to depend on a program (rather than just use it casually) need test cases they can run to verify it’s working correctly.
- Test boundary conditions, both the highest valid value and the next higher one (the lowest invalid value) to ensure proper error-handling.
- Single-page flowcharts that give an overview, or describe the high-level steps, are very useful. Detailed ones that overflow to multiple pages are not useful.
- Maintaining documentation outside the source code means it will get out of sync. Instead, incorporate the documentation in the code.
- Some programming practices impose too much burden, which our predecessors haven’t been able to bear, and so neither will we. Reduce this burden.
- Attach documentation to program elements that anyway need to exist (like functions).
- Code is often under-documented.
- Rigid coding standards drive people to over-document.
- Even then, unimportant things are documented in great detail, while the necessary parts (high-level overview) are under-documented.
- Document as you code, not later.
- Machines are made for people, not people for machines.
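Two of this chapter's points, keeping documentation inside the code and testing boundary conditions, fit in one small sketch. The function, its limit and the check below are hypothetical examples of mine, not taken from the book.

```python
MAX_ITEMS = 100  # hypothetical upper limit, chosen only for illustration


def reserve(items: int) -> int:
    """Reserve `items` slots and return the number reserved.

    Overview: validates a request against a fixed capacity limit.
    Input:    an integer from 1 to MAX_ITEMS inclusive.
    Output:   the same integer, if it is in range.
    Errors:   raises ValueError for anything outside the range.
    """
    if not 1 <= items <= MAX_ITEMS:
        raise ValueError(f"items must be between 1 and {MAX_ITEMS}, got {items}")
    return items


def check_boundaries() -> None:
    """Test the highest valid value and the lowest invalid value above it."""
    assert reserve(MAX_ITEMS) == MAX_ITEMS  # highest valid value
    try:
        reserve(MAX_ITEMS + 1)              # lowest invalid value
    except ValueError:
        pass
    else:
        raise AssertionError("MAX_ITEMS + 1 should have been rejected")


check_boundaries()
```

The docstring is attached to a program element that has to exist anyway, so it travels with the code, and `check_boundaries` doubles as a test case that dependent users can run.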
Epilogue
- The tar pit of software engineering will continue to be sticky.
- The human race will continue attempting to build systems at or just beyond reach.
- Building systems will require technical tools, management techniques, common sense, and humility to recognise our fallibility and limitations.