If your most critical engineer left tomorrow, how long would it take to recover? The answer tells you how resilient your organization actually is.
I’ve seen this play out. A key person leaves and suddenly everyone realizes how much was only in their head. The scramble that follows is painful and expensive.
One question I ask myself regularly: who’s my backup? Not one person who handles everything when I’m out. That’s overwhelming for them and fragile for the organization. Instead, I think about who can pick up different parts of what I do. The load gets distributed, and no single person becomes a bottleneck.
The Question Nobody Wants to Ask
“What happens if someone gets hit by a bus?” It’s a morbid way to frame it. In our office we joke about “what if they won the lottery?” as the friendlier version. Either way, the question matters. Every team has at least one person whose departure would cause serious disruption. They’re the only one who understands a critical system, or they hold all the client relationships, or they’re the glue that keeps everything running.
The bus policy isn’t about expecting disaster. It’s about building an organization that can absorb unexpected losses without falling apart.
Identify Your Single Points of Failure
Start by mapping out who knows what. For every critical system, process, or relationship, ask:
- Who is the primary owner?
- Who else could step in with minimal ramp-up?
- What documentation exists?
- How long would it take to train a replacement?
If the answer to “who else could step in” is “nobody,” you’ve found a single point of failure. And if the answer is only one name, you’ve just moved the single point of failure one step away. Look for ways to spread knowledge across multiple people.
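The mapping above can live in a spreadsheet, but a small script makes the rule easy to apply consistently. Here is a minimal Python sketch, assuming you keep one record per critical area. The names, areas, and risk labels are hypothetical placeholders, not a prescribed format.

```python
from dataclasses import dataclass, field


@dataclass
class Area:
    """One critical system, process, or relationship (hypothetical schema)."""
    name: str
    primary: str
    backups: list = field(default_factory=list)  # people who could step in
    has_docs: bool = False


def coverage_risk(area):
    """Classify an area by how many people besides the primary can cover it."""
    if not area.backups:
        return "single point of failure"
    if len(area.backups) == 1:
        # One backup only moves the single point of failure one step away.
        return "thin coverage"
    return "covered"


def risk_report(areas):
    """Map each area name to its risk label."""
    return {area.name: coverage_risk(area) for area in areas}


# Hypothetical example data; replace with your own inventory.
AREAS = [
    Area("billing database", primary="Dana"),
    Area("deploy pipeline", primary="Lee", backups=["Dana"], has_docs=True),
    Area("key client accounts", primary="Sam", backups=["Lee", "Priya"], has_docs=True),
]

if __name__ == "__main__":
    for name, risk in risk_report(AREAS).items():
        print(f"{name}: {risk}")
```

The labels matter less than the habit: anything that is not "covered" goes on the list of knowledge to spread.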
Documentation That Actually Gets Used
Most organizations have documentation. Most of it is outdated, incomplete, or impossible to find when you need it.
Effective documentation isn’t about writing everything down. It’s about capturing the information that would take the longest to reconstruct: why decisions were made (not just what was decided), the gotchas and edge cases that aren’t obvious, who to call when something breaks, and the stuff that’s “in someone’s head.”
A good test: could someone new to the team use this documentation to solve a real problem? If not, the documentation needs work.
Cross-Training as Insurance
Documentation helps, but nothing replaces hands-on experience. Build cross-training into your normal operations.
Have people shadow each other on critical tasks, even if it feels inefficient in the short term. When someone takes vacation, split their responsibilities across a few people rather than dumping everything on one backup. Each person handles a piece, nobody gets overwhelmed, and more people gain exposure.
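Splitting one person's responsibilities across several people can be sketched as a simple round-robin assignment. This is only an illustration, assuming every covering person can handle any of the tasks. In practice you would match tasks to skills, and the task and person names below are made up.

```python
from itertools import cycle


def split_coverage(responsibilities, coverers):
    """Distribute responsibilities across coverers in round-robin order."""
    if not coverers:
        raise ValueError("at least one covering person is required")
    plan = {person: [] for person in coverers}
    # cycle() repeats the coverers so each task goes to the next person in turn.
    for task, person in zip(responsibilities, cycle(coverers)):
        plan[person].append(task)
    return plan


if __name__ == "__main__":
    # Hypothetical vacation coverage for one engineer.
    tasks = ["on-call escalations", "release approvals", "vendor calls", "weekly report"]
    print(split_coverage(tasks, ["Lee", "Priya", "Sam"]))
```

With three people and four tasks, nobody ends up with more than two items, and each person gets exposure to a different part of the job.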
Spread on-call duties so multiple people learn how systems fail. The person who’s been paged at 2am about a database issue understands that system differently than someone who’s only read the runbook.
The goal isn’t to make everyone interchangeable. It’s to ensure that no single departure creates a crisis.
Shared Ownership Over Hero Culture
Some organizations accidentally reward single points of failure. The person who “owns” a critical system becomes indispensable. They’re the hero who saves the day when things break. This feels good in the moment but creates long-term fragility.
Healthy organizations distribute ownership. No one person should be able to hold the company hostage, intentionally or not. If someone’s value comes entirely from being the only one who can do something, that’s a problem to solve, not a trait to celebrate.
The same logic applies to backups. If your coverage plan is “Sarah handles everything when I’m out,” you’ve just made Sarah a single point of failure. Spread the load. Different people can own different pieces.
The Quarterly Review
Put this on your calendar every quarter:
- Review your single points of failure list to see if anything has changed.
- Check whether documentation is still accurate.
- Confirm that everyone knows who covers what.
- Run a tabletop exercise: pick a critical person and walk through what happens if they’re unavailable for a month.
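The documentation check is the easiest part of the review to partly automate. Here is a rough sketch, assuming you record a "last reviewed" date for each document. The 90-day threshold and the document names are assumptions chosen to match a quarterly cadence, not a rule from anywhere.

```python
from datetime import date, timedelta


def stale_docs(last_reviewed, today, max_age_days=90):
    """Return the names of documents not reviewed within max_age_days, sorted."""
    cutoff = today - timedelta(days=max_age_days)
    return sorted(name for name, reviewed in last_reviewed.items() if reviewed < cutoff)


if __name__ == "__main__":
    # Hypothetical review log; replace with your own records.
    log = {
        "deploy runbook": date(2024, 6, 10),
        "billing db notes": date(2023, 11, 5),
        "client handoff": date(2024, 3, 20),
    }
    for name in stale_docs(log, today=date(2024, 7, 1)):
        print(f"needs review: {name}")
```

A stale date does not prove a document is wrong, only that nobody has confirmed it lately. It still takes a person to check whether the content matches reality.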
This takes an hour or two per quarter. Skipping it can cost weeks or months of recovery when something actually goes wrong.
The bus policy isn’t pessimism. People leave, get sick, take new opportunities, or simply need a break. Building for that reality makes your organization stronger whether or not the unexpected ever happens.