Nobody likes to fail. Such a simple word carries so many negative connotations: laziness, incompetence, unprofessionalism, and more. With that in mind, it’s interesting to consider that unit tests are all about failure!
Fail Early, Fail Hard
You might be scratching your head at this point, wondering what I mean by that. Aren’t unit tests supposed to pass? As usual, the answer is a bit deeper than the surface. Much like the canaries that miners used to carry with them to detect noxious gases, unit tests serve as an early indicator that something is wrong. In a traditional TDD workflow, failing tests are a normal occurrence (in fact, with strict TDD, you start with a failing test). The test runner makes failures clearly visible after each test run. This is a good thing because it reduces the likelihood of overlooking a failing test case.
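To make the red-then-green rhythm concrete, here is a minimal sketch using Python’s built-in unittest module. The function `is_leap_year` and its test names are hypothetical stand-ins I made up for illustration. In strict TDD, the test class would be written first and fail (the function would not exist yet), and the implementation would then be added to turn the run green.

```python
import unittest


def is_leap_year(year):
    """Hypothetical function under test, written after the tests below."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


class LeapYearTest(unittest.TestCase):
    # In strict TDD these tests exist before is_leap_year does,
    # so the first run fails; the implementation turns them green.
    def test_ordinary_leap_year(self):
        self.assertTrue(is_leap_year(2024))

    def test_century_is_not_leap(self):
        self.assertFalse(is_leap_year(1900))

    def test_every_fourth_century_is_leap(self):
        self.assertTrue(is_leap_year(2000))
```

Saved in a file, this can be run with `python -m unittest`, and the runner reports each failure by test name, which is exactly the visibility described above.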
In fact, unit testing’s primary value comes from failing tests: they detect bugs as early as possible in the development cycle. It is widely reported that the cost of fixing a software bug rises dramatically the later the defect is discovered. To illustrate, let’s look at a hypothetical example, imagining a team working on a commercial software product when a bug is introduced:
Not-So-Bad: No Bug Leaked to QA
Let’s say an engineer is actively coding a feature and introduces a bug. The engineer sees a failing test, realizes the mistake, and fixes the bug before committing to source control.
- Total Cost: Maybe 10 minutes of one engineer’s time
- Resolution Time: Immediate – if a bug doesn’t leak, it technically never existed 🙂
A Bit Worse: Bug Leaks to QA
Instead of the prior scenario, let’s imagine that the developer does not realize his mistake and the bug leaks to the QA team. QA discovers the bug during daily testing, writes up a bug ticket, and sends it into the queue for resolution.
- Total Cost: Maybe 1.5 hours of QA time (reproducing and documenting the bug, writing up the bug ticket), another 10 minutes of Project Manager time for reproduction, triage, and assignment, along with however long it takes the engineer to resolve the issue.
- Resolution Time: Probably on the order of 1-3 days due to the feedback loop between dev and QA
Substantially Worse: Minor Bug Leaks to Production
Let’s imagine a slightly worse case: Nobody notices the bug until it leaks out into production and becomes visible to customers, but it’s inconsequential, something like a typo. Now our resolution loop extends to include the customer, support staff, QA, project management, and engineering. The anticipated resolution time is likely weeks to months out, because changes have to go through release validation and get pushed to the customer.
- Total Cost: Now it’s getting harder to determine…
  - We’ve now gotten the support team involved because they are the customer interface – maybe 30 minutes of one technician’s time. Depending on how many customers report the issue, this may increase drastically.
  - We still have similar QA and engineering burdens. In this case, however, it will probably take longer to locate and resolve the issue because the code isn’t ‘fresh’ in the engineer’s mind.
  - Project management will also have to get involved, possibly shifting other work planned for the release or changing the release date.
  - We look kind of stupid to our customers, so there will probably be minor brand damage.
- Resolution Time: Weeks to Months, may impact release dates
All-Out Catastrophe: Major Bug Leaks to Production
And just for fun, let’s go nuclear… Imagine the production bug scenario above, but instead of being a typo, the code reinitializes all connected databases with test data at application startup. Since this is our scenario, let’s also imagine that it is a customer-premise solution and many clients are less than diligent about backups.
- Total Cost: Nearly impossible to measure
  - Destroyed customer-premise data? Hopefully you have a team of lawyers and they’re all warmed up!
  - Customer goodwill? Kiss it goodbye!
  - Hello Negative Press!
- Resolution Time: Years to Never
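A cheap test could stand guard against even this kind of disaster. The sketch below is a simplified, hypothetical variant of the scenario: `seed_if_empty` is an invented startup hook, the `customers` table is made up, and an in-memory SQLite database stands in for whatever the real product would use. It only aims to show the shape of a test that fails if startup code ever touches existing customer data.

```python
import sqlite3
import unittest


def seed_if_empty(conn):
    """Hypothetical startup hook: load sample rows only into an empty database."""
    conn.execute("CREATE TABLE IF NOT EXISTS customers (name TEXT)")
    (count,) = conn.execute("SELECT COUNT(*) FROM customers").fetchone()
    if count == 0:  # remove this guard and the first test below fails
        conn.execute("INSERT INTO customers (name) VALUES ('sample customer')")
    conn.commit()


class StartupSeedTest(unittest.TestCase):
    def setUp(self):
        # A fresh in-memory database for every test
        self.conn = sqlite3.connect(":memory:")
        self.addCleanup(self.conn.close)

    def test_existing_data_survives_startup(self):
        self.conn.execute("CREATE TABLE customers (name TEXT)")
        self.conn.execute("INSERT INTO customers (name) VALUES ('real customer')")
        seed_if_empty(self.conn)
        rows = self.conn.execute("SELECT name FROM customers").fetchall()
        self.assertEqual(rows, [("real customer",)])

    def test_empty_database_gets_sample_data(self):
        seed_if_empty(self.conn)
        rows = self.conn.execute("SELECT name FROM customers").fetchall()
        self.assertEqual(rows, [("sample customer",)])
```

If someone later drops the emptiness check, the first test breaks on the engineer’s machine rather than on a customer’s server.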
Jeez Man, Don’t You Ever Have Happy Scenarios?
Of course! In fact, I think the first scenario above is about the happiest possible outcome! The unit tests identified that a bug had been introduced. The engineer exercised professional responsibility and fixed it, and the world happily moved on. That’s really my whole point here: unit tests are intended to break when we introduce bugs. Their selfless act of failure notifies us of trouble before it escapes to wreak havoc in production.
PS: For those who are wondering, yes, the title is a Benjamin Franklin reference 🙂