Why are failing unit tests seen as bad?

user619818 05/22/2018. 10 answers, 5,339 views
unit-testing

In some organisations, apparently, part of the software release process is to use unit testing, but at any point in time all unit tests must pass. E.g. there might be some screen which shows all unit tests passing in green - which is supposed to be good.

Personally, I think this is not how it should be for the following reasons:

  1. It promotes the idea that code should be perfect and no bugs should exist - which in the real world is surely impossible for a program of any size.

  2. It is a disincentive to think up unit tests that will fail. Or certainly come up with unit tests that would be tricky to fix.

  3. If at any point in time all unit tests pass, then there is no big picture of the state of the software at any point in time. There is no roadmap/goal.

  4. It deters writing unit tests up-front - before the implementation.

I would even suggest that releasing software with failing unit tests is not necessarily bad. At least then you know that some aspect of the software has limitations.

Am I missing something here? Why do organisations expect all unit tests to pass? Isn't this living in a dream world? And doesn't it actually deter a real understanding of code?

10 Answers


Doc Brown 05/22/2018.

The misconception in this question is that it does not differentiate between local development branches, the trunk, and staging or release branches.

In a local dev branch, there are likely to be some failing unit tests at almost any time. In the trunk, this is only acceptable to some degree, and is already a strong indicator that something needs to be fixed ASAP. In a staging or release branch, failing tests are "red alert", showing that something has gone utterly wrong with some changeset when it was merged from the trunk into the release branch.

I would even suggest that releasing software with failing unit tests is not necessarily bad.

Releasing software with some known bugs below a certain severity is not necessarily bad. However, these known glitches should not cause a failing unit test. Otherwise, after each unit test run, one will have to look into the 20 failed unit tests and check one-by-one whether each failure was an acceptable one or not. This gets cumbersome and error-prone, and discards a huge part of the automation aspect of unit tests. Note that failing unit tests can disturb the rest of the team, not just in a release branch but already in the trunk, since they require everyone to check whether their own latest change caused the failure.

If you really have tests for acceptable, known bugs, use your unit testing tool's disabling/ignore feature, and add a low-priority ticket to your issue tracker so the problem won't get forgotten.
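As a minimal sketch of that approach - assuming JUnit 5 here (JUnit 4's @Ignore works the same way); the helper method, the glitch, and the ticket number are all hypothetical:

import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.api.Disabled;
import org.junit.jupiter.api.Test;

class KnownGlitchTest {

  // Hypothetical stand-in for production code with a known, accepted glitch:
  // naive floating-point rounding that is off by a cent in an edge case.
  static double roundToCents(double amount) {
    return Math.round(amount * 100) / 100.0;
  }

  // Disabled rather than left red; the reason string carries the (hypothetical)
  // issue-tracker reference so the problem is not forgotten.
  @Disabled("Accepted rounding glitch for this release - tracked as PROJ-1234")
  @Test
  void roundsTroublesomeAmountUp() {
    assertEquals(0.30, roundToCents(0.295), 1e-9);
  }
}

The test stays in the suite and visible, but it no longer turns the run red; re-enabling it is the first step when the ticket is eventually picked up.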


Phill W. 05/22/2018.

... all unit tests passing in green - which is supposed to be good.

It is good. No "supposed to be" about it.

It promotes the idea that code should be perfect and no bugs should exist - which in the real world is surely impossible for a program of any size.

No. It proves that you've tested the code as well as you can up to this point. It is entirely possible that your tests do not cover every case. If so, any errors will eventually turn up in bug reports and you'll write [failing] tests to reproduce the problems and then fix the application so that the tests pass.

It is a disincentive to think up unit tests that will fail.

Failing or negative tests place firm limits on what your application will and will not accept. Most programs I know of will object to a "date" of February the 30th. Also, Developers, creative types that we are, don't want to break "their babies". The resulting focus on "happy-path" cases leads to fragile applications that break - often.
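As a hedged sketch of such a negative test - assuming JUnit 5 and java.time, neither of which the answer specifies:

import static org.junit.jupiter.api.Assertions.assertThrows;

import java.time.DateTimeException;
import java.time.LocalDate;
import org.junit.jupiter.api.Test;

class DateValidationTest {

  // Negative test: the code must reject an impossible date rather than
  // silently accept it. java.time.LocalDate throws DateTimeException here.
  @Test
  void rejectsFebruary30th() {
    assertThrows(DateTimeException.class, () -> LocalDate.of(2018, 2, 30));
  }
}

Such a test passes when the code refuses bad input, so "green" does not mean that only happy paths were exercised.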

To compare the mindset of the Developer and the Tester:

  • A Developer stops as soon as the code does what they want it to.
  • A Tester stops when they can no longer make the code break.

These are radically different perspectives, and the difference is one that many Developers find difficult to reconcile.

Or certainly come up with unit tests that would be tricky to fix.

You don't write tests to make work for yourself. You write tests to ensure that your code is doing what it's supposed to do and, more importantly, that it continues to do what it's supposed to do after you've changed its internal implementation.

  • Debugging "proves" that the code does what you want it to today.
  • Tests "prove" that the code still does what you want it to over time.

If at any point in time all unit tests pass, then there is no big picture of the state of the software at any point in time. There is no roadmap/goal.

The only "picture" testing gives you is a snapshot that the code "works" at the point in time that it was tested. How it evolves after that is a different story.

It deters writing unit tests up-front - before the implementation.

That's exactly what you should be doing. Write a test that fails (because the method it's testing hasn't been implemented yet) then write the method code to make the method work and, hence, the test pass. That's pretty much the crux of Test-Driven Development.
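A minimal illustration of that red-then-green cycle - the FizzBuzz example and the use of JUnit 5 are illustrative assumptions, not something the answer prescribes:

import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.api.Test;

class FizzBuzzTest {

  // Red: this test is written first and fails while fizzBuzz() is still a stub
  // (e.g. one that throws UnsupportedOperationException).
  @Test
  void multipleOfThreeBecomesFizz() {
    assertEquals("Fizz", fizzBuzz(9));
  }

  // Green: the minimal implementation that makes the test above pass.
  static String fizzBuzz(int n) {
    if (n % 15 == 0) return "FizzBuzz";
    if (n % 3 == 0) return "Fizz";
    if (n % 5 == 0) return "Buzz";
    return String.valueOf(n);
  }
}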

I would even suggest that releasing software with failing unit tests is not necessarily bad. At least then you know that some aspect of the software has limitations.

Releasing code with broken tests means that some part of its functionality no longer works as it did before. That may be a deliberate act because you've fixed a bug or enhanced a feature (but then you should have changed the test first so that it failed, then coded the fix/enhancement, making the test pass in the process). More importantly: we are all Human and we make mistakes. If you break the code, then you should break the tests, and those broken tests should set alarm bells ringing.

Isn't this living in a dream world?

If anything, it's living in the Real World, acknowledging that Developers are neither omniscient nor infallible, that we do make mistakes and that we need a safety net to catch us if and when we do mess up!
Enter tests.

And doesn't it actually deter a real understanding of code?

Perhaps. You don't necessarily need to understand the implementation of something to write tests for it (that's part of the point of them). Tests define the behaviour and limits of the application and ensure that those stay the same unless you deliberately change them.


VoiceOfUnreason 05/22/2018.

Why are failing unit tests seen as bad?

They aren't -- test driven development is built upon the notion of failing tests. Failing unit tests to drive development, failing acceptance tests to drive a story....

What you are missing is context; where are the unit tests allowed to fail?

The usual answer is that unit tests are allowed to fail in private sandboxes only.

The basic notion is this: in an environment where failing tests are shared, it takes extra effort to understand whether a change to the production code has introduced a new error. The difference between zero and not zero is much easier to detect and manage than the difference between N and not N.

Furthermore, keeping the shared code clean means that developers can stay on task. When I merge your code, I don't need to shift contexts from the problem I'm being paid to solve to calibrating my understanding of how many tests should be failing. If the shared code is passing all of the tests, any failures that appear when I merge in my changes must be part of the interaction between my code and the existing clean baseline.

Similarly, during onboarding a new developer can become productive more quickly, as they don't need to spend time discovering which failing tests are "acceptable".

To be more precise: the discipline is that tests which run during the build must pass.

There is, as best I can tell, nothing wrong with having disabled tests that fail.

For instance, in a "continuous integration" environment, you'll be sharing code on a high cadence. Integrating often doesn't necessarily mean that your changes have to be release ready. There are an assortment of dark deploy techniques that prevent traffic from being released into sections of the code until they are ready.

Those same techniques can be used to disable failing tests as well.

One of the exercises I went through on a point release was dealing with development of a product with many failing tests. The answer we came up with was simply to go through the suite, disabling the failing tests and documenting each. That allowed us to quickly reach a point where all of the enabled tests were passing, and management/goal donor/gold owner could all see what trades we had made to get to that point, and could make informed decisions about cleanup vs new work.

In short: there are other techniques for tracking work not done than leaving a bunch of failing tests in the running suite.


Robbie Dee 05/22/2018.

I view it as the software equivalent of broken window syndrome.

Working tests tell me that the code is of a given quality and that the owners of the code care about it.

As for when you should care about the quality, that rather depends what source code branch/repository you're working on. Dev code may very well have broken tests indicating work in progress (hopefully!).

Broken tests on a branch/repository for a live system should immediately set alarm bells ringing. If broken tests are allowed to continue failing or if they're permanently marked as "ignore" - expect their number to creep up over time. If these aren't regularly reviewed the precedent will have been set that it is OK for broken tests to be left.

Broken tests are viewed so negatively in many shops that there are restrictions on whether code with broken tests can even be committed.


Flater 05/22/2018.

Phill W's answer is great. I can't replace it.

However, I do want to focus on another part that may have been part of the confusion.

In some organisations, apparently, part of the software release process is to use unit testing, but at any point in time all unit tests must pass

"at any point in time" is overstating your case. What's important is that unit tests pass after a certain change has been implemented, before you start implementing another change.
This is how you keep track of which change caused a bug to arise. If the unit tests started failing after implementing change 25 but before implementing change 26, then you know that change 25 caused the bug.

During the implementation of a change, of course the unit tests could fail; that very much depends on how big the change is. If I'm redeveloping a core feature, which is more than just a minor tweak, I'm likely going to break the tests for a while until I finish implementing my new version of the logic.


This can create conflicts as to team rules. I actually encountered this a few weeks ago:

  • Every commit/push causes a build. The build must never fail (if it does or any test fails, the committing developer is blamed).
  • Every developer is expected to push their changes (even if incomplete) at the end of the day, so the team leads can code review in the morning.

Either rule on its own would be fine. But both rules cannot work together. If I am assigned a major change that takes several days to complete, I wouldn't be able to adhere to both rules at the same time. The only way out would be to comment out my changes every day and only commit them uncommented once everything was done - which is just nonsensical work.

In this scenario, the issue here isn't that unit tests have no purpose; it's that the company has unrealistic expectations. Their arbitrary ruleset does not cover all cases, and failure to adhere to the rules is blindly regarded as developer failure rather than a rule failure (which it is, in my case).


user7294900 05/22/2018.

It promotes the idea that code should be perfect and no bugs should exist - which in the real world is surely impossible for a program of any size.

Not true. Why do you think it's impossible? Here is an example of a program for which it works:

import static org.junit.jupiter.api.Assertions.assertTrue;

import org.junit.jupiter.api.Test;

// Assuming JUnit 5 here; the original snippet left the imports out.
public class MyProgram {
  public boolean alwaysTrue() {
    return true;
  }

  @Test
  public void testAlwaysTrue() {
    // Use a JUnit assertion instead of the assert keyword,
    // which is only evaluated when the JVM runs with -ea.
    assertTrue(alwaysTrue());
  }
}

It is a disincentive to think up unit tests that will fail. Or certainly come up with unit tests that would be tricky to fix.

In that case it may not be a unit test but an integration test, if it's complicated.

If at any point in time all unit tests pass, then there is no big picture of the state of the software at any point in time. There is no roadmap/goal.

True, it's called a unit test for a reason: it checks a small unit of code.

It deters writing unit tests up-front - before the implementation.

Developers are, by their nature, reluctant to write any tests at all (unless they came from QA).


Frax 05/22/2018.

There are many great answers, but I'd like to add another angle that I believe is not yet well covered: what exactly is the point of having tests.

Unit tests aren't there to check that your code is bug free.

I think this is the main misconception. If this were their role, you'd indeed expect to have failing tests all over the place. But instead,

Unit tests check that your code does what you think it does.

In extreme cases it may include checking that known bugs are not fixed. The point is to have control over your codebase and avoid accidental changes. When you make a change, it is fine and actually expected to break some tests - you are changing the behavior of the code. The freshly broken tests are now a fine trail of what you changed. Check that all the breakages conform to what you want from your change. If so, just update the tests and go on. If not - well, your new code is definitely buggy; go back and fix it before submitting!
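One common way to capture "the code does what you think it does, quirks included" is a characterization (or "pinning") test - a hypothetical sketch, again assuming JUnit 5:

import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.api.Test;

class LegacyFormatterTest {

  // Hypothetical legacy behavior: the formatter trims trailing whitespace
  // but, as a known quirk, keeps leading whitespace.
  static String legacyFormat(String s) {
    return s.replaceAll("\\s+$", "");
  }

  // Characterization test: it records what the code does today, quirk and all,
  // so that an accidental change shows up immediately as a red test.
  @Test
  void keepsLeadingWhitespaceQuirk() {
    assertEquals("  hello", legacyFormat("  hello  "));
  }
}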

Now, all of the above works only if all tests are green, giving a strong positive result: this is exactly how the code works. Red tests don't have that property. "This is what this code doesn't do" is rarely useful information.

Acceptance tests may be what you are looking for.

There is such a thing as acceptance testing. You may write a set of tests that have to pass before you can declare the next milestone reached. It is OK for these to be red, because that's what they were designed for. But they are a very different thing from unit tests, and neither can nor should replace them.


jk. 05/22/2018.

If you don't fix all unit tests you can rapidly get into the state where nobody fixes any broken tests.

  1. Is incorrect, as passing unit tests don't show that the code is perfect.

  2. It is also a disincentive to come up with code that would be difficult to test, which is good from a design point of view.

  3. Code coverage can help there (though it's not a panacea). Also, unit tests are just one aspect of testing - you want integration/acceptance tests too.


Graham 05/22/2018.

To add a few points to the already-good answers...

but at any point in time all unit tests must pass

This shows a lack of understanding of a release process. A test failure may indicate a planned feature under TDD which isn't yet implemented; or it may indicate a known issue which has a fix planned for a future release; or it may simply be something where management have decided this isn't important enough to fix because customers are unlikely to notice. The key thing all these share is that management have made a judgement call about the failure.

It promotes the idea that code should be perfect and no bugs should exist - which in the real world is surely impossible for a program of any size.

Other answers have covered the limits of testing.

I don't understand why you think eliminating bugs is a downside though. If you don't want to deliver code which you've checked (to the best of your ability) does what it's supposed to, why are you even working in software?

If at any point in time all unit tests pass, then there is no big picture of the state of the software at any point in time. There is no roadmap/goal.

Why must there be a roadmap?

Unit tests initially check that functionality works, but then (as regression tests) check that you haven't inadvertently broken anything. For all the features with existing unit tests, there is no roadmap. Every feature is known to work (within the limits of testing). If that code is finished, it has no roadmap because there is no need for more work on it.

As professional engineers, we need to avoid the trap of gold-plating. Hobbyists can afford to waste time tinkering round the edges with something that works. As professionals, we need to deliver a product. That means we get something working, verify that it's working, and move on to the next job.


Joel Coehoorn 05/22/2018.

Here is the underlying logical fallacy:

If it is good when all tests pass, then it must be bad if any tests fail.

With unit tests, it IS good when all the tests pass. It is ALSO GOOD when a test fails. The two need not be in opposition.

A failing test is a problem that was caught by your tooling before it reached a user. It's an opportunity to fix a mistake before it is published. And that's a good thing.
