Tuesday, 4 November 2008

code coverage != test driven development

I was vaguely annoyed to see this blog article featured in JavaLobby's recent mailout. Not because Kevin Pang doesn't make some good points about the limits of code coverage, but because his title is needlessly controversial. And, because JavaLobby is engaging in some agile-baiting by publishing it without some editorial restraint.

In asking the question, "Is code coverage all that useful," he asserts at the beginning of his article that Test Driven Development (TDD) proponents "often tend to push code coverage as a useful metric for gauging how well tested an application is." This statement is true, but the remainder of the blog post takes apart code coverage as a valid "one true metric," a claim that TDD proponents don't make, except in Kevin's interpretation.

He further asserts that "100% code coverage has long been the ultimate goal of testing fanatics." This isn't true. High code coverage is a desired attribute of a well tested system, but the goal is to have a fully and sufficiently tested system. Code coverage is indicative, but not proof, of a well-tested system. How do I mean that? Any system whose authors have taken the time to sufficiently test it such that it gets > 95% code coverage is likely (in my experience) thinking through how to test their system in order to fully express its happy paths, edge cases, etc. However, the code coverage here is a symptom, not a cause, of a well-tested system. And the metric can be gamed. Actually, when imposed as a management quality criterion, it usually is gamed. Good metrics should confirm a result obtained by other means, or provide leading indicators. Few numeric measurements are subtle enough to really drive system development.

Having said that, I have used code-coverage in this way, but in context, as I'll mention later in this post.

Kevin provides example code similar to the following:

String foo(boolean condition) {
   if (condition)
       return "true";
       return "false";

... and talks about how if the unit tests are only testing the true path, then this is only working on 50% coverage. Good so far. But then he goes on to express that "code coverage only tells us what was executed by our unit tests, not what executed correctly." He is carefully telling us that a unit test executing a line doesn't guarantee that the line is working as intended. Um... that's obvious. And if the tests didn't pass correctly, then the line should not be considered covered. It seems there are some unclear assumptions on how testing needs to work, so let me get some assertions out of the way...

  1. Code coverage is only meaningful in the context of well-written tests. It doesn't save you from crappy tests.
  2. Code coverage should only be measured on a line/branch if the covering tests are passing.
  3. Code coverage suggests insufficiency, but doesn't guarantee sufficiency.
  4. Test-driven code will likely have the symptom of nearly perfect coverage.
  5. Test-driven code will be sufficiently tested, because the author wrote all the tests that form, in full, the requirements/spec of that code.
  6. Perfectly covered code will not necessarily be sufficiently tested.

What I'm driving at is that Kevin is arguing against something entirely different than that which TDD proponents argue. He's arguing against a common misunderstanding of how TDD works. On point 1 he and I are in agreement. Many of his commentators mention #3 (and he states it in various ways himself). His description of what code coverage doesn't give you is absurd when you take #2 into account (we assume that a line of covered code is only covered if the covering test is passing). But most importantly - "TDD proponents" would, in my experience, find this whole line of explanation rather irrelevant, as it is an argument against code-coverage as a single metric for code quality, and they would attempt to achieve code quality through thoroughness of testing by driving the development through tests. TDD is a design methodology, not a testing methodology. You just get tests as side-effect artifacts of the approach. Useful in their own right? Sure, but it's only sort of the point. It isn't just writing the tests-first.

In other words - TDD implies high or perfect coverage. But the inverse is not necessarily true.

How do you achieve thoroughness by driving your development with tests? You imagine the functionality you need next (your next increment of useful change), and you write or modify your tests to "require" the new piece of functionality. They you write it, then you go green. Code coverage doesn't enter into it, because you should have near perfect coverage at all times by implication, because every new piece of functionality you develop is preceded by tests which test its main paths and error states, upper and lower bounds, etc. Code coverage in this model is a great way to notice that you screwed up and missed something, but nothing else.

So, is code-coverage useful? Heck yeah! I've used coverage to discover lots of waste in my system. I've removed whole sets of APIs that were "just in case I need them" APIs, because they become rote (lots of accessors/mutators that are not called in normal operations). Is code coverage the only way I would find them? No. If I'm dealing with a system that wasn't driven with tests, or was poorly tested in general, I may use coverage as a quick health meter, but probably not. Going from zero to 90% on legacy code is likely to be less valuable than just re-writing whole subsystems using TDD... and often more expensive.

Regardless, while Kevin is formally asking "is code coverage useful?" he's really asking (rhetorically) is it reasonable to worship code coverage as the primary metric. But if no one's asserting the positive, why is he questioning it? He may be dealing with a lot of people with misunderstandings of how TDD works. He could be dealing with metrics bigots. He could be dealing with management-imposed-metrics initiatives which often fail. It might be a pet peeve or he's annoyed with TDD and this is a great way to do some agile-baiting of his own. I don't know him, so I can't say. His comments seem reasonable, so I assume no ill intent. But the answer to his rhetorical question is "yes, but in context." Not surprising, since most rhetorically asked questions are answerable in this fashion. Hopefully it's a bit clearer where it's useful (and where/how) it's not.

1 comment:

Kimota94 aka Matt aka AgileMan said...

One of the things that I liked about the fact that some developer/tester types in my organization were getting excited about code coverage tracking is that it indicated that they were now thinking about code quality! I agree that stats of that sort, when mandated down from On High by management, can often lead to gaming, even of the subconscious variety. But when interest in measuring code coverage (for example) develops at the grass roots level, as something that a development team cares about on its own, I think it's more likely to be a sign of a good mindset happening among "the peeps."

I had a similar sort of response to the attitude shown by some teams in their quest to reduce "bug debt" (the quantity of known bugs that have gone unaddressed to date). If management is the only one saying that reducing that figure is important - and possibly even keying performance results/bonuses off of it - then team members are more likely to perceive it as an end in itself, whereby the means of achieving it are essentially left up to the imaginative (and boy, can people use their imaginations)!

If focus is placed on it by the team itself, though, then it's more of a means to an end, with the metric itself simply providing data toward the determination of how close the actual goal - high quality software - is to reality. In that context, it's a wonderful thing, and can lead to all kinds of amazing developments.

Having the metrics being driven by the "doers" instead of the "talkers" (sorry, fellow managers!) can make all the difference, I think.