Thursday 11 December 2008

Testability - re-discovering what we learned and forgot about software development.

(or, why agile approaches require good old-fashioned O-O)

What are we all talking about? (the intro)

Testability comes out of an attempt to understand how agile processes and practices change how we write software. Misko Hevery has written some rather wonderful stuff on his blog, and starts to get into issues around singletons, dependencies, and other software bits that get in the way of testability, and starts to look at testability as an attribute. (Full plug at the end of this post) But in particular, he also starts looking at what design and process changes can we start to use to make code more testable. And while we're at it, what is the point? Is testability the point? It's important, especially as a way to remove barriers from working in an agile environment, if that's what we've chosen. There are reasons related to quality as well. But I think there are some deeper implications, which Misko and others have implied and, on occasion, called out. It's that we've forgotten the point of Object-Orientation and what it was trying to achieve in the 80's and 90's, and are re-discovering it.

What have we forgotten? (the reminiscence)

But what is the essence of what Misko is saying? Martin Fowler (coiner of the Inversion of Control term) and others have written wonderful articles on "Law of Demeter" and other principles. In general, they are all looking at how we grow software, and between all the thinkers and talkers and doers, it seems to me we're re-discovering some key concepts that we all learned in college but forgot in the field. The key points are:

  1. Manage complexity by separating concerns and de-coupling code.
  2. Map your solution to the business problem.
  3. Write code that is not brittle with respect to change.
  4. Use tools that empower your goals, don't change your goals to fit the limits of the tools.

In other words, what we all learned when we were taught textbook Object-Oriented Analysis and Design and Programming. Now, O-O has had its various incarnations, and I would contend that all the architectural threads of AOP, Dependency Injection, as well as a more conservative take on O-O have all stemmed from these key principles which were at the core of the Smalltalk revolution and early attempts to get rapid development cycles, and what is often now called "agile" development.

How did we forget all this? (the rant)

So why did we forget all this? Five reasons, I suspect:

Selling Object-Orientation

We did a crappy job of selling O-O. You might not think so, since O-O is so prevalent (or at least O-O languages are). However, we didn't sell the 4 notions I mentioned above, we sold business benefits that were, in essence, lies. They didn't need to be, but usually were. These are things like "O-O will make you go faster because of reuse," or "O-O will help you reduce costs because of reuse," and on and on. These can be true, but are usually the result of a longer evolution of your software in an O-O context, and the costs of realizing the benefits that we sold often would be too high for businesses to stomach. In fact, O-O won, in my view, because managing complexity became fundamentally necessary when software scale became huge.

Cheap, fast computers

Fast computers have allowed us to do so many bad things. Room to move and space to breath unfortunately gave us less necessity for discipline. We removed the impetus to be efficient and crisp and to think through the implications of our decisions in software. However, we're catching up with hardware in terms of real limits. Moores law may still apply, but we are starting to have limits in memory. A client of mine observed that Google is an interesting example. He works for an embedded software firm, and noted that they have probably similar scaling issues as an embedded (say, phones or similar devices) device company, because sheer volume of traffic forces Google against real limits, much the way resource constraints on a telephone forces those companies against their constraints. However most of us live in a client-server mid-level-traffic dream of cheap hardware so that we can always "throw more hardware at it".

Java

Java is an O-O language, and really was the spear-head that won the wars between O-O and structured programming in the '90s. However, bloated processes and territorialism have kept Java from fixing some of its early issues that prevented it from solving problems such as I mention above in efficient ways. The simple example is reflection. If a langauge requires that I create tons of boilerplate code (try-catch, look-up this, materialize that) to find out if an object implements a method, and then to invoke it, it needs to provide a way for me to eliminate the boilerplate. If it can't do it in a clean way, it should at least provide convenience APIs for me to do the most common operations without all that boilerplate. Sadly, the core libraries of Java were bloated even in the beginning, because of the tension between the Smalltalk, Objective-C people on one hand, the C++ people on the others, and Sun not caring really, because they were a hardware company. So because Java won the O-O war (don't argue, I'm generalizing) its flaws became endemic to our adoption of O-O practices. I'm going to mention that J2EE bears about half of Java's responsibility, but I'll leave that for another flame. Nevertheless, the design and coding and idiomatic culture that spawned from these toolsets have informed our approaches to O-O principles for over a decade.

The .com bubble

The dot-com bubble compounded our Java woes by introducing 6-month diploma programmers into the wild - nay, into senior development positions, and elevated UI scripting a-la JSP and ASP, which allowed for the enmeshment of concerns beyond anything we'd seen for a while in computing. All notions of Model-View-Control separation (or Presentation, Abstraction, Control) were jettisoned while millions of lines of .jsp and .asp (and ultimately .php) script were foisted onto production servers, there to be maintained for decades (I weep for our children). While this was invisible in early small internet sites, the Dot-Com bubble which careened the internet into a primary vehicle for business, entertainment, culture, and these days even politics caused a growth in number, interaction, and complexity of these sites that has caused unmitigated hell for those who found their "whipped-up" scripted sites turn into high-traffic internet hubs. Much of this code has been re-written out of necessity, and yet it caused the travesty that is Model-1 MVC and other attempts to back-into good O-O practice from a messy start. These partial solutions were propagated as good practice (which, by comparison with the norm, they were) and a generation of students learned how to do O-O from Struts and other toolsets. Ignored in that process were wonderful tools like WebObjects or Tapestry which actually did a fair job of doing O-O AND doing the web, but I'll leave that point here.

Design Patterns

A small corollary to the dot-com bubble is that combining Java, and Patterns concepts from the Gang of Four, these new developers managed to create a code-by-numbers style of design, where you don't describe architecture with patterns, you design with patterns up-front. This has led to some of the worst architecture I've ever seen. Paint-by-numbers has never resulted in a Picasso or Monet, and rarely results in anything anyone would want to see except the artist's mother. Design Patterns and pattern-languages aren't bad - far from it. However, they are a language to discuss architecture, they are not an instruction manual. They should be symptoms of a good software design, not an ingredient.

Really big, bloated projects

Lastly, really really big projects have taken all of the above and raised the stakes. We are now finding that the limits of software aren't the Hardware (thank you Moore), but rather the people. A whole generation of us attempted to solve this by increasing the process weight around the development effort. This satisfied some contractual issues with scale, but in general failed to attend to the issues raised in The Mythical Man-Month, despite 40 years of its having been published.

A side-effect of really big projects is that when you have that much money on the table, risk-mitigate goes into high-gear, and people are bad at risk analysis and planning. We tend to manage risk by telling ourselves stories. We invent narratives that help us manage our fears, but don't actually manage risk. So we make very large plans. Idealistic (even if they're pessimistic) portrayals of how the project shall be. Then, because we want to "lock down" our risk, we solicit every possible feature that could potentially be in, including, but not limited to, the kitchen sink, to make sure we haven't forgotten it. It goes in the plan, but by this point we have twice the features any user will ever ever use and 80% of the features provide, maybe, 20% of the value. So we actually increase risk to the project's success while we are trying to minimize and control it. This kitchen-sinkism leads to greater and larger projects, but then large projects bring prestige as well, so there are several motivation vectors for large-projects. Most of them aren't good.

Enter Agile (the path to the solution)

Agile software started to address the human problem of software, and I won't go into it much here, as it's well covered elsewhere. However, one can summarize most agile methods by saying that the basics are

  1. Iterate in small cycles
  2. Get frequent feedback (ideally by having teams and customers co-located)
  3. Deliver after each iteration (where possible)
  4. Only work on the most important thing at a time.
  5. Build quality in.
  6. Don't work in "phases" (design, define, code, test)

This is a quick-n-dirty, so no arguments here. It's just an overview. But from these changes there are tons of obstacles, issues, and implications. They are, indeed, too numerous to go into. But a light non-exhaustive summary might include:

  • You can't go fast unless you build quality in
  • You can't build quality in unless you can test quickly
  • You can't test quickly if you can't build quickly
  • You can't test quickly if you aren't separating your types of testing
  • You can't test quickly if your tests are manual
  • You can't automate your tests if your code is hard to test (requires lots of setup-teardown for each test)
  • You can't make your code more amenable to testing if it's not modular
  • You can't ship frequently if you can't verify quickly before shipping
  • You can't build quality in if you ship crap
  • You can't get feedback if you can't ship to customers
  • etc.

Lots of "can't" phrases there, but note that they're conditionals. Agile methods don't actually fix these problems, they expose them, and help you do root-cause analysis to solve them. For example, look at some of those chains there.

If I, for example, take my "Big Ball of Mud" software system and re-tool it to de-couple it's logical systems and components into discrete components in its native language (say, Java), then I suddenly can test it more helpfully, because I can test one component without interference from another. Because of this, my burden of infrastructure to get the same test value goes down. Because of this my speed of test execution improves. This causes me to be able to test more frequently (possibly eventually after each check-in). This causes me to be able to make quick changes without as much fear, because I have a fast way of checking for regressions. This allows me to then be less fearful of making changes, such as cleaning up my code-base... Oh wow - there's a circle.

In fact, it is a positive feedback loop. Starting to make this change enables me to more easily make the change in the future. But once I'm moving along in this way, I start to be able to ship more frequently, because my fast verification reduces the cost of shipping. This means I could ship after three iterations, instead of twelve... or eventually every iteration. It means I can make smaller iterations, because the cost of my end-of-iteration process is going down... There are several feedback loops in process during any transition to a more agile way of doing things, as the agile approach finds more and more procedural obstacles in the organization.

But... and here's the big but... if you start to do an agile process implementation and don't start changing how you think about software, software delivery, how you write it, and how it's designed, you're going to run up against an internal, self-inflicted limitation. You can't move fast unless you're organized to accommodate moving fast. Your code base is part of your environment in this context. So starting to help developers subtly move in this direction, and increase the pace at which they transition the existing code-base into a more suitable shape for working efficiently is critical. This, as it turns out, involves our dear old O-O.

What are testability, O-O, and other best practices today? (the recipies)

There's a wealth of info out there. These don't just include software approaches but also team practices. Martin Fowler, Misko Hevery, Kent Beck, (Uncle) Bob C. Martin, Arlo Belshee, and a host of others I couldn't name in this space provide lots of good text on these. These include dependency-injection, continuous code review (pair programming), team co-location, separation of concerns. On the latter point, Aspect Oriented Programming is a nice approach which I see as another flavour of O-O, conceptually, in that it attempts to get at some of the same key problems. It is often mixed either with O-O, or with Inversion of Control containers. Fearless refactoring, continuous integration, build and test automation (I'm a big fan of Maven, btw, since, for all its problems, it makes dependency explicit). Test Driven Development (and it's cousin test-first development). Also, the use of Domain Specific Languages has become quite helpful in both mapping the business problem to technology, but also eliminating quality problems by defining the language of the solution differently. And of course, wrapping this development in a management method that helps feed the process and consume its results - such as Scrum, or the management elements of Extreme Programming.

These are a sampling of practices that affect how you organize, re-think, design, create, and evolve your software. They rely on the basic principles and premises of agile, but require, in implementation, the core elements that O-O was trying to solve. How can we manage complexity, address the business problem, write healthy code, and be served, not mastered, by our tools.

Prologue (the plugs)

I'm a big Misko Hevery fan these days (I can feel him cringing at that appellation). There's a lot I've had to say to my clients on the subject of testable code, designing for testability, and tools and technologies, but Misko seems to have wonderfully summed up much of my discussion on the topic on his "Testability Explorer" blog. He explains issues like the problem with Gang-of-Four Singletons, why Dependency Injection encourages not only testable code, but it does so by separating concerns (wiring, initialization, and business logic), and all sorts of good stuff like that. It helps to read from the earlier materials first, because Misko does build on earlier postings so later ones may assume you have read the earlier ones and that you are following along. Notwithstanding, his posts are cogent, clear, and insightful and have helped to crystallize certain understandings that I've been formulating over my years of software development into much more precise notions, and he's helped me learn how to articulate and explain such topics.

Misko has also recently published a code-reviewer's guide to testable code. Sorely needed in my view. I also want to make a quick shout-out to his testability explorer tool, which is fabulous, and I'm working on a Maven 2 plugin to integrate it into site reports.

Also, I've built a Dependency Injection container (google-code project here and docs here) suitable for use on Java2 Micro Edition CLDC 1.1 platform, because I had to prove to a client that you could do this in a reflection-free embedded environment. It's BSD licensed, so feel free to use it if you want.

Lastly, Mishkin Berteig and co. have a decent blog called Agile Advice (on which I occasionally also blog) which nicely examines the various process-related, cultural, organizational, and relational issues that working in this way brings up. My posts tend towards the technical on that blog, but occasionally otherwise as well.


3 comments:

RowliRowl said...

Documentation. Documentation. Documentation. Documentation. Whatever you do: documentation. It is true very few will read it, but those that do REALLY REALLY REALLY need it.

Document everything: your patterns, why you chose them, how to use them, how to scale them. Document what each class/object does. Document all bit twiddling. Make constants and document them: magic numbers are evil. Document interfaces between various components, especially hardware components.

The difference between a feature and a bug is intent. If you don't know the intent, how can you know if it is a feature or a bug?

Testing doesn't always help, as the test itself could be buggy or ambiguous. Or inaccessible. Write it down. In. Plain. Language.

Anonymous said...

Great article! I, too, have noticed some sort of OO renaissance in the past few months.

In the recipes section, you left out someone who I consider to be the most convinced and fervent proponent of good OO design: H. S. Lahman.

For those interested, his blog-ish site is at http://pathfinderpeople.blogs.com/

Eddy.

Unknown said...

I wonder if I agree about the documentation comment. It's not that I disagree with the underlying principle - "communicate your intent." However, I find that often the documentation is overly verbose and actually leads to difficulty in understanding. I like to balance things by making my code relatively granular and component-oriented, with explicit dependencies between components where the very names of the dependency make the collaboration obvious.

This has the advantage of making a lot of interactions and collaborations auto-generatable from the code (using reporting tools such as maven's site reports), because the intent is clear in the very structure and naming.

However, having said that, I usually also include a lot of usage examples, key entry-points, and any non-obvious interactions in documentation. But for the latter I often find that if I have to document it extensively, then I haven't broken it into the right sized pieces. It's not dogmatic, but I find that's typically the case.

So yes, document, and in plain English with code samples, but also make sure that as much as possible, the very structure and style of naming self-documents so you require less duplication of information. That way you can avoid the risk of stale documentation.