Friday, 2 January 2009

Why I hate the java.util collection library.


It's very simple. While it was a vast improvement over Vector and Dictionary of their day, the Collections library as of Java 1.2 did what all new Java APIs from Sun seem to do... require lots of code to use simply and allow the user to do invalid things.

I won't spend a lot of time on the former, since it could be the topic of a whole other blog post, and I should preface all of this by saying that Java is my most proficient language. So this is not an anti-Java bigot speaking... just a frustrated user who wishes Sun and the community wouldn't keep heaping bad APIs on top of bad APIs.

Sorry... back to the point.

Immutability - it's all backwards... what's with that?

The big issue I have with the Collections API are about allowing the user to do wrong things. This amounts to Sun having inverted Mutability vs. Immutability in the class heirarchy. Immutability, in any language that wants to guarantee semantics, ease concurrency and resource contention, and otherwise clean things up should have immutability as a default. Something set shouldn't be volatile or mutable unless specified as such, and the semantics should enforce this. But with the collections API, we have immutability as an optional characteristic of Collections. To use a simplistic example, you can do this:

Collection c = new ArrayList();

Collection has an add() method. This means that, by definition, Collection is mutable. "But wait!" you cry, you can obtain an immutable version of this collection by calling Collections.immutableCollection(c);. Sure. At which point you have something that conforms to the contract of Collection, but which may throw exceptions if part of that contract is relied upon. In other words, you should not have access to methods that allow you to break the contract. To allow this is to have written a bad contract with ambiguity. Now, to properly guard against the possibility of stray immutable collections, you may be forced to check immutability before executing the contract (the add() method) or you may have to guard against the exceptions with try-catch logic and exception handling. You can see some of this in the concurrent library's implementations of concurrent collections. It's not bad, but could be simpler with an immutable collection interface.

Additionally, consider how hard it is to create anonymous one-off implementations of Collection. You have to implement not only size, contains(), iterator(), but all the storage logic. If you're wrapping an existing data structure and merely wanted a read-only view on the structure, you are forced to implement all that extra API in your code purely to satisfy the optional contract provided in the Collection definition.

The point here is that an immutable Collection is a sub-set of the functionality of a MutableCollection. That should be obvious, but apparently wasn't to Sun. Consider had Sun used the model used in NeXTSTEP (and now Apple's Cocoa) APIs. Collection would have been defined as (simplified):

public interface Collection<T> extends Iterable<T> {
    public Iterator<T> getIterator();
    public int getSize();
    public boolean isEmpty();
    public boolean contains(T object);
    public boolean containsAll(Collection<T> object);
    public T[] toArray(T[]);


public interface MutableCollection<T> extends Collection<T> {
    public boolean add(T object);
    public boolean addAll(Collection<T> objects);
    public boolean remove(T object);
    public boolean removeAll(Collection<T> objects);
    public boolean retainAll(Collection<T> objects);
    public void clear();

This would, ultimately, mean that a Collection instance, typed as a Collection would not have any mutable methods available to invoke, let alone that would need guarding against stray immutable invocations. There would then be two strategy for guaranteeing immutability. One... cast the stupid thing as Collection, and onothing that has access to the cast can get at the MutableCollections methods (except by explicit reflection). Alternately, add a "getImmutableCopy()" method to the MutableCollection interface that creates a shallow copy that is NOT an implementor of MutableCollection... merely of Collection. Then you have a safe "snapshot" of the mutable object that can be freely passed around without worry that something else will modify it.

Ok, why is this such a big deal? It's about having code that means what it says. If I have to guess about the run-time state, or more concrete type of an instance to know if it's OK to invoke one of its methods or not, then I'm working with implicit contract, and that's murky territory. Java, by making an ImmutableCollection a special implementation of Collection, has inverted the hierarchy of contract, and exposed methods that are not truly available for all implementors. Optional interfaces are fine, but you don't expose the optional interface above where it's true. It's a basic piece of encapsulation that the Java folks just seemed to forget.

Now, this is a decade too late, this little rant. Truth is, I made it when I worked at Sun, but was quite the junior contracting peon, and had no real voice. Now, I have a blog, and am free to whine and be annoyed in public. :) But I hope to make a more general point here about contract. Optional contracts (APIs) need to be handled very carefully, and in a way such that an unfamiliar programmer can understand what you meant from how the contract reads. Look at your interfaces from the perspective that it should not offer what it (potentially) cannot satisfy. Polymorphism doesn't require "kitchen-sinkism" in an API. Just careful, thought-out layers.

The issues of testability also arise here, insofar as a class that has to implement optional APIs must, therefore, have more tests to satisfy what are, in essence, boilerplate code. If I made a quick-and-dirty implementation of Collection as a wrapper around an already existing, immutable data structure, then I have to implement all of those methods and test them, when in fact, half of them will throw an exception. This means (to me) that they probably shouldn't even exist. Code with a lot of boilerplate code (lots of extra getters and setters, or over-wrought interfaces) tend to be hard to test, and one of my big annoyances in life (these days) is testing boilerplate code. Ick.



Travis said...

If they did it your way, then a Map could be a Collection too.

Christian Gruber said...

True. Map could implement Collection<MapEntry<K,V>>. Interesting. I always was annoyed that the base interfaces weren't harmonized.

Actually, a simple way to at least start to get this working is to create (in Java 1.7 of course) an ImmutableCollection interface with the read-only methods, and have Collection inherit from it. It sucks from a naming perspective, but you could at least keep things backwards compatible and start down a better path.

As to map, I'd also love to see a for() syntax that allowed for Map support.

Map<String,Foo> myMap = ...something;
for (String k, Foo v : myMap) {