Geek in a Suit

Tuesday, August 20, 2013

Wow, I suck at blogging.

I just discovered, because of a bug filed on the Guice project, that my blog DNS settings were pointing at a domain parking page. Whoops! When I made the transition from Canadia to Amurika this year I totally neglected to fix up some domain settings. My bad.

But it sort of highlights that I don't have a good discipline around blogging. Examining this, I see about ten draft posts in my Blogger account that I never got around to posting, and which are now sort of obsolete and irrelevant. I'm not so narcissistic as to think that everyone cares what I have to say, but I provide zero value by never writing at all. :( So, sorry for that. I look with irony at my post of a few years ago stating that "I'm totally gonna blog again, now, promise!" Apparently not.

Life has been crazy - being at Google is a whirlwind. It's exciting, stressful, but also charming. You get very much "dug in" in certain ways, but most of those ways aren't awful - they just occupy your attention.

What IS wonderful is that I've been able to work primarily on open-sourced projects, being a part of the Java core-libraries team. This has meant working on Google's core-libraries offering, Guava; on dependency-injection frameworks such as Guice and Dagger; on a host of smaller projects such as Auto (let robots write it!); and contributing to my little testing/proposition framework, Truth. Seeing these things evolve, sometimes in response to each other, has been wonderful. And I get paid to do it! Why? Not because Google is purely altruistic, though Googlers seem to have a really strong bent toward contributing back, but because these things really help Google develop world-class software at scale.

I was in a little internal un-conference of core-librarians of the various supported languages, and my boss pointed out that the fact that we HAVE core libraries and tooling efforts like this is a major contributor to Google's competitiveness and capability. We fund people to figure out what patterns people use and what code developers re-write on every project, and to create hardened, efficient, common implementations/APIs/frameworks out of them, where appropriate. We don't try to re-use everything, but we dig and see where we can "maximize the productivity of labour" (to borrow from the economists) of our colleagues by reducing their coding and testing burden, freeing them to focus on the things that make their application unique. In short, we invest in future production, in future capacity for our developers, both in quality and velocity.

Often, we aren't writing tons of code to do it, but rather examining patterns and tweaking, deprecating certain approaches in favor of others, and using the rather stunning tooling we've evolved (blogged about elsewhere) and tools we've open-sourced to migrate our entire code-base from deprecated forms to new ones. But we also consider new approaches, libraries, and frameworks, both developed internally and externally. It's actually remarkable (to me) that a company this big can change direction and adapt to new realities so quickly. The joke among my team is that we're becoming less a core-libraries team and more a javac-enhancement team, since we are also building a lot of static analysis and checks (thanks, error-prone folks) into our tooling to prevent errors at compile time as we build new frameworks and tools.

While we've had a few false starts here and there, we are increasingly engaging in joint projects and accepting contributions into the codebase from external parties who benefit from the open-source work as well, which is gratifying. Nothing quite so happy as win-win exchanges.

All told, it's been a couple of years of full engagement, and not a lot of time to do tech blogging. But I'll give it another go, even if it's just to post updates like this from time to time. It's the best job I've had to date, and I'm thrilled to be in daily contact with such high-quality professionals.


Thursday, July 14, 2011

Five years out, and still stable. (Or how I installed new git on old unix)

So, I have an OpenBSD 3.9 machine, last updated circa 2006. Lame, I know, but it's stable, doesn't really do much, has few problems and security holes, and runs nearly no services. But I store some files on it and have been using SVN to manage them. I wanted to start using Git, since I use it more consistently these days. I thought about Mercurial, but - meh - I've stopped fighting religious wars, even with myself.

Problem is - Git, even an old 1.4 version, didn't appear in the OpenBSD ports tree until 2007. So what am I to do? I could update my copy of OpenBSD - an option, and a good one that, in the long run, I really should take, even just for the security fixes. I could bump the ports tree up a couple of versions until the git port exists, and then try to build it and hope the toolchain works - but there are a lot of moving parts to the ports infrastructure, and they evolve between releases of OpenBSD. I played with that for a minute and, in the end, decided to go for broke and do the naive thing that shouldn't work.

I un-tarred the latest Git 1.7 that's part of the latest OpenBSD release, into /usr/local, deleted the +DESC and +COMMENT files, and basically ran it to see what broke.

Well... it couldn't link to libc.so.58.0 - fair, since I only had libc.so.39.0. So I symbolically linked it. Did the same for libcrypto and libiconv. Seriously. Just symlinked them, hoping no symbols were used by git that had been added or changed in more recent versions.
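
For the record, the whole exercise amounted to something like this (a rough reconstruction from memory - the package file name and library version numbers are approximate):

# unpack the git package from a current release straight into /usr/local
cd /usr/local
tar xzf /tmp/git-1.7.x.tgz        # exact package name approximate
rm +DESC +COMMENT                 # package metadata, not wanted here

# git wanted newer library versions than this 2006-era system has;
# lie to the linker and hope no symbol git needs was added or changed
ln -s /usr/lib/libc.so.39.0 /usr/lib/libc.so.58.0
# ... and likewise for libcrypto and libiconv, whatever versions it demanded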

Worked like a charm.

OpenBSD and its various core libraries have been stable enough that the jump from libc.so.39 to libc.so.58 - half again as many version bumps - didn't change anything this git build actually needed; likewise libcrypto. Kudos! I know I lucked out, but still - impressive.

Wednesday, July 21, 2010

Singleton and Request scope considered harmful? No! Scopelessness considered harmful.

I watched an e-mail thread discuss how Singleton and Request scope in Guice were harmful, because Singletons can only depend on Singletons (without magic) and Request-scoped items can only depend on Request-scoped items and Singletons... followed by the idea that because "we're stateless" with minimal global state, we shouldn't need global things - and because it's cheap to build things, we can just build them all the time.

I feel that this perspective comes from a lot of fighting with scopes and scoping problems, and is probably an honest attempt at resolving them. If you're stateless, then scope IS somewhat irrelevant... but it also turns out that if you're not stateless (by accident) and you go scope-less, then you have state accidentally scattered all around your object graph without realizing you've copied it. Leave aside entirely that you're going to thrash garbage collection, because your dependencies are stored in ivars which will go on the heap...

So what should happen here?

Guice aside, all other dependency-injection containers of any substantial distribution use "Singleton" as the default scope, not "scopeless" (or what Spring calls prototype scope).  This is because Singleton is actually cleaner and easier to understand.  There is one copy of this thing in the system... so it needs either to be stateless or to carefully guard its state for multi-threaded access.  The lifecycle is also clearer: app starts up, component starts up, component shuts down, app shuts down.  Scopeless (Guice's "default" scope) actually has an arbitrary lifecycle, based on whoever asks for it.

If you have Foo -> Bar (where -> means depends-on), and Foo is any real scope (singleton, session, request, per-thread, whatever), but Bar is scopeless (meaning a new one is created on every demand), then Bar's lifecycle is different if Foo depends on it than if Baz depends on it, because it attaches to the lifecycle (lifetime... or scope, if you will <ahem>) of the dependent component.

This is freaky because it means it's sort of indeterminate until used (I call it the Heisenberg scope).  And each time it's used it could be different.
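
To make this concrete, here's a minimal sketch (Foo, Bar, and Baz are hypothetical classes, and I'm relying on Guice's just-in-time bindings rather than an explicit module):

import com.google.inject.Guice;
import com.google.inject.Inject;
import com.google.inject.Injector;
import com.google.inject.Singleton;

public class HeisenbergScopeDemo {

  static class Bar {}  // scopeless: a new instance on every demand

  @Singleton
  static class Foo {
    final Bar bar;  // this Bar silently lives as long as the app does
    @Inject Foo(Bar bar) { this.bar = bar; }
  }

  static class Baz {
    final Bar bar;  // this Bar lives only as long as this Baz does
    @Inject Baz(Bar bar) { this.bar = bar; }
  }

  public static void main(String[] args) {
    Injector injector = Guice.createInjector();
    Foo foo = injector.getInstance(Foo.class);
    Baz baz1 = injector.getInstance(Baz.class);
    Baz baz2 = injector.getInstance(Baz.class);
    // Three distinct Bars, each with a different lifetime:
    System.out.println(foo.bar == baz1.bar);   // false
    System.out.println(baz1.bar == baz2.bar);  // false
  }
}

If Bar happens to hold state, each of those holders now has its own private copy of it - which is exactly the accidental scattering described above.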

Again, if it's stateless, scopeless is no problem... but if it's stateless, Singleton is no problem either - and cleaner, because you'll uncover scoping problems more quickly with more restrictive scope policies.  Moving everything to no-scope means no clear lifecycle... or rather, whatever lifecycle and lifetime whatever calls it happens to have.

I think people look at scope as "magical" - especially in the Guice community. I don't see this kind of thrash in Picocontainer, Tapestry-IOC, or Spring user communities.  And I think it's because "prototype" scope (scopeless) is seen as a quasi-factory behaviour, not a dependency/collaborator injection behaviour.  The two are subtly different, and I think the distinction is lost in the Guice user community.  I have ideas as to why this distinction arose between the Guice community's thinking and others', but I'll leave that for another post.  
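
For what it's worth, when you genuinely want factory behaviour, Guice lets you say so explicitly by injecting a Provider instead of leaning on scopelessness. A small sketch, again with hypothetical classes:

import com.google.inject.Inject;
import com.google.inject.Provider;

class Bar {}  // the same hypothetical, scopeless Bar as above

// Injecting Provider<Bar> states the quasi-factory intent out loud:
// this class manufactures fresh Bars on demand, rather than quietly
// capturing one collaborator whose lifetime matches its own.
class BarConsumer {
  private final Provider<Bar> barProvider;

  @Inject
  BarConsumer(Provider<Bar> barProvider) {
    this.barProvider = barProvider;
  }

  void doWork() {
    Bar fresh = barProvider.get();  // deliberately a new Bar per call
    // ... use fresh, and let it go when done ...
  }
}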

The point is, scope implies a validity within a lifetime, and if something is stateless, there's no reason it shouldn't be the one and only copy, with a lifetime of the entire application's lifetime. I've long posited that "games with scopes" are a dangerous thing in dependency injection, but this is solving the problem by dropping a 16-ton weight on your own head.  It uses the most magical quasi-scope to create a fan of instances where a very small set of collaborators is required.

I'm getting close to believing that Fowler, Martin, and others were wrong, and that Dependency Injection (heck, occasionally O-O) is just too dangerous. Seriously.  I can't imagine not using them, personally, but I find so many teams and projects that think about the problem so unhelpfully, and whose clean-up efforts end up worse than the mess they created. <sigh>

 


Thursday, February 11, 2010

Shu-Ha-Ri not harmful... it's misunderstood and mis-applied.

Rachel Davies, for whom I have incredible respect, published a post called "Shu-Ha-Ri considered harmful". In it she describes the basic notion of Shu-Ha-Ri from Aikido: graduated styles of learning, in which the novice, the advanced student, and the master each learn in different ways - the novice learning more by rote and, more importantly, from a single master; the advanced student trying the techniques in varying ways and comparing styles; the master innovating new combinations and techniques. Cockburn, often cited in the Agile community, somewhat gets this, but his adaptation varies from the original concept just a bit.

Rachel has an important critique, which I accept. She points out that agile boot-camps and other styles of training for organizations, groups, and teams in Agile practices often cite Shu-Ha-Ri and require that teams do "just the scrum by the book" (or whichever method is to be used), at least initially. Then, once they have mastered those techniques as put forth, they can adapt. She, however, sees in this a disrespect for the unique circumstance of the student. In her words, "I'm uncomfortable with approaches that force students to follow agile practices without questioning." I agree. But this is not what Shu-Ha-Ri implies. Shu should always include questioning... but the student should test the technique as presented, and part of the discovery is finding out its limits - how it works, and how it does not. It's an experiment, and it requires a control... and the control is the basic form, attempted as instructed, to get a baseline. Teaching any technique, physical or mental, has a similarity in that respect. Is the metaphor limited? Yes - and I argue that it is the agile-boot-camp folks who often mis-apply the martial arts concept.

I understand the concern she raises, especially the respect for context, the unique character of teams, and the flexible nature of knowledge work... but this betrays a misunderstanding, or mis-application, of Shu-Ha-Ri. Shu doesn't imply that the students are fungible. Technique is still taught in the context of the student (team). Sutherland has it wrong when he says "only when you have mastered the basic practices are you allowed to improvise." In Aikido, students in a "Shu" mode are not improvising with "different" techniques, but they are applying them in different situations and seeing how they fit. One adapts HOW to do the technique for a tall person, for a short person, a heavy person, an advanced Aikidoka, an unranked novice, etc. Likewise with Scrum: you apply the technique, but the coach helps the team use the technique in context. That's Shu. Ha, then, is where a team combines the techniques in unique ways. They may remove a practice, or replace it, and see how that fits. In Ri (mastery), they invent new techniques, or alter the basic forms in different ways. This is all quite reasonable, even in an agile coaching context.

In Aikido, especially, students practice the techniques in multiple contexts, so they can get a sense of their suppleness. Students are asked not to innovate initially, nor to combine techniques, before they have at least mastered the basic technique itself - so they're not thinking through each step of a move while they're doing it - they "get" it. Then they can move closer to innovation.

Rachel's post, while understandably compassionate, confuses two separate things: models of instruction, training, and practice on the one hand, and notions of respect, oppression, and dominance on the other. Telling a student to try the basic move and get it better before expanding isn't disrespectful; it's understanding the learning models of the student. In practice, it is quite possible for a student to grok the technique more quickly, and if Sensei observes this, he will show the student something slightly more advanced and have him practice that. Or Sensei may see the student struggling to apply the move, and may change the context to let the student appreciate what's happening. The point of Shu-Ha-Ri is, as Rachel points out, to ensure that one doesn't miss the basics while playing with the innovative and the expansive. If it's being used to hold a student back, or somehow contains a disrespect for the student, that's a failing of the teacher (coach). Student-sensei relationships are adaptive to the needs of the student, else they become a mere posture of dominance and submission, without the deeper communication that's supposed to occur within a relationship of trust.

I feel Rachel is (unintentionally and understandably) mischaracterizing Aikido and Shu-Ha-Ri based on mis-perceptions, prevalent in the community, of how its concepts can be applied. Aikido is taught in many forms, but Agile is not the looser discipline, nor Aikido the tighter one. Aikido is, in any good Dojo, taught with great sensitivity to the needs, capacities, and readiness of the student. It is taught to groups, yet individually - that is to say, it is demonstrated to the group, then practiced in pairs, with Sensei observing, correcting... coaching. If anything, rolling Agile out in a large organization and taking large groups through their paces in one-size-fits-all boot-camps is UNLIKE Aikido training.

Bootcamps fail to do Shu-Ha-Ri if they insist that all steps of Shu are learned at equal pace by all learners. This is not Shu-Ha-Ri being harmful - this is Shu-Ha-Ri being ignored.


Wednesday, August 26, 2009

I don't believe in "Business Value" (sort of)

Ok, let's define terms. I'm not against value-oriented thinking. In fact, I'm all about the customer. I think whatever can add value to the customer or the output the customer wants (presumably software, in my case), the better. But this term "Business Value" has crept into the Agile community as a bastardization of the Lean concept of "Customer Value-Added activities", and it worries me. The term is used in two different ways, and I'm increasingly concerned about its sloppy use and the impacts thereof.

In Lean, specifically in Value Stream Mapping, there are two kinds of activities: those which add value for the customer, and those which do not. You analyze a process (say, software development from feature request to delivery into production), identify the time each activity takes, and determine whether that activity added or failed to add value for the customer. Writing code, writing tests, designing, working out clarifications of the requirements - these are customer-value-added activities. Waiting for a server to be deployed, waiting for three VPs to sign off on a design - these are non-value-added activities. I like this kind of cycle-time measurement, because it forces the observer to be in the mind of the customer. If a business step slows things down - in lean terms, lengthens the cycle-time - it's an inefficiency, and you try to find a way to remove it, circumvent it, etc.

At a fairly large financial service client, I saw a rather crazy thing occur. The coaches and suits were doing a value stream mapping. There were business activities whose performance served a business need, but not the needs of the customer. This might be a strategic steering committee meeting which a Team Lead might be required to attend, or a process step that existed to satisfy a regulatory requirement imposed on the business. These weren't really "customer value added" activities, but the coaches and business were unwilling to see them as non-value-added activities, or "waste". This led to an interesting compromise - the concept of "business value added" activities.

Now, I don't mind if there's a piece of waste in your cycle and you acknowledge it, but then decide, for strategic trade-off reasons, to keep it. That's a sane business decision, even if it rankles my personal aesthetic. It is the prerogative of a company's decision makers to judge the interests of their customer against the interests of the organization and of the shareholders, and to come to whatever balance they feel best serves the principles, values, and mission of their company. But by calling it "business value added", what they did, in effect, was move it out of the non-customer-value-added column, which effectively stopped people from considering whether or not to reduce or eliminate it. It became synonymous with "customer value".

On the other hand, often when people use the term "business value" in the Agile community, they mean the customer value of a product feature, and are using it to help focus people's prioritization of work/features on the backlog (worklist, for the uninitiated). And I get it. In that context, you need product managers prioritizing based on value, not cost, for reasons that Arlo Belshee can better explain. But the term gets mixed up across these contexts, and I have heard consultants who, two years earlier, would have called that committee review of design "waste" simply brush it off as a "business value added" activity. I think the sloppy language around BV and CV and NVA is at the root of this phenomenon. In other words, hard-nosed management consultants have stopped calling out the Emperor's nakedness. Since these external entities are among the few empowered to call bullshit on a client, less of it is being called, to the detriment of our industry.

So yes, this is an argument about definitions and semantics. Yadda yadda yadda. But for my money, I want crisp meaning - especially when sloppy terms let a "cloak of truthiness" descend and sow confusion about what's important. If delighting your customers is critical to your long-term business survival, then anything not in that "value conversation" is waste (sorry, but it is - deal with it). You might have to live with some waste you can't get rid of yet, but you should never stop calling it what it is. Otherwise you begin to sink back into the very state from which you used value stream mapping to escape. And as far as I'm concerned, serving - nay, delighting - the customer is the only way to be in business for the long haul.


Saturday, June 20, 2009

Make Fake/Stub objects simpler with proxies.

I recently wrote a convenience component: a Java dynamic-proxy creator with a default handler that simply delegates any method called on it to a provided delegate, throwing an UnsupportedOperationException if such a method isn't implemented. Why? Because I was sick of writing huge Fake implementations of medium-to-huge interfaces when I needed a fake, not a mock, and I was frustrated with using EasyMock as a convenience to create Fakes and Stubs. This delegator allowed me to implement only the parts of the interface's API I actually needed, and just not bother with the rest. It was useful for such things as factories or other "lookup" components which I was going to feed (in my tests) from some static hash-map.

This gets back to my new motto: I hate boilerplate. If I have to implement every interface method in a fake object, then I'm going to go crazy implementing public String getFoo() { return null; } all over the place. Especially since that method will never get called in my test - and in fact, if it does, I want it to fail noisily. So I could write public String getFoo() { throw new UnsupportedOperationException("Test should not call this method"); } instead. That's great, but if I have to do that for a component with thirty methods, my test classes are going to be littered with this. Instead, I can do:

import static org.junit.Assert.assertEquals;

import java.util.HashMap;
import java.util.Map;
import org.junit.BeforeClass;
import org.junit.Test;

public class MyTestClass {

  private static Map<Integer, Bar> BAR_CACHE;

  @BeforeClass
  public static void classInit() {
    BAR_CACHE = new HashMap<Integer, Bar>();
    BAR_CACHE.put(1, new Bar(1, true));
    BAR_CACHE.put(2, new Bar(2, false));
    BAR_CACHE.put(3, new Bar(3, true));
    BAR_CACHE.put(4, new Bar(4, true));
  }

  @Test
  public void testFooValidityCount() {
    BarFactory fooDep = Delegator
        .createProxy(BarFactory.class, new FooDelegate());
    Foo foo = new Foo(fooDep);
    foo.processBars();
    assertEquals(3, foo.getCountOfValidBars());
  }

  // ... more test methods that need to look stuff up ...

  public static class FooDelegate {
    public void init() { /* ignore */ }
    public Bar getBarWithId(int id) { return BAR_CACHE.get(id); }
  }
}

If we assume that BarFactory is an insanely long interface, then this allows me to make a very clean Fake with only a few bits of implementation. Otherwise, a FakeBarFactory could be longer than all the test code in this testing class. The other thing I like about this is that - especially when you're testing code whose dependencies you're not intimately familiar with, and don't have source access to - you can quickly implement an empty delegate and let the exceptions in the test guide you toward the leanest implementation. You'll get an exception any time your delegate is missing a method that your System Under Test uses on your dependency. Handy.

Essentially, the Delegator is simply a pre-fab proxy InvocationHandler, with some reflection jazz in the invoke() method to look up methods on the delegate, or throw... but it saves a bunch of extra boilerplate. I've had fake classes so large THEY needed tests. This is a much nicer approach for non-conversational dependencies (for conversational ones, mocks are better). I tried to do this with EasyMock, but the implementations were really awkward due (ironically) to the fluent interface style. This is just a rare case where it's cleaner to have a minimal real implementation.

Unfortunately, the code I wrote is closed-source for my employer, but it's simple enough that everything I've done above should let you implement your own.
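
That said, a minimal sketch of such a Delegator might look like the following (this is my reconstruction under that description, not the closed-source original):

import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

public final class Delegator {
  private Delegator() {}

  @SuppressWarnings("unchecked")
  public static <T> T createProxy(Class<T> iface, final Object delegate) {
    return (T) Proxy.newProxyInstance(
        iface.getClassLoader(),
        new Class<?>[] { iface },
        new InvocationHandler() {
          public Object invoke(Object proxy, Method method, Object[] args)
              throws Throwable {
            Method target;
            try {
              // find a method on the delegate with a matching signature
              target = delegate.getClass()
                  .getMethod(method.getName(), method.getParameterTypes());
            } catch (NoSuchMethodException e) {
              // fail noisily for anything the fake doesn't implement
              throw new UnsupportedOperationException(
                  "Fake does not implement: " + method);
            }
            try {
              return target.invoke(delegate, args);
            } catch (InvocationTargetException e) {
              throw e.getCause(); // surface the delegate's own exception
            }
          }
        });
  }
}

The getMethod() lookup is what gives you the exception-guided development described above: any call the delegate doesn't implement fails noisily, naming the missing method.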

... and remember, kids... Mocks, Stubs, and Fakes are all Test Doubles, but they're not all the same thing.


Sunday, April 12, 2009

Re-thinking Object-Relational Mapping in a Distributed Key-Store world

JDO, JPA, Hibernate, Toplink, Cayenne, Enterprise Object Framework, etc., are workable object-to-datastore mapping mechanisms. However, you still have to optimize your approach to the underlying data system, which historically has most often been an RDBMS. Hence these systems, their habits, and their idioms are all quite strongly tied to Object-Relational Mapping. Since Google's App Engine released a Java app-hosting service this week, with a datastore wrapped in JDO or JPA via DataNucleus, people have been playing, and the difficulties of those very habits have become clearer. When using JDO or JPA with BigTable, it's easy to design as if we're mapping to an RDBMS - which a distributed column-oriented DBMS like BigTable is not. There are some good articles on the net about how to think about this sort of data store:

http://www.mibgames.co.uk/2008/04/15/google-appengine-bigtable-and-why-rdbms-mentality-is-harmful/

http://torrez.us/archives/2005/10/24/407/

http://highscalability.com/how-i-learned-stop-worrying-and-love-using-lot-disk-space-scale

I'm struggling with this myself, being a long-time (longer than most) user of Object-Relational mapping frameworks. For example, one cannot do things with BigTable like join across relationships to pre-fetch child object data - a common optimization. Keystores are bad at joins, because they're sparse and inconsistently shaped. The contents of each "row" may not have the same "columns" as each other, so building indexes to join against is difficult. We actually need to re-think normalization, because it's not the same kind of store at all.
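
As a purely hypothetical illustration (Order and Customer are my invention, a sketch of the habit-shift rather than a recommendation), the instinct to join gives way to duplicating the handful of fields you actually read:

import javax.jdo.annotations.IdGeneratorStrategy;
import javax.jdo.annotations.PersistenceCapable;
import javax.jdo.annotations.Persistent;
import javax.jdo.annotations.PrimaryKey;

// Instead of joining Order -> Customer at read time (which the datastore
// can't do), we de-normalize: duplicate the customer fields our list
// screens need, and accept that a customer rename must rewrite the copies.
@PersistenceCapable
public class Order {
  @PrimaryKey
  @Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
  private Long id;

  @Persistent private Long customerId;      // reference, used when writing
  @Persistent private String customerName;  // duplicated copy, used when reading

  public Long getCustomerId() { return customerId; }
  public String getCustomerName() { return customerName; }
}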

Interestingly, OO and RDBMS actually DIDN'T have an impedance mis-match in one key area: Entity-Relationship models bore a striking structural resemblance to Object-Composition or Class diagrams, and the two could be mapped. Both RDBMS schemata and Object Models were supposed to be somewhat normalized, except where they were explicitly de-normalized for performance reasons. With distributed key-stores, we're potentially better off storing duplicate objects, or possibly building massive numbers of crazy indices. At this point, I don't know what the right answer is, habit-wise, and I work for the darned company that made this thing. It's a different paradigm, and optimizing for performance on a sparse key-store is a different skill and knack. And since we're mapping objects to it, we'll have to get that knack, then work out what mapping habits might be appropriate to that different store.

Eventually we will work out the paradigms, with BigTable and its open-source cousins (HBase and HyperTable) creating more parallelism and horizontal scaling on a "cloud-y web", as I've seen it called. The forums - especially google-appserver-java and the other places where objects and key-stores have to meet - will grow the skills and habits within the community. But we have to re-think things, lest we just turn around, say "that's crap", and throw the baby (performance/scaling) out with the bathwater (no joins/set-theory).
