Sunday, 12 April 2009

Re-thinking Object-Relational Mapping in a Distributed Key-Store world

JDO, JPA, Hibernate, Toplink, Cayenne, Enterprise Object Framework, etc., are workable object to datastore mapping mechanisms. However, you still have to optimize your approach to the underlying data system, which in most cases historically have been RDBMS. Hence these systems, their habits and idioms are all quite strongly tied to Object-Relational Mapping. Since Google's App-Engine released a Java app hosting service, with a datastore wrapped in JDO or JPA via DataNucleus this week, people have been playing, and the difficulties of these very habits have become clearer. It's easy, when using JDO or JPA with BigTable, to design as if we're mapping to an RDBMS, which a distributed column-oriented DBMS like BigTable is not. There are some good articles on the net about how to think about this sort of data store:

I'm struggling with this myself, being a long-time (longer than most) user of Object-Relational mapping frameworks. For example, one cannot do things with BigTable like join across relationships to pre-fetch child object data - a common optimization. Keystores are bad at joins, because they're sparse and inconsistently shaped. The contents of each "row" may not have the same "columns" as each other, so building indexes to join against is difficult. We actually need to re-think normalization, because it's not the same kind of store at all.

Interestingly, OO and RDBMS actually DIDN'T have an impedance mis-match in one key area in that Entity-Relationship models bore a striking structural resemblance to Object-Composition or Class diagrams, and the two could be mapped. Both RDBMS schemata and Object Models were supposed to be somewhat normalized, except where they were explicitly de-normalized for performance reasons. With distributed key-stores, we're potentially better off storing duplicate objects, or possibly building massive numbers of crazy indices. At this point, I don't know what the right answer is, habit wise, and I work for the darned company that made this thing. It's a different paradigm, and optimizing for performance on a sparce key-store is a different skill and knack. And since we're mapping objects to it, we'll have to get that knack then work out what mapping habits might be appropriate to that different store.

Eventually we will work out the paradigms, with BigTable and its open-source cousins (HBase and HyperTable) creating more parallelism and horizontal scaling on a "cloud-y web" as I've seen it called. The forums - especially the google-appserver-java and other forums where objects and keystores have to meet - they will grow the skills and habits within the community. But we have to re-think it, lest we just turn around and say "that's crap" and throw the baby (performance/scaling) out with the bathwater (no joins/set-theory).   


Matt Farnell said...

mmm, interesting, it was hard enough twisting my mind around when I moved from procedural programming to Object orientated programming, now another mind twist? I'm getting to old for this :)

thanks for the 'jolt'

Matt (from the google app engine forums)

Christian Gruber said...

Yeah, I had the fortune to use NeXTs while in school, so back in the early 90's I had already absorbed O-O. But I have almost (almost) got used to one mind-twist in the tech world about every half-decade or so, so this isn't a surprise to me. But it will take a bit to adjust.

Kimota94 aka Matt aka AgileMan said...

Hey, wait a second... wasn't the hiatus from posting supposed to be over now? ;-)

We're not getting nearly enough geek-in-a-suit stuff, dude! Find time for your loyal readers, won't you please?!

Christian Gruber said...

Yeah yeah. I'm sorry krazy komik guy. I'm out of my mind busy at the moment. I do have about five unfinished drafts, with two more outlines... I should really actually finish one. ;)

Eduardo Andrés Ostertag said...