Thursday, 21 February 2019

Bazely thinking and the tale of the content-addressable cache...

Bazel is awesome.  It does a lot of work to ensure hermetic builds, among other things.  But thinking in those terms means that the Bazel engineers approach certain problems from a different point of view.  I discovered this while trying to debug a weird problem where, on identical clean checkouts, my machine was working and a colleague's was failing.

It all came down to the content addressable cache.

So... what the heck am I talking about? Let me set the stage.


Misleading cache hits


I was working on a Bazel conversion experiment (we're looking at migrating to Bazel, but needed to try it out in a limited scope).  It uses Kotlin, and the Kotlin rules require downloading the Kotlin compiler, hosted on GitHub.  The default in the Bazel Kotlin rules is 1.2.70 (at the time of this writing), but I wanted to pull in a different version.  All well and good.  You set the version, you put in the sha256 of the file, and ... go.  On my machine it worked flawlessly.  On my colleague's machine, it dutifully downloaded the binary, and then threw a fit over a bad checksum.  I repeat... we had exactly the same checkout. Same environment... we thought.
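For context, the WORKSPACE setup looked roughly like this - a sketch from memory, following the rules_kotlin pattern of the era, so the repository and attribute names may not match your version of the rules exactly (and the hash is an obvious placeholder):

load("@io_bazel_rules_kotlin//kotlin:kotlin.bzl", "kotlin_repositories", "kt_register_toolchains")

KOTLIN_VERSION = "1.2.71"

KOTLINC_RELEASE = {
    "urls": [
        "https://github.com/JetBrains/kotlin/releases/download/v{v}/kotlin-compiler-{v}.zip".format(v = KOTLIN_VERSION),
    ],
    # This is where I went wrong: I pasted the sha256 of the 1.2.70 artifact here.
    "sha256": "0000000000000000000000000000000000000000000000000000000000000000",
}

kotlin_repositories(compiler_release = KOTLINC_RELEASE)
kt_register_toolchains()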

I dug around and finally found out that (a) I had put in the wrong sha256 - the one for the default 1.2.70 version; (b) Bazel was satisfying my request not from the network, but from a machine-wide "content-addressable" cache.  My colleague's machine had never downloaded any files for Bazel before, so it tried to satisfy the request from the network, and choked when it fingerprinted the file.  And (c) after more digging, I realized that Bazel was supplying 1.2.70 when I asked for 1.2.71 because the content-addressable cache indexes by the hash, and the URL plays no part in the caching.  It literally only cared about the name when it wrote the file contents from the cache into my build working directory.

Wait what?  The URL played no part in the cache index?

Must be a bug


I originally went to work up a repro case, thinking that this was a clear bug.  I asked for http://github.com/blah/blah/blah/blah-1.2.71.zip, and it gave me 1.2.70, because I had put the wrong sha256 hash in.  It should have caught my error!  Bad Bazel.  How dare it assume I knew what I was doing.  That's not safe infrastructure.  And then it hit me.  Bazel and I were thinking of the world differently.  The key was in the name "content addressable" (which I will call CA from now on, sorry Californians and Canadians).

So, Bazel was putting this in the CA cache because it was literally saying: the content is the key.  Whatever the file name is, if the content hashes to <some number>, then any request for something with the same number must want the same content.

I, on the other hand, was thinking in URL-centric terms.  I wanted the file found at that location (I thought), and I supplied the hash to verify it.  These aren't incompatible world-views for the most part; usually I really do want the content, I just assume it's going to be downloaded from that location.  It's only around this error-handling question that the two views diverge, and Bazel's choice makes more sense once I consider its goal of fast, hermetic builds.

Bazel simply assumes that you mean it when you put in the expected sha256.  Bazel assumes you're not just naively cutting and pasting.  It'll check the first time you go to download it, but in this case, I used a perfectly valid sha256 hash for which it had a valid file.  And it served it. From the cache that is... addressed/keyed by the content of the file (or at least its hash).
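To make the behavior concrete, here's a toy sketch of a content-addressed lookup in Java - emphatically not Bazel's actual implementation, just the shape of the logic that bit me:

import java.io.InputStream;
import java.net.URL;
import java.security.MessageDigest;
import java.util.HashMap;
import java.util.Map;

class ContentAddressableCache {
  private final Map<String, byte[]> bySha256 = new HashMap<>();

  /** Returns content for the declared hash, touching the network only on a miss. */
  byte[] fetch(String url, String declaredSha256) throws Exception {
    byte[] cached = bySha256.get(declaredSha256);
    if (cached != null) {
      return cached;  // cache hit: the URL plays no part in the lookup at all
    }
    byte[] downloaded;
    try (InputStream in = new URL(url).openStream()) {
      downloaded = in.readAllBytes();
    }
    // Only a genuine download gets fingerprint-checked - which is exactly
    // where my colleague's machine, with its cold cache, choked.
    if (!sha256Hex(downloaded).equals(declaredSha256)) {
      throw new IllegalStateException("checksum mismatch for " + url);
    }
    bySha256.put(declaredSha256, downloaded);
    return downloaded;
  }

  private static String sha256Hex(byte[] bytes) throws Exception {
    StringBuilder hex = new StringBuilder();
    for (byte b : MessageDigest.getInstance("SHA-256").digest(bytes)) {
      hex.append(String.format("%02x", b));
    }
    return hex.toString();
  }
}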


Is Bazel right here?


Yes and no.  This is a tricky thing - Bazel is using sha256 hashes to make sure builds are repeatable and immutable (same inputs lead to the same outputs), in service of both security and performance.  Bazel thinks "same inputs, same output", and partly doesn't give a crap where the content came from.  Most downloading rules let you supply multiple URLs, and Bazel will take whichever one it can use, as long as the content's hash matches.  Even there, it's not treating a canonical "location" as the key, but the content itself.  It's largely only my prejudices that led me to assume otherwise.
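For example, the stock http_archive rule takes a list of mirrors but a single hash (the dependency name and URLs below are hypothetical, and the hash is a placeholder):

# Several URLs, one identity.  Bazel tries the mirrors in order, but only
# the sha256 decides whether the downloaded content is acceptable.
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

http_archive(
    name = "some_dep",
    urls = [
        "https://mirror.example.com/some-dep-1.0.zip",
        "https://github.com/example/some-dep/archive/v1.0.zip",
    ],
    sha256 = "0000000000000000000000000000000000000000000000000000000000000000",
)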

To what benefit?  Well, assuming I don't make an error on my side, there are quite a few.  For one, download poisoning is harder, in a variety of ways out of scope here.  Additionally, anything addressable by sha256 need only be downloaded once, and never fetched from the wire again, even on a clean build (since it isn't "dirtied" - the hashes are the same).  This can be a big win on continuous-integration machines, where many projects that would otherwise download the same files over and over can skip the network entirely.  It also provides some solid ground for building distributed caches.

I went to file the bug, but have decided that this, surprisingly, is a feature, not a bug.


Thinking more bazely


Now that I have adjusted my expectations, I can think about the content (or its hash) as the unit of account, and things like URLs as the fetch mechanism for obtaining the content if it isn't already on hand.  This should both help me avoid making this particular mistake again and help me understand a lot about the design decisions of the tooling.

I realize that, in retrospect, this might seem obvious.  It seems that way to me, too, now.  Coming from the web, it's easy to privilege addresses over content.  It's hard to quantify or express in words, but a lot of subtle things that bugged me about Bazel's (and Starlark's) design and idiosyncrasies have smoothed out in my brain because of this simple perspective adjustment.  Hopefully it helps others reason more effectively about such things as well.

Wednesday, 10 October 2018

Suiting up

I named this blog (and my tech-focused twitter account) GeekInASuit because I had spent a good chunk of my career in the financial industry doing technical architecture and other technical consulting, and was the person who could talk to the nerds and the suits.  I wore a suit, and so signaled in a way that finance folks would talk to me, but I also had earrings, long hair, and otherwise "signaled" that I wasn't just a suit.  It was a great ride and a lovely role, being the cultural translator, often gleaning important insights that clarified project details which might otherwise have been miscommunicated.  I relished that part of my career.

Then I took a job at Google, and everything changed. My role was decidedly technical, but all my customers were also technical.  I lost the primary purpose of geekinasuit, and even more, Googler culture really venerates the geek, not so much the suit.  To be honest, I kind of got shamed out of the suit. Lots of social pressure was applied, as well as a nearly endless supply of t-shirts - I swear, 20% of my compensation, by weight, was t-shirt.  So I relented, and spent most of a decade wearing jeans and t-shirts, usually with nerdy slogans or fan-service.  While in one sense it didn't matter, I had come to like dressing up a bit.  I enjoyed taking a bit of time for self-care and grooming beyond simple standard hygiene.  So I was sadder than I realized when I finally accepted it, and stopped upgrading my wardrobe as jackets and shoes and pants succumbed to wear and tear (and got a little tight, I admit).  It was with sadness, a couple of months ago, that I realized my last suit was no longer going to fit me, the lining was worn out, and the wedding I was preparing to fly to would require actually buying a new suit.  I had let things go that far.

That wedding coincided with my departure from Google/YouTube. I left for a variety of reasons - moral, financial, emotional - with some sadness and wistfulness, but also with a sense of maybe getting back to myself.  I had gone down a deep hole at Google: I lost a lot of professional contacts, and my world narrowed to Google's technology stack. I mitigated that by doing a lot of my work in open-source, but it certainly wasn't the life I had led before - connecting with colleagues at conferences, serving customers more directly, and working with the technologies most of the industry uses.  And... I no longer appreciated having my identity swallowed by the behemoth that Google had become.  Don't get me wrong - there are lots of good people and ideas and challenges at Google, and I did work there that I'm very proud of (the Dagger and Truth frameworks, for example) - but it also took me over, in many ways.  So I left to join Square, and to help in their mission of economic empowerment (by helping them scale up their development).

And I suited up.  I decided to restore at least that, even if just as a symbol to myself - to be better, to push myself, to care for myself, superficial as clothing and appearance are.  So far, I've been suited up 95% of the days I've been in the office, and it feels really good.  It's a state change in my brain, segmenting work mode from other modes, and it oddly helps me stay focused (pretty necessary in the open-office wasteland that characterizes basically every tech company, for some reason).

I am, once again, a geek in a suit.  And I love it.


Thursday, 17 November 2016

WTF does "vend" mean? A terminological journey with no clear ending.


So there I was, in the middle of a code review, adding a method which (in the language of dependency injection) "provided" a value into a managed object graph under a different key, in advance of a migration.  Doesn't matter why.  But the word "provided" is so overused that I went with a different word that (in my brain) seemed to mean the same thing: vend.

Now, the root of this little language odyssey is simply that I hate repeating words unnecessarily.  If you use dependency injection, the word Provider is vastly overused, thanks to Guice's Provider<T> interface, JSR-330 (which standardized it), its baby brother Dagger, and the other frameworks which adopted the standard terminology (Spring, J2EE, Tapestry, etc.).

The API involved was a method annotated with @Provides and named provideBlah() (real method name changed to protect the innocent), and I just wanted some variety in my life.  So I described the change this way:
Vends a [redacted] into the dagger graph, in advance of an API change where [redacted] will consume that in place of [redacted].  Part of a migration to make [redacted] require fewer assumptions (and fewer build deps) of its consumers.
Could have been "supplies" but I didn't want to imply the Supplier<T> interface, which is a thing.  I went with "to vend".
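For the curious, the shape of such a change is roughly this - a Dagger sketch with hypothetical types standing in for the redacted names, assuming an existing unqualified binding for the value:

import dagger.Module;
import dagger.Provides;
import javax.inject.Named;

// Hypothetical value type standing in for the redacted real type.
final class Widget {}

// "Vends" the existing Widget binding back into the graph under a new key,
// so consumers can migrate to the new key before the old binding goes away.
@Module
final class MigrationBridgeModule {
  @Provides
  @Named("newKey")  // the "different key" mentioned above
  static Widget provideWidgetUnderNewKey(Widget widget) {
    return widget;  // same value, new binding key
  }
}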

In that context, I got a drive-by comment.

someuser
3:24 PM, Nov 16
What does "Vend" mean in the context of this change?

I was doing a cleanup using some of our awesome Google-made bulk refactoring tools (notably Rosie), so this was one of those "dammit, why can't you just approve my change and let me get on with my life" moments.

At first, I just went ahead and answered:

cgruber
3:31 PM, Nov 16
> What does "Vend" mean in the context of this change?
Provide into the graph.

Not to be dissuaded, "someuser" pressed on:

someuser
3:56 PM, Nov 16
Normally, "vend" means to sell ...

Ok... gonna try again to avoid the digression and nerd-sniping...

cgruber
3:59 PM, Nov 16
Vend also implies supplying, and I was trying not to overload the term "provide" because in this context, "provide and bind" are both apt terms. Regardless, I've updated the description.
And vend only means sell in a societal context of capitalist voluntary exchange. I can't imagine it would mean sell in the context of an API. :)

This last paragraph was a total indulgence on my part, the result of my having mainlined (like, literally injected into my arm) econ textbooks and treatises for the last few years.  And obviously my big mistake in the effort to avoid being nerd-sniped.

Not to be so easily dismissed... "someuser" decided to call me out.

someuser
6:17 PM, Nov 16
<nit-picking-mode>
I agree that "vend" implies supplying something, but I have only seen it in the context of a sale. With all due respect, can you point to a definition of vend that means to supply or provide, that is not in a "societal context of capitalist voluntary exchange"? (I'm actually really curious, as in, I quite often read etymologies of words. :-) )
> I can't imagine it would mean sell in the context of an API.
That's why I was confused :-)
</nit-picking-mode>
Thanks for changing the description though :-)

Oh, it's on like the break of dawn, now.

I started looking. I couldn't find definitional resources, but I knew this was a particular inflection of usage in the context of computer software and API design.

I started with a web search of the phrase "api which vends a type", just for starters. I was not disappointed: nine relevant examples in the first three pages.

Then I got philosophical, noticing that the code examples in which APIs were described as "vending" things seemed always to be in Objective-C or Java sources. I started to think back, way into the early days of my career, steeped as they were in NeXTSTEP, and wondered whether there was a connection.

Here is what I replied.


cgruber
9:29 AM
No, but I can find examples of its usage in tech, from which I apparently have picked it up over a couple of decades:
Some of these you have to ctrl-f/cmd-f and search for "vend" as they're not in the description but in comments. Also, in some cases it is synonymous with "supplies" (as in, via the return type) and in other cases with "offers" (in the sense of exposing an API):
https://github.com/attic-labs/noms/issues/2589
https://github.com/realm/realm-cocoa/issues/3981
http://stackoverflow.com/questions/37128296/rest-api-oauth2-type-authentication-using-aws-cognito/37141020
http://nlp.stanford.edu/nlp/javadoc/javanlp-3.5.0/edu/stanford/nlp/objectbank/DelimitRegExIterator.html
https://framework.zend.com/apidoc/1.12/packages/Zend_Pdf.Fonts.html
https://docs.oracle.com/javaee/7/api/javax/faces/render/package-summary.html
https://vaadin.com/api/7.5.7/com/google/web/bindery/requestfactory/shared/InstanceRequest.html
https://jeremywsherman.com/blog/2016/07/23/why-im-meh-about-json-api/
http://liftweb.net/api/25/api/net/liftweb/http/LiftRules.html
(sourced from the first few pages of the google query "api which vends a type")
I have a hypothesis: This language originated in the NeXTSTEP community (of which I was a part), and entered into the MacOS/iOS community lexicon from that source, and also into the Java community by way of a lot of NeXTSTEP folks joining Sun and related Java-oriented enterprises (at one point Javasoft was 25% populated by former Lighthouse Design people, of which I was one). So I suspect I picked it up early, but it is a very uncommon (as I find out in researching it) usage... but not purely in my head. :)
Small addendum... a straw poll of my team, which includes iOS developers as well as Android developers, suggests that it is vastly less common than I would have imagined from my own biases. That doesn't discount the above links, but it provides a bit of a ratio - a denominator for the numerator of anecdotes I cited above. Seems like a fringe usage, and sadly, it provides no insight into whence this minority usage actually derives.

How common is this usage behind the walls of corporate secrecy? Doing an internal code search, I see a handful of examples in a cursory scan of initial results - all in API docs, all using "vend" in this "supply" sense.  It at least seems that I'm not entirely out of my mind, or at least that others share my heterodox usage.

So now I'm damned curious. How did I pick this up?  I see these examples of the usage - is there a common source?  Did we all pick it up from one place, or did we independently start using it the same way?

If any of the three people left following this blog have a clue about this, I'd love to hear more insights.

Wednesday, 9 March 2016

Keep maven builds safe from "M.A.D. Gadget" vulnerability

Coming out of blogging retirement to point at a rather big issue, and to contribute to solving it a bit.

Per this blog article from Nov 2015, there is a rather large security vulnerability in the Apache commons-collections library, versions 3.0, 3.1, 3.2, 3.2.1, and 4.0. In the spirit of the fact that vulnerable classes are called "gadgets", a colleague of mine referred to this as the M.A.D. Gadgets bug. In essence, if a vulnerable version is on the classpath, code that deserializes untrusted data can effectively turn the entire JVM into a remotely exploitable exec() function (metaphorically speaking).

While people are busy swapping out vulnerable versions for newer ones, the way dependency graphs work in automated dependency-management systems like Maven, Ivy, Gradle, etc., means a project might be obtaining vulnerable versions through a transitive dependency. That's bad bad bad. So, apart from updating deps, it's important to guard against a recurrence, and you can do that, at least in Maven, via the maven-enforcer-plugin.

I threw together this GitHub gist with an example configuration that can ban this dep from your deps graph, including (most importantly) inclusions via transitive dependencies that you didn't even know you had. Here is the gist's content (I'll try to keep both updated if I change them).



<project>
  <!-- ... -->  
  <build>
    <plugins>
      <plugin>
        <artifactId>maven-enforcer-plugin</artifactId>
        <!-- Pin the plugin version for reproducibility; 1.4.1 was current
             at the time of writing. -->
        <version>1.4.1</version>
        <executions>
          <execution>
            <goals><goal>enforce</goal></goals>
            <configuration>
              <rules>
                <bannedDependencies>
                  <excludes>
                    <!-- Bans 3.0 through 3.2.1 inclusive, plus 4.0, anywhere
                         in the dependency graph, transitive deps included. -->
                    <exclude>commons-collections:commons-collections:[3.0,3.2.1]</exclude>
                    <exclude>commons-collections:commons-collections:4.0</exclude>
                  </excludes>
                </bannedDependencies>
              </rules>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>

So... fix your projects... but also make them better. Throw this into your project's parent pom, and every build will break loudly if a banned version sneaks in; you can then update your deps, or use dependency exclusions to prune out any occurrences you inherit from upstream.

Edited to include all affected versions, and re-published from my original article in March, because blogger.com's url management is frustrating.

Tuesday, 20 August 2013

Wow, I suck at blogging.

I just discovered, because of a bug filed on the Guice project, that my blog DNS settings were pointing at a domain parking page. Whoops! When I made the transition from Canadia to Amurika this year I totally neglected to fix up some domain settings. My bad.

But it sort of highlights that I don't have a good discipline around blogging. Examining this, I see about ten draft posts in my blogger account that I never got around to posting, and which are now sort of obsolete and irrelevant. I'm not so narcissistic as to think that everyone cares what I have to say, but I provide zero value by never writing at all. :( So, sorry for that. I look with irony at my post of a few years ago stating that "I'm totally gonna blog again, now, promise!" Apparently not.

Life has been crazy - being at Google is a whirlwind. It's exciting, stressful, but also charming. You get very much "dug in" in certain ways, but most of those ways aren't awful - they just occupy your attention.

What IS wonderful is that I've been able to work primarily on open-sourced projects, as part of the Java core-libraries team. This has meant working on Google's core-libraries offering, Guava; on dependency-injection frameworks such as Guice and Dagger; and on a host of smaller projects such as Auto (let robots write it!), as well as contributing to my little testing/proposition framework, Truth. Seeing these things evolve, sometimes in response to each other, has been wonderful. And I get paid to do it! Why? Not because Google is purely altruistic, though Googlers seem to have a really strong bent toward contributing back, but because these things really help Google develop world-class software at scale.

I was in a little internal un-conference of core-librarians of the various supported languages, and my boss pointed out that the fact that we HAVE core-libraries and tooling efforts like this is a major contributor to Google's competitiveness and capability. We fund people to figure out what patterns people use and what code developers re-write on every project, and to create hardened, efficient, common implementations/APIs/frameworks out of them, where appropriate. We don't try to re-use everything, but we dig and see where we can "maximize the productivity of labour" (to borrow from the economists) of our colleagues by reducing their coding and testing burden, letting them focus on the things that make their application unique. In short, we invest in future production - in future capacity for our developers, both in quality and velocity.

Often we aren't writing tons of code to do it, but rather examining patterns and tweaking, deprecating certain approaches in favor of others, and using the rather stunning tooling we've evolved (blogged about elsewhere) and the tools we've open-sourced to migrate our entire code-base from deprecated forms to new ones. But we also consider new approaches, libraries, and frameworks, both developed internally and externally. It's actually remarkable (to me) that a company this big can change direction so quickly, and adapt to new realities so quickly. The joke on my team is that we're becoming less a core-libraries team and more a javac-enhancement team, since we're also building a lot of static analysis and checks (thanks, error-prone folks) into our tooling to prevent errors at compile time, even as we build new frameworks and tools.

While we've had a few false starts here and there, we are increasingly engaging in joint projects and accepting contributions into the codebase from external parties who benefit from the open-source work as well, which is gratifying. Nothing quite so happy as win-win exchanges.

All told, it's been a couple of years of full engagement, and not a lot of time to do tech blogging. But I'll give it another go, even if it's just to do updates like this from time to time. It's the best job I've had to date, and I am thrilled to be in contact with such high-quality professionals as I am on a daily basis.

Thursday, 14 July 2011

Five years out, and still stable. (Or how I installed new git on old unix)

So, I have an OpenBSD 3.9 machine. I last updated it circa 2006. Lame, I know, but it's stable, doesn't really do much, has few problems and security holes, and runs nearly no services. But I store some files on it and have been using SVN to manage them. I wanted to start using Git, since I use it more consistently these days. I thought about Mercurial, but - meh - I've stopped fighting religious wars, even with myself.

Problem is, Git - even an old 1.4 version - didn't appear in the OpenBSD ports tree until 2007. So, what am I to do? I could update my copy of OpenBSD - an option, and a good one that, in the long run, I really should take, even just for the security fixes. I could bump the ports tree up a couple of versions until the git port exists, and then try to build it and hope the toolchain works - but there are a lot of moving parts to the ports infrastructure, and they evolve between releases of OpenBSD. I played with that for a minute, and in the end decided to go for broke and do the naive thing that shouldn't work.

I un-tarred the latest Git 1.7 package that's part of the latest OpenBSD release into /usr/local, deleted the +DESC and +COMMENT files, and basically ran it to see what broke.

Well... it couldn't link to libc.so.58.0 - fair, since I only had libc.so.39.0. So I symlinked it. Did the same for libcrypto and libiconv. Seriously - just symlinked them, hoping git used no symbols that had been added or changed in the more recent versions.
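In other words, something like this (the libc version numbers are the real ones from this story; I don't recall the exact libcrypto and libiconv versions, so they're elided):

cd /usr/lib
# Make the new binary's expected library name resolve to the old library.
ln -s libc.so.39.0 libc.so.58.0
# ...and the same trick for libcrypto.so.* and libiconv.so.*, pointing the
# names git wanted at the versions actually installed.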

Worked like a charm.

OpenBSD and its various core libraries have been stable enough that half-again as many version bumps in libc (39 to 58) didn't change anything git actually needed - likewise libcrypto. Kudos! I know I lucked out, but still - impressive.

Wednesday, 21 July 2010

Singleton and Request scope considered harmful? No! Scopelessness considered harmful.

I watched an e-mail thread argue that Singleton and Request scope in Guice are harmful, because Singletons can only depend on Singletons (without magic) and Request-scoped items can only depend on Request-scoped items and Singletons... followed by the idea that because "we're stateless" with minimal global state, we shouldn't need global things, and because it's cheap to build things, we can simply build them all the time.

I feel that this perspective comes from a lot of fighting with scopes and scoping problems, and it is probably an honest attempt at resolving them - if you're stateless, then scope IS somewhat irrelevant... but it also turns out that if you're accidentally not stateless and you go scope-less, then you have state scattered all around your object graph without realizing you've copied it.  Leave aside entirely that you're going to thrash garbage collection, because your dependencies are stored in ivars which will go on the heap...

So what should happen here?

Aside from Guice, every dependency-injection container of any substantial distribution uses "Singleton" as the default scope, not "scopeless" (or what Spring calls prototype scope).  This is because Singleton is actually cleaner and easier to understand: there is one copy of the thing in the system, so it needs to either be stateless or carefully guard its state for multi-threaded access.  The lifecycle is also clearer - app starts up, component starts up, component shuts down, app shuts down.  Scopeless objects (Guice's "default" scope) actually have an arbitrary lifecycle based on who asks for them.

If you have Foo -> Bar (where -> means depends-on), and Foo has any real scope (singleton, session, request, per-thread, whatever), but Bar is scopeless (meaning a new one is created on every demand), then Bar's lifecycle is different when Foo depends on it than when Bash depends on it, because it attaches to the lifecycle (lifetime... or scope, if you will <ahem>) of the depending component.

This is freaky because it means it's sort of indeterminate until used (I call it the Heisenberg scope).  And each time it's used it could be different.
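A minimal sketch of this in Guice, with hypothetical Foo/Bar types - one singleton for the life of the injector, but a fresh Bar on every demand:

import com.google.inject.Guice;
import com.google.inject.Injector;
import com.google.inject.Singleton;
import javax.inject.Inject;

class Bar {}  // scopeless: a new instance is created on every demand

@Singleton
class Foo {
  final Bar bar;
  @Inject Foo(Bar bar) { this.bar = bar; }  // this Bar now lives as long as Foo does
}

public class HeisenbergScopeDemo {
  public static void main(String[] args) {
    Injector injector = Guice.createInjector();
    // One and only one Foo, for the life of the injector:
    System.out.println(injector.getInstance(Foo.class) == injector.getInstance(Foo.class));  // true
    // But Bar's lifetime depends entirely on who asked for it:
    System.out.println(injector.getInstance(Bar.class) == injector.getInstance(Bar.class));  // false
  }
}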

Again, if it's stateless, no problem... but if it's stateless, Singleton is no problem... and cleaner... because you'll uncover scoping problems more quickly with more restrictive scope policies.  But moving everything to no-scope means no clear lifecycle... or rather, whatever lifecycle and lifetime whatever calls it happens to have.

I think people look at scope as "magical" - especially in the Guice community. I don't see this kind of thrash in the Picocontainer, Tapestry-IOC, or Spring user communities.  And I think it's because "prototype" scope (scopelessness) is seen there as a quasi-factory behaviour, not a dependency/collaborator-injection behaviour.  The two are subtly different, and I think the distinction is lost in the Guice user community.  I have ideas as to why this difference arose between the Guice community's thinking and others', but I'll leave that for another post.

The point is, scope implies validity within a lifetime, and if something is stateless, there's no reason it shouldn't be the one and only copy, alive for the application's entire lifetime.  I've long posited that "games with scopes" are a dangerous thing in dependency injection, but going scope-less solves the problem by dropping a 16-ton weight on your own head: it uses the most magical quasi-scope of all to create a fan of instances where a very small set of collaborators is required.

I'm getting close to believing that Fowler, Martin, and others were wrong, and that dependency injection (heck, occasionally O-O itself) is just too dangerous.  Seriously.  I can't imagine not using these tools, personally, but I find so many teams and projects that think about the problem so unhelpfully that their clean-up efforts are worse than the mess they created. <sigh>