It all came down to the content-addressable cache.
So... what the heck am I talking about? Let me set the stage.
Misleading cache hits
I dug around and finally found out that (a) I had put in the wrong sha256, the one from the default 1.2.70 version, and (b) Bazel was satisfying my request not from the network but from a machine-wide "content-addressable" cache. My colleague's machine had never downloaded any files for Bazel before, so it tried to satisfy the request from the network and choked when it fingerprinted the file. And (c), after more digging, I realized that it was supplying 1.2.70 when I asked for 1.2.71 because the content-addressable cache indexes by the hash, and the URL plays no part in the caching. It literally only cared about the name when it wrote the file contents from the cache into my build working directory.
Wait what? The URL played no part in the cache index?
Must be a bug
So, Bazel was putting this in the CA cache because it was literally saying: the content is the key. Whatever the file name is, if the content hashes to <some number>, then any request for something with the same number must want the same content.
I, on the other hand, was thinking in URL-centric terms. I wanted the file found at that location (I thought), and I supplied the hash to verify it. These aren't incompatible world-views for the most part. Usually I actually do want the content; I just assume it's going to be downloaded from that location. It's only around this error-handling question, and when I consider Bazel's perspective of wanting to create fast, hermetic builds, that they diverge.
Bazel simply assumes that you mean it when you put in the expected sha256. Bazel assumes you're not just naively cutting and pasting. It'll check the first time you go to download it, but in this case, I used a perfectly valid sha256 hash for which it had a valid file. And it served it. From the cache that is... addressed/keyed by the content of the file (or at least its hash).
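To make the mechanics concrete, here is a minimal Python sketch of a cache keyed only by sha256. It is not Bazel's implementation; the class, the URLs, and the file contents are all hypothetical stand-ins, and the "network" is just a dictionary. It reproduces the two outcomes described above: a machine with a warm cache silently gets the old file, and a machine with an empty cache fails the checksum.

```python
import hashlib


class ChecksumMismatch(Exception):
    """Raised when downloaded bytes do not match the expected sha256."""


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class ContentAddressableCache:
    """Toy stand-in for a machine-wide download cache keyed only by sha256."""

    def __init__(self):
        self._blobs = {}  # sha256 hex digest -> file contents

    def fetch(self, url, expected_sha256, network):
        # Cache lookup uses the hash alone; the URL is never consulted.
        if expected_sha256 in self._blobs:
            return self._blobs[expected_sha256]
        # Cache miss: go to the (simulated) network, then verify the bytes.
        data = network[url]
        actual = sha256_hex(data)
        if actual != expected_sha256:
            raise ChecksumMismatch(f"{url}: expected {expected_sha256}, got {actual}")
        self._blobs[actual] = data
        return data


# Hypothetical URLs and file contents, purely for illustration.
URL_70 = "https://example.com/tool-1.2.70.zip"
URL_71 = "https://example.com/tool-1.2.71.zip"
NETWORK = {URL_70: b"contents of 1.2.70", URL_71: b"contents of 1.2.71"}
SHA_70 = sha256_hex(NETWORK[URL_70])

# My machine: 1.2.70 was downloaded at some point in the past.
my_cache = ContentAddressableCache()
my_cache.fetch(URL_70, SHA_70, NETWORK)

# New URL, stale hash: the cache quietly hands back 1.2.70.
served = my_cache.fetch(URL_71, SHA_70, NETWORK)

# Colleague's machine: empty cache, so the download happens and fails the check.
colleague_cache = ContentAddressableCache()
try:
    colleague_cache.fetch(URL_71, SHA_70, NETWORK)
    colleague_error = None
except ChecksumMismatch as exc:
    colleague_error = exc
```

The only place the URL matters in this sketch is on a cache miss, which is why the same mistake looks like success on one machine and a checksum failure on another.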
Is Bazel right here?
To what benefit? Well, assuming I don't make an error on my side, there are quite a few. For one, download poisoning is harder, in a variety of ways that are out of scope here. Additionally, anything addressable by sha256 can now be downloaded once and never downloaded from the wire again, even on a clean build (since it isn't "dirtied" because the hashes are the same). This pays off on continuous-integration machines, where many projects that would otherwise download the same files over and over can skip those downloads. It also provides some solid ground for building distributed caches.
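The download-once benefit can be illustrated with another small Python sketch. Again, this is an assumption-laden toy rather than anything Bazel ships: the mirror URLs, the artifact bytes, and the `CountingNetwork` helper are invented for the example. Several requests for the same hash, even through different URLs, result in a single simulated download.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class CountingNetwork:
    """Simulated network that records how many downloads were performed."""

    def __init__(self, files):
        self.files = files
        self.downloads = 0

    def get(self, url):
        self.downloads += 1
        return self.files[url]


def fetch(cache, network, url, expected_sha256):
    """Return the bytes for expected_sha256, downloading only on a cache miss."""
    if expected_sha256 not in cache:
        data = network.get(url)
        if sha256_hex(data) != expected_sha256:
            raise ValueError(f"checksum mismatch for {url}")
        cache[expected_sha256] = data
    return cache[expected_sha256]


# Hypothetical artifact, reachable through two different mirrors.
ARTIFACT = b"some widely used dependency"
DIGEST = sha256_hex(ARTIFACT)
network = CountingNetwork({
    "https://mirror-a.example.com/dep.jar": ARTIFACT,
    "https://mirror-b.example.com/dep.jar": ARTIFACT,
})

shared_cache = {}  # one cache shared by every project on the CI machine

# Three "projects" (or three clean builds) ask for the same content.
for url in [
    "https://mirror-a.example.com/dep.jar",
    "https://mirror-b.example.com/dep.jar",
    "https://mirror-a.example.com/dep.jar",
]:
    fetch(shared_cache, network, url, DIGEST)

total_downloads = network.downloads
```

Because the key is the digest, the second mirror never gets contacted at all; the cache has no way to tell the two URLs apart, and does not need one.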
I went to file the bug, but have decided that this, surprisingly, is a feature, not a bug.
Thinking more bazely
I realize that, in retrospect, this might seem obvious. It seems that way to me, too, now. When you think in terms of the web, it's pretty easy to privilege addresses over content. It's hard to quantify or express in words, but a lot of subtle things that bugged me about Bazel's (and Starlark's) design and idiosyncrasies have smoothed out in my brain because of this simple perspective adjustment. Hopefully it helps others reason more effectively about such things as well.
It still isn't a good user experience.
The expectation of a (U)niform (R)esource (L)ocator is, well, that. That's where this pretty common prejudice stems from.
If I understand the case properly, the url was changed in the depot rule, and the build ignored the change because it could already get the hash it was looking for. I would expect a change in the url to invalidate the cache. It's an optimization that it doesn't, and that optimization wasted a bit of time.
Of course, the only way to truly validate whether it is a bug is to see if it's documented. Is it?
For additional discussion or context, see the bug tracker: https://github.com/bazelbuild/bazel/issues/5144
(it's not guaranteed that this behavior will stay)