Every time I watch a presentation by Rich Hickey I want to write about it. (Last one was “Are We There Yet?”)
Let me start by saying I don’t really buy into functional programming. I think it’s very clearly harder for beginners to learn, and harder for experts to use efficiently. It’s no joke that a throwaway script in Python is a PhD dissertation in Haskell.
That said, it’s important to keep state and mutability under control. A little bit of functional style goes a long way. Once it starts hurting, stop pushing it. (Same for static typing.)
Anyway, despite all that… Rich Hickey is absolutely right when he talks about mutability on a larger scale.
The rules he defines for changing APIs are the rules for CRDTs. They are collections that you can add to, but never remove from. As long as you follow that simple rule, loose consistency (such as between different open source projects) is no problem. Once you try to add deletions or collision checking, you’re going to have a bad time.
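To make the "add but never remove" rule concrete, here is a minimal sketch of a grow-only set, the simplest CRDT I know of. The class name and the example API names are mine, purely for illustration; nothing here comes from the talk.

```python
class GrowOnlySet:
    """A grow-only set: elements can be added, never removed."""

    def __init__(self, items=()):
        self._items = frozenset(items)

    def add(self, item):
        # Returns a new set; the old one is left untouched.
        return GrowOnlySet(self._items | {item})

    def merge(self, other):
        # Union is commutative, associative and idempotent, so replicas
        # can merge in any order, any number of times, and still agree.
        return GrowOnlySet(self._items | other._items)

    def __contains__(self, item):
        return item in self._items

    def __eq__(self, other):
        return isinstance(other, GrowOnlySet) and self._items == other._items

    def __hash__(self):
        return hash(self._items)


# Two projects extend the same (hypothetical) API independently.
project_a = GrowOnlySet().add("get").add("put")
project_b = GrowOnlySet().add("get").add("scan")
combined = project_a.merge(project_b)
```

Because the only operation is union, there is no merge order that can produce a conflict. A remove method would break exactly that property.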
I also strongly agree with him about the horribleness of Semantic Versioning (SemVer). I think it rotted people’s brains by convincing them, “when you want to bump the major version number, make sure you also break backward compatibility.” Which has never been how versioning has worked for successful software projects. Java is coming up on version 9 or 10 and can still run Java 1 code. Windows 10 still runs ancient Windows programs. Linux “doesn’t break user-space.”
His comment about Maven “only growing” was pretty insightful because it explained the meltdown that occurred when “left-pad” got removed from npm. Indeed, as long as it only grows, you don’t care what version it is.
He mentions the problem of growing what you provide, versus growing what you accept, but he doesn’t use the terms covariance or contravariance.
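Roughly, the rule can be written as a compatibility check: a new version may require less (accept more) and provide more, but never the reverse. The sketch below is my own toy model, with made-up field names, and not anything shown in the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signature:
    requires: frozenset  # inputs a caller must supply
    provides: frozenset  # outputs a caller can rely on


def is_compatible(old: Signature, new: Signature) -> bool:
    """True if callers written against `old` keep working against `new`."""
    requires_no_more = new.requires <= old.requires  # contravariant side
    provides_no_less = new.provides >= old.provides  # covariant side
    return requires_no_more and provides_no_less


old = Signature(frozenset({"key"}), frozenset({"value"}))
grown = Signature(frozenset({"key"}), frozenset({"value", "timestamp"}))
stricter = Signature(frozenset({"key", "token"}), frozenset({"value"}))
```

Here `grown` is a safe change, while `stricter` breaks existing callers because it demands an input they never supplied.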
He mentions identifying functions by hash, which I’ve seen discussed before. However, I felt like he failed to get to the root concept of identity (despite talking at length about identity in other talks). Let me give an example.
I’ve been working on libkvstore, which intends to provide a single, rock-stable API that can be used and fulfilled by various other software. There are two ways (so far) that it needs to support arbitrary expansion: first, the selection of a specific back-end, and second, the setting of various configuration options. The set of back-ends will change (and hopefully grow) over time, and each back-end can support common or unique configuration options.
One can imagine identifying back-ends in several different ways: an enum, with sequential numeric values; UUIDs; hashes of the back-end code or binary; or, of course, names.
Sequential IDs are obviously flawed because they would require a single, strongly-consistent view to add new back-ends. UUIDs are alright but they can get out of sync, leading to unneeded incompatibilities. Hashes are overly brittle (as long as we expect the API itself to be correctly implemented, the specific version or compilation details don’t matter). Finally there are names, which ultimately are unique thanks to trademark law. For this particular use case, names are the clear winner.
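Selecting a back-end by name might look something like the sketch below. This is not the actual libkvstore API (which I am not reproducing here); the function names and the "memory" back-end are hypothetical, and the sketch is in Python only to keep it short.

```python
_BACKENDS = {}  # name -> factory; entries are added, never removed


def register_backend(name, factory):
    """Register a back-end under a human-chosen name."""
    existing = _BACKENDS.get(name)
    if existing is not None and existing is not factory:
        raise ValueError(f"back-end name already taken: {name!r}")
    _BACKENDS[name] = factory


def open_backend(name):
    """Look a back-end up by name and construct it."""
    try:
        factory = _BACKENDS[name]
    except KeyError:
        raise LookupError(f"unknown back-end: {name!r}") from None
    return factory()
```

Nothing here needs a central authority handing out numbers: anyone can add a new name, and an application that asks for a name it knows keeps working no matter how many others are added later.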
Similarly, we have configuration options. The API includes an ioctl-style interface so that each back-end can support any number of options with any parameters. The problem then is that applications, which need to configure these options, shouldn’t be too tightly coupled to the particular back-end in use. So we want every back-end to support whatever options it needs, but for common options to beneficially collide, and without necessarily any global coordination.
Again, the same basic choices (sequential IDs, random IDs, hashes and names) are available, except this time instead of brand names, we have descriptive names for each option. So if several back-ends support a filename option, it is identified by the string “filename”, which hopefully reasonable people would choose even independently. (They might choose “path” instead, but there’s only so much you can do.)
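Here is a small sketch of what name-keyed options could look like. The classes and the option names other than "filename" are invented for the example; the real interface is ioctl-style and differs in detail.

```python
class Backend:
    supported = frozenset()  # option names this back-end understands

    def __init__(self):
        self.settings = {}

    def configure(self, option, value):
        """Apply one option; return False if this back-end does not know it."""
        if option not in self.supported:
            return False
        self.settings[option] = value
        return True


class FileBackend(Backend):
    supported = frozenset({"filename", "read_only"})


class MemoryBackend(Backend):
    supported = frozenset({"max_bytes"})


def apply_config(backend, config):
    """Apply every option and return the names the back-end ignored."""
    return sorted(
        name for name, value in config.items()
        if not backend.configure(name, value)
    )
```

An application can hand the same configuration to any back-end and simply check which names were not understood, so it never needs to know which back-end it is talking to.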
Anyway, what is the best way to identify a function in a program? By hash? Well, no, probably not. A meaningful name is probably better.
That said, his proposal to speed up unit tests by hashing the code they test and only rerunning them when the hashes change sounded pretty sweet. Hashes make sense in that case because tests by definition can’t trust that functions do what they claim to do.
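A toy version of that idea, as I understand it, is sketched below. It fingerprints a function's compiled bytecode and constants as a stand-in for "the code", which is my own simplification: it ignores nested functions, globals and anything the function calls.

```python
import hashlib


def code_fingerprint(func):
    """Hash a function's bytecode, constants and referenced names."""
    code = func.__code__
    digest = hashlib.sha256()
    digest.update(code.co_code)
    digest.update(repr(code.co_consts).encode())
    digest.update(repr(code.co_names).encode())
    return digest.hexdigest()


class ResultCache:
    """Remember which code each test last passed against."""

    def __init__(self):
        self._passed = {}  # test name -> fingerprint of the code that passed

    def run(self, test, target):
        fingerprint = code_fingerprint(target)
        if self._passed.get(test.__name__) == fingerprint:
            return "skipped"
        test(target)  # raises AssertionError on failure, so nothing is cached
        self._passed[test.__name__] = fingerprint
        return "ran"


def add_v1(a, b):
    return a + b


def add_v2(a, b):
    return a + b + 0  # same behaviour, different code


def check_add(func):
    assert func(1, 2) == 3
```

Running `check_add` twice against `add_v1` does the work only once; switching to `add_v2` changes the fingerprint and forces a rerun. A real tool would also have to hash the test itself and everything the target depends on.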
If names are so great and hashes suck, does that mean hash links like in StrongLink are useless? No… Well, maybe. When you edit a blog post or news article, there are some senses in which the identity stays the same, and some in which it changes. Really, it’s best to have both. (Especially if you don’t trust the source?)
In code these days, we sort of have hash addresses provided by Git. It could be better though.
One more note about CRDTs… People talk about “append-only,” but that’s wrong. It’s really “unordered-add-only.” If you rely on appending and try to preserve chronological order, you can get conflicts.
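The difference is easy to see with two replicas that each add one entry and then sync. The merge functions below are deliberately naive and are only meant to show the ordering problem.

```python
def merge_logs(local, remote):
    """Naive append-only merge: keep local order, append unseen remote entries."""
    return local + [entry for entry in remote if entry not in local]


def merge_sets(local, remote):
    """Unordered add-only merge: plain set union."""
    return local | remote


# Both replicas start from "base", then each adds one entry offline.
log_a = ["base", "x"]
log_b = ["base", "y"]

set_a = {"base", "x"}
set_b = {"base", "y"}
```

Merging the logs gives a different order depending on which replica does the merging, so the two sides disagree until something picks a winner. Merging the sets gives the same result either way.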