HomeLog InSource


[View] [Short] [Hash] [Raw]


Spec-ulation Keynote - Rich Hickey

Every time I watch a presentation by Rich Hickey I want to write about it. (Last one was “Are We There Yet?”)

Let me start by saying I don’t really buy into functional programming. I think it’s very clearly harder for beginners to learn, and harder for experts to use efficiently. It’s no joke that a throwaway script in Python is a PhD dissertation in Haskell.

I also think it’s less efficient. It should be clear by now that efficient optimizations align[#] with hardware. That means mutable state and registers rather than stacks (like Forth or the JVM).

That said, it’s important to keep state and mutability under control. A little bit of functional style goes a long way. Once it starts hurting, stop pushing it. (Same for static typing.)

Anyway, despite all that… Rich Hickey is absolutely right when he talks about mutability on a larger scale.

The rules he defines for changing APIs are the rules for CRDTs. They are collections that you can add to, but never remove from. As long as you follow that simple rule, loose consistency (such as between different open source projects) is no problem. Once you try to add deletions or collision checking, you’re going to have a bad time.

I also strongly agree with him about the horribleness of Semantic Versioning (SemVer). I think it rotted people’s brains by convincing them, “when you want to bump the major version number, make sure you also break backward compatibility.” Which has never been how versioning has worked for successful software projects. Java is coming up on version 9 or 10 and can still run Java 1 code. Windows 10 still runs ancient Windows programs. Linux “doesn’t break user-space.”

His comment about Maven “only growing” was pretty insightful because it explained the meltdown that occurred when “leftpad” got removed from NPM. Indeed, as long as it only grows, you don’t care what version it is.

He mentions the problem of growing what you provide, versus growing what you accept, but he doesn’t use the terms covariance or contravariance.

He mentions identifying functions by hash, which I’ve seen discussed before. However I felt like failed to get to the root concept of identity (despite him talking at length about identity in other talks). Let me give an example.

I’ve been working on libkvstore, which intends to provide a single, rock-stable API that can be used and fulfilled by various other software. There are two ways (so far) that it needs to support arbitrary expansion: first, the selection of a specific back-end, and second, the setting of various configuration options. The set of back-ends will change (and hopefully grow) over time, and each back-end can support common or unique configuration options.

One can imagine identifying back-ends in several different ways: an enum, with sequential numeric values; UUIDs; hashes of the back-end code or binary; or, of course, names.

Sequential IDs are obviously flawed because they would require a single, strongly-consistent view to add new back-ends. UUIDs are alright but they can get out of sync, leading to unneeded incompatibilities. Hashes are overly brittle (as long as we expect the API itself to be correctly implemented, the specific version or compilation details don’t mater). Finally there are names, which ultimately are unique thanks to trademark law. It should be clear that for this particular use case, names are the clear winner.

Similarly, we have configuration options. The API includes an ioctl-style interface so that each back-end can support any number of options with any parameters. The problem then is that applications, which need to configure these options, shouldn’t be too tightly coupled to the particular back-end in use. So we want every back-end to support whatever options it needs, but for common options to beneficially collide, and without necessarily any global coordination.

Again, the same basic options (sequential IDs, random IDs, hashes and names) are available, except this time instead of brand names, we have descriptive names for each option. So if several options support a filename option, it is identified by the string filename, which hopefully reasonable people would choose even independently. (They might choose filepath or path instead, but there’s only so much you can do.)

Anyway, what is the best way to identify a function in a program? By hash? Well, no, probably not. A meaningful name is probably better.

That said, his proposal to speed up unit tests by hashing the code they test and only rerunning them when when the hashes change sounded pretty sweet. Hashes make sense in that case because tests by definition can’t trust that functions do what they claim to do.

If names are so great and hashes suck, does that mean hash links like in StrongLink are useless? No… Well, maybe. When you edit a blog post or news article, there are some senses in which the identity stays the same, and some in which it changes. Really, it’s best to have both. (Especially if you don’t trust the source?)

In code these days, we sort of have hash addresses provided by Git. It could be better though.

One more note about CRDTs… People talk about “append-only,” but that’s wrong. It’s really “unordered-add-only.” If you rely on appending and try to preserve chronological order, you can get conflicts.

[View] [Short] [Hash] [Raw]


Like most healthy and well-adjusted people, I occasionally have days when I feel down about myself or life in general. It’s kind of like the stock market: things are generally stable or even slowly rising, and then every once in a while there’s a crash, followed by a period of readjustment.

I don’t think it’s a bad thing (in either case) as long as the crashes aren’t too severe. And it’s better to have them more frequently than to prop up a failing system, which only makes them worse.

It can take me a while to figure out the cause and put together some plans to address it. The temptation is always there to distract myself with relaxation or amusement until I forget about the problem, instead of digging in and determining what’s wrong.

Like with the stock market, my personal crashes often boil down to overconfidence, egotism, and putting too much faith in flawed financial instruments.

When I want to distract myself with shallow entertainment, sometimes I ask, “what is the problem I’m feeling bad about, and is ignoring it going to help or make it worse?” Often times, ignoring things is what got me into trouble in the first place.

Even though I put a lot of effort into trying to be a good person, I’m still not great at it (although my mom would disagree). Sometimes my problem is that I wasn’t thinking about things from someone else’s point of view, or I didn’t notice the difference between what I tried to say and the way it sounded, or various mundane mistakes like that.

Anyway, all of that is setup.

Today, when I was feeling low, I had a slightly different, more subversive thought: “Instead of trying to fix my failings, what if I just tried to compensate for them?”

In other words, “maybe he’s a retard, but he wrote an alright database.”

Keywords: humor

Is that sort of compensation allowed? I think it should be, at least up to a point.

That said, I’m still deciding how to proceed.

[View] [Short] [Hash] [Raw]


What is a sandbox?

A hypervisor is not a sandbox. A sandbox might run inside a hypervisor.

People talk about the “browser sandbox” or the “Lua sandbox” or the “Xen sandbox” but none of those things are sandboxes. A sandbox has no features and lets you do nothing you couldn’t do already.

A sandbox uses an API to implement that same API. Securely, of course.

The core of a sandbox is a specification of a safe subset of an API. It then either blocks unsafe calls.

Formal proof of the sandbox code and specification is only one part of its security. Because the underlying implementation is assumed to be buggy, a sandbox needs to be adaptable as bugs and workarounds are discovered.

A sandbox can mitigate this risk by being conservative in what APIs it relies on. The output of a MOV-only compiler is unlikely to be affected by any CPU bugs.

Once you have a sandbox providing a useful subset of an API, you might be able to emulate the full API using only that subset. That emulation isn’t the sandbox, or part of it, but they are often paired together.

Something that actually is a sandbox is a network firewall. Of course, port 80 is a hole you can drive a truck through. The emulator is the proxy you use to get useful work done despite everything being blocked.

Another sandbox is the NaCl verifier. It’s already been updated to remove CLFLUSH due to RowHammer. The emulator is the compiler they supply.

Now let’s look back at the browser and try to see what sandboxes it has. One immediate example is the same-origin policy. However SOP is slightly weird in that it doesn’t protect the user or the website in question; it’s intended to protect other websites. Perhaps those poor incentives are why it was never fixed when cross-site request forgery (CSRF) was discovered.

There’s some sort of policy against using the hard drive, but it’s ill-defined. You’re allowed to use it as an SQL database (WebSQL) or for tracking (supercookies), but there’s no definite limit on what directories you can write to or how much data you can store.

But my biggest complaint is that all of the APIs within the sandbox are different from those used to implement it. We seem to be unable to sandbox without making up new interfaces at the same time. We wouldn’t need WebSQL if we had a file-system sandbox we could run SQLite on top of.

I can think of a few reasons. First, it’s hard to (ahem) draw lines in the sand without specific use cases in mind. So we start by adding APIs for microphone and webcam, and then later decide that’s not good enough and add WebUSB.

Second, because most of the underlying APIs are grown rather than designed. They’re huge and complicated and every application uses a disjoint subset.

Relatedly, there are a dearth of vendor-neutral APIs to choose from. On the web, portability is a big concern, so the primary focus is on developing new APIs that are equally inconvenient for everyone. When defining those APIs, security gets thrown in as an afterthought.

A familiar API makes it look too easy to implement. If I give you a path of a file to open, you might just pass it to open(2) without any sanitization (look at how many web servers have had that problem). A convoluted API that doesn’t even let me express paths makes those obvious errors impossible (even if it makes subtle errors more likely).

And finally, what I think is the ultimate reason: security just isn’t much of a priority. It’s hard to justify software that gets in your way sometimes and doesn’t let you do anything new.

[View] [Short] [Hash] [Raw]


I solved it and now it’s boring

Last spring I wrote a high level overview in make decentralized. Since then I’ve come up with a fairly specific plan for decentralization. It’s not perfect and there are some missing pieces–like the way centralized companies use advertising to cement their position, and it’s not clear how a decentralized group would either pay for advertising or counter it with something better–but it covers what the system would be, how it would work, and how to build it. It even includes a business model[#].

But now that I think I finally understand it all, I can hardly bring myself to keep working on it. That isn’t to say I’m quitting, but we’ll see where it goes from here.

Anyway, here’s the plan in as much detail as I can muster the will to describe.

The basic approach is to build a “web-scale” distributed/decentralized database. I know “web-scale” gets thrown around a lot (especially making fun of MongoDB), but in this case I hope it would be justified, since the goal would be to store as much data as Google, Facebook and Wikipedia combined[#], and run parts of it by different people across the public internet. “If You Built the Right Kind of Database, Would the Web Become Decentralized?”[#] and a history[#] of other approaches.

The first step is to port SQLite to libkvstore (“a database for databases”). libkvstore isn’t even distributed yet, but this would provide an immediate benefit for people making heavy use of SQLite (because libkvstore supports LevelDB), create interest in libkvstore, and pave the way for decentralized SQLite. This is where I currently am, BTW.[#]

Next would be to add a decentralized back-end to libkvstore. The design of this back-end is built on some aspects of CAP and CRDTs that aren’t widely understood:

The event log would be completely external, provided via hooks. (The event log is the tricky part of distributed systems, and I wouldn’t even have to build it!) Each node would apply transactions from the event log to a recursive instance of libkvstore, backed by LevelDB or LMDB.

At this point it’s trivial to build some sort of blockchain-SQLite demo.

Perhaps the SQLite developers would be willing to start helping out, (and incidentally, Dr. Richard Hipp apparently lives like 2 hours from me). Maybe I could get them to help implement SQL-AP (which would mainly involve disabling certain features and adding error checks if you tried to use them).

Then we could go around open source software that uses SQLite and add decentralization features to them. For example, serverless syncing of your Firefox bookmarks (since Firefox already uses SQLite to store them). Transactions can be signed and/or encrypted for publishing if you’re worried.

Eventually, MySQL could also be ported to libkvstore and the holy grail of decentralized WordPress would be achieved with basically no additional effort. WordPress on SQL-AP wouldn’t be that unreasonable either.

Imagine libkvstore gets bundled in a future release of MariaDB and a new release of WordPress turns it on by default for every WordPress blog. Anyone can make a fully functional mirror of your blog, and if you have your public key you can update it from anywhere.

                StrongLink 2 Architecture Diagram
|                         StrongLink 2                          |
|             Content-addressable notetaking system             |
|      Scripting language?        |         "libquery"          |
|           (JS/Lua?)             |  (SQL and FTS alternative)  |
|  Embedded runtime  |  libasync  |           Schema            |
|   (Duktape/Lua?)   |   (HTTP)   |        (Details TBD)        |
|                    |            +-----------------------------+
|                    |            |         libkvstore          |
|                    |            |        (distributed)        |
|                    |            +----------+------------------+
|                    |                       |    libkvstore    |
|                    |                       | (LMDB / LevelDB) |

I also drew up an architecture diagram for StrongLink 2, but once we have decentralized libkvstore, StrongLink is just a notetaking app (with immutable entries and hash links). I hardly expect it to take over WordPress or Evernote (or even Notational Velocity).

(Note that StrongLink doesn’t need an event log because it only uses eventual consistency. libquery is a library for building search engines, somewhere between SQLite and Lucene. It’d be an enhanced version of the query engine already in StrongLink.)

So, that’s the plan.

Missing pieces:

Frankly, this plan should be doable by one person in 6 months of work. That makes it all the more frustrating to feel like I’m running out of steam.

For the record I started on the earliest versions of what would become StrongLink like 4 years ago.

[View] [Short] [Hash] [Raw]


I finally started trying to port SQLite to libkvstore[#]. I ended up starting from SQLightning, since I found its code easier to read (in hind sight, this should’ve been a red flag). I came up with a general plan of attack:

  1. Build and test SQLightning as-is
  2. Begin replacing calls to internal LMDB APIs with public APIs
  3. Move include "mdb.c" to end of file, include only public API at top
  4. Begin removing calls to non-libkvstore APIs (DBIs, dupsort, etc.)
  5. Add libkvstore to test harness build system
  6. Find and replace mdb_ with db_
  7. Test with libkvstore

The goal here was to keep the code working and testable at every possible stage of development. I think this is generally a good approach to large migrations of “legacy” code, although in this case there is a small enough amount of code that I haven’t really felt the need to follow these steps very closely (so far).

Anyway I feel kind of stuck. Check out this code from SQLightning:

static int BtreeCompare(const MDB_val *a, const MDB_val *b)
  UnpackedRecord *p = a[1].mv_data;
  return -sqlite3VdbeRecordCompare(b->mv_size, b->mv_data, p);

If you know how comparison functions usually work, you might think it’s a bit strange that the order of A and B are flipped in the call to sqlite3VdbeRecordCompare, and then the result is negated. Why not just pass them in the original order?

Well, it turns out that comparison functions in SQLite are asymmetrical: it expects the left-hand side in packed form, and the right-hand side in unpacked form. This was presumably considered an optimization at some dark point in the past. Usually, you’re comparing one specific record in memory (the needle) to a large number of records on disk (the haystack), so it’s tempting to decode the needle ahead of time to speed up comparisons.

As we’ve learned since (and is reflected in SQLite4), encodings for sorted keys should be carefully designed to sort lexicographically. That way, no special decoding is needed, making everything faster and more robust.

That brings us to the next weird part of the above code snippet. Arguments A and B are the individual records to compare. So what does a[1] mean? It’s not an array. If it had been a[0], I’d think it was just a funny way of dereferencing the pointer, but it’s reading beyond the (singular) element and probably getting random garbage from the database file.

But it turns out to be a hack. See, LMDB’s allows custom comparison functions, but it doesn’t let the user specify any contextual data. That means that the only data it has to go on is the two keys in question. Since SQLite keys (in this case) are packed according to the index’s schema, we can’t unpack or compare them without additional information. Without context from LMDB, there’s pretty much no way of accessing outside information.

Of course, SQLightning was written by one of the authors of LMDB. Instead of fixing LMDB to provide this necessary context, he took advantage of the full LMDB internals to control precisely when and how this comparison function is called, and then pass extra data surreptitiously after one of the keys.

MDB_val key[2], data;
char aSpace[150], *pFree = 0;
p = sqlite3VdbeAllocUnpackedRecord(pCur->pKeyInfo, aSpace, sizeof(aSpace), &pFree);
if (!p)
key[0].mv_size = nKey;
key[0].mv_data = (void *)pKey;
sqlite3VdbeRecordUnpack(pCur->pKeyInfo, (int)nKey, pKey, p);
key[1].mv_data = p;
rc = mdb_cursor_put(mc, key, &data, flag);

As you can see, everything is carefully arranged. The secret data is likely to even be kept on the stack, if it’s 150 bytes or less.

This works for LMDB, because SQLightning is (very) tightly coupled to it. However this doesn’t work for libkvstore, because we support various back-ends that can and should be able to compare keys whenever they please. In fact, LevelDB’s background compactions almost certainly means it does key comparison unpredictably on a second thread (which is kind of terrifying for trying to access any additional context).

Even with context available, you reach the problem of reentrancy: is it OK to compare keys based on state you read from the database itself? You might end up with arbitrary recursion. In order to compare A and B, read C; in order to find C, compare C and D, an so forth.

It’s also hard to get schema meta-data (known is KeyInfo) from SQLite. Database cursors are told the KeyInfo for their index, but if we open a database and start compacting it in the background, we might never have a cursor opened. Searching the source, there’s only one function that provides access to it:

KeyInfo *sqlite3IndexKeyinfo(Parse *pParse, Index *pIdx);

If you’ll let me rant briefly, this is bad code. It relies on state that it shouldn’t need and doesn’t really use (the Parse object is only there to store errors).

There’s been a renewed debate recently over whether breaking code into lots of little functions for “readability” is a good idea. Some people even say that huge functions (thousands of lines) are best when you are just performing a long list of steps. My perspective has been sharpened to a fine point: there is a correct answer, and that answer is that you can only create a function if it represents a logical unit of some kind. You can’t just break up code because it was getting too long, because then you need to decide on a function name, arguments, and a return value that all make sense.

In other words, the above function should only take a single argument, the Index, because that’s the only thing it logically depends on.

Even so, that’s pretty far afield, and even if we only needed an Index, it’s not obvious that we can get one, down in the bowels of the b-tree (and it’s obvious that we shouldn’t).

A final option is to unpack all keys as soon as they are handed to us (through a cursor, with appropriate KeyInfo), and then do our own serialization. Arguably that’s even a good idea, because it lets us decouple serialization formats from SQLite’s internals. It’s also a lot more work, since it involves greatly expanding our half-baked serialization layer.

[View] [Short] [Hash] [Raw]


Every once in a while, someone idly suggests, “why not just move beyond the Von Neumann architecture?” Then a food fight breaks out in the cafeteria.

Different aspects of the Von Neumann architecture are considered “broken” by different people, so of course the “fixes” look different as well.

One alternative is the Harvard architecture. The Von Neumann architecture defines a stored program computer with unified code and data. The Harvard architecture is the same, except with code and data kept separate. From a modern perspective, the advantage is basically security. Your program can operate on untrusted, malicious data, but even if your program is buggy, the computer won’t accidentally start executing malicious data as malicious code.

Modern processors are already hybrids, with features like memory protection to support W^X (write xor execute). There’s a fundamental limitation, because an interpreter can always turn data into code, so I think sandboxing is a more promising approach in the long run.

On the other hand, some people think the worst part of the Von Neumann architecture is the so-called “Von Neumann bottleneck,” which occurs because no matter how fast the computer is, it can only be doing one thing at a time. Modern computers (even smart phones!) typically have 2-4 cores, but that’s still a relatively small number. People expected the number of cores to grow exponentially after we hit the clock speed limit, but that hasn’t happened.

When people have asked why we don’t have massively parallel processors, they typically forget about GPUs. For a long time, graphics was the only demanding, embarrassingly parallel problem ordinary computer users had. Conversely, processors like the Mill and GreenArrays were basically oblivious to the graphics market. Now, in fits and starts, there’s finally been some progress towards GP-GPU (general purpose GPU) programming, and modern graphics cards are becoming much more flexible. That said, demand has been taking its sweet time to materialize, although there are some notable applications in AI and HFT (high frequency trading).

There are two lessons that I draw from this: 1. the Von Neumann architecture is amazingly useful and versatile, and 2. be very skeptical of new processor designs, because modern computers are very powerful, and the processor industry mostly seems to know where it’s headed.

[View] [Short] [Hash] [Raw]


I think there is a programming language paradox. But let me start at the beginning.

Programming can be divided into two categories: application programming, and system programming. The difference is basically how much the code matters. Software that is “wanted, not needed” or doesn’t matter much can be considered application programming. Software that is “mission-critical” or used by millions of people is system programing.

Now, there’s already a problem here, because the difference is how the software is used, and that can change over time. For example, Twitter and Facebook started as applications and became systems as people started depending on them. There are parts of Twitter and Facebook that are still “just” applications, usually the bells and whistles that see lots of churn. But the core products have a lot of people who get mad when something breaks, which has caused problems at both companies as they come to grips with the distinction between “move fast and break things” and “real engineering.”

More common examples of “systems” are operating systems, control software, and crypto. Some people like to add a third category for libraries, but I don’t think that is really necessary. Either the thing matters or it doesn’t. (But it’s also a spectrum, usually with a dollar value attached.)

Some developers make arguments like, “developer time is more valuable than runtime,” which indicates they are doing application development (whether they realize it or not), or that they think they are (ditto mark). For systems, the code is far more valuable than the time and effort taken to write it, and it’s worth making it bulletproof.

When you are doing application development, you need to know it and go about things in a different way. This is one good reason to use scripting languages. In fact, you should use whatever high level tools and frameworks let you get the work done as quickly as possible. I used to make fun of Python for being slow, but it’s actually completely justified in this scenario. Languages for application development should be as slow as possible(!) if it lets them offer any gains in development speed.

But that’s where the paradox comes in. Once a lot of people start using your high level scripting language for application development, the language itself becomes a system. I remember seeing comments from core developers at Apple explaining that any optimizations they could make to dynamic dispatch in Objective-C (which is really a scripting language bolted onto C) would speed up thousands of applications running across millions of devices. Even if the improvements are small, the aggregate impact is huge.

Likewise, there has been a lot of work on speeding up Python, simply because so many applications would benefit and it’s a lot easier than rewriting them all.

My third example is not exactly a language, but it’s close. Originally, SQLite was impressive not because of how well it worked but because it worked at all. Recently, the SQLite developers have put a lot of work into collecting as many micro-optimizations as possible, turning hundreds of sub-1% improvements into a general ~50% speedup. Of course if performance really mattered you wouldn’t use SQLite (or SQL) in the first place, but that’s not the point.

Suddenly discovering that your application is a system is always rough, as Twitter circa 2009 showed, but it’s even worse for programming languages because the design of the language itself changes as the focus shifts from development speed to execution speed. It also makes me reevaluate my rubric for choosing which language to use for a given project.

One conclusion is the truth of the advice, “write one to throw away.” Another is that you should use different languages for different parts of the same project, based on how important or reusable each part is (for example Python plus C).

This tension explains the tradeoffs that heavily optimized JavaScript runtimes like V8 try to make, although it doesn’t seem to offer any concrete suggestions. Perhaps advanced JITs make the best of a bad situation, or perhaps they push the language too far, at the cost of size and complexity.

This spectrum of importance is separate from the distinction between CPU- and I/O-bound workloads. If your extremely important web service has no way of taking full advantage of the available CPUs, you might as well use a scripting language. Although it’s usually possible to increase I/O bandwidth, the decision can become complex when you factor in ongoing maintenance versus hosting costs.

Which is one of the major reasons systems eventually get replaced. As they get more and more optimized, they calcify with complexity and further adaptation becomes impossible. The expected lifespan of the system is always something you need to keep in mind.

[View] [Short] [Hash] [Raw]


Text encodings are an illusion.

There are basically three stages of text encoding enlightenment:

  1. Text is a string of characters. (Ignorance)
  2. Text is a string of code points encoded by a set of rules according to… (blah blah blah)
  3. Text is a string of bytes.

Say you’re writing a text editor, which is one program that should probably handle encodings properly. The user tries to open a file as encoding X, but it contains a bunch of byte sequences that are invalid in that encoding. The file can’t be opened. Why aren’t you helping to open the file?

If your answer is “text encodings,” you might be a Replicant.

As any human would tell you, you open the file anyway. Display little blocks or diamonds for every byte that can’t be interpreted. Even if the user is trying to open binary data as 7-bit ASCII, deal with it. If they want to change the encoding later (reinterpret, not convert), do it without losing or corrupting any data. Text encodings are an illusion (or in other words, presentational).

Unfortunately a lot of modern languages have progressed out of stage 1 and into stage 2: Python 3, Swift, and Rust to name a few. Then they all go through contortions to support file system paths, which are bags of bytes like all the rest, but for some reason people realize that throwing an exception when trying to read a directory is a bad idea.

Approval ratings for Postel’s law are at historic lows:

Be liberal in what you accept, and strict in what you emit.

They point to HTML4 an an example of why “be liberal in what you accept” is a bad idea.

But we tried the opposite, “be strict in what you accept,” and it was much worse. It was called XHTML.

If we can’t be liberal and we can’t be strict, what can we do? Of course, HTML5 figured this out: judge ye not. There is no strict or liberal, there is only well-defined behavior. (As another example, I’d provide djb’s text0 format, which eliminates the chance of parse errors by taking advantage of C’s nul-terminated strings. That said it’s an ugly and unpopular format, and nul-terminated strings themselves are widely considered a mistake.)

Thus, an updated version of Postel’s law:

Accept everything in a uniform and standardized way.

It’s good if that way is exceedingly simple, like text0, but HTML5 works too.

Of course, this is still difficult to apply in security contexts, like HTTPS certificate validation errors or NaCl’s sandbox validator. If there’s an error, the system (and user) needs to assume they’re under attack, no matter how remote the possibility may seem. Which means it’s critical that these errors don’t happen by accident, which is a lesson HTTPS is still struggling to learn…

[View] [Short] [Hash] [Raw]


Following up on my previous post[#], is a decentralized SQLite possible and how is it different from normal SQLite? More precisely, is it possible to build an eventually consistent database that supports a useful subset of SQL?

Bedrock – Rock-solid distributed data (bedrockdb.com)

I recently saw this, which is a distributed database based on SQLite. It sounds pretty amazing.

Despite one of the comments by the author, it is strongly consistent (CP), not eventually consistent (AP). While it takes a bit of wind out of my sails, there’s still room to do interesting things.

Anyway, what are the limitations of SQL-AP?

In short, primary keys[#] must be changed to be hashes, UUIDs, or the like, and you can’t rely on deleting things. If there aren’t more limitations I’m missing, this actually sounds plausible for porting existing applications. Maybe even WordPress someday. (Incidentally BedrockDB claims to support a MySQL front-end, which is crazy-awesome enough that I need to look into it more.)

This has implications for all of the eventually consistent systems out there, from Cassandra to SSB, ZeroNet, IPFS and of course StrongLink.

(On reflection I’m fairly certain that his isn’t how ZeroNet’s SQLite support already works, although I might be wrong. I think it’s effectively single-writer, as enforced by a public key, a la IPNS and SSB.)

I’m fairly excited about the prospect of having SQLite on LevelDB with support for mixed logical/physical replication via Paxos, Raft, or blockchain, or in eventually consistent mode, but, for the record, I still haven’t written a single line of code.

[View] [Short] [Hash] [Raw]

StrongLink 2016-10-16


I developed a “general purpose” JSON-based CRDT[1] as a part of my content addressing system project. The idea was to have an “append-only tree”, where subtrees could be used by individual applications for different purposes, for example as counters or mutable sets.

In retrospect, it didn’t need to support recursion. A single-level append-only set is enough to be fully general and easier to perform indexing on. Using JSON was also overkill, since too much flexibility is bad for content-addressing.

[1] https://github.com/btrask/stronglink/blob/master/client/README.md#meta-files

Well, it’s only taken me three years but I’ve figured out how to design this properly.

The true nature of meta-files is a write transaction for a distributed database. (Only writes are stored since reads, of course, don’t need to be replicated.)

Since the core of a database is the key-value store, this “transaction file” only needs to record keys and values. However, even values are redundant (they can just be tacked onto the ends of keys), so all we need to store is the keys themselves.

For eventually consistent databases, we can only support insertions. The cool part is the same format works for strongly consistent databases too, if you just add support for deletions. (An update consists of an insert and a delete, which is safe since we’re in a transaction.)

Incidentally, this is almost exactly how my transactional wrapper for LevelDB works. While the transaction is in progress, it keeps a running list of inserts and deletes, and then when it’s time to commit, sends them all to LevelDB atomically.

Here is a mockup of the file format:


The digital signature of the transaction’s author comes first, so it can be used to verify the rest of the file.

Next comes the identity of the database. In the case of a relational database, this might be the name of the database, a random string, or a hash. That way a single replication stream (such as a blockchain) can support multiple databases without them stepping on each other’s toes. In the case of a document store like StrongLink, this would be the identity of the document (e.g. hash).

Following would be lines beginning with + or - indicating rows to be added and removed. Note that the equals sign is optional and databases must support the same “key” having multiple values. These lines must be sorted lexicographically, to minimize hash flexibility and to maximize processing speed.

Finally, there might be lines beginning with !. These would be high level commands for the database system, in order to support logical replication. For example, SQL statements like INSERT ... SELECT can produce huge numbers of rows, which makes them faster to replicate as commands, rather than as the data they generate. Because these commands can depend on pending transaction state, they can be interleaved with batches of normal inserts and deletes.

Of course, the problem with a text-based format like this is binary data. So in the real world you might want to use a binary-safe encoding.

I’ve been thinking about writing a distributed back-end for libkvstore. If I add an API for sending and performing custom logical replication commands, it might be pretty good. And it would be cool to support pluggable consistency algorithms, like Paxos or a blockchain. And if I ever get around to porting SQLite[#], that could be a quite a monster.

I can’t say how long it will take for these ideas to find their way into StrongLink, either.