
EarthFS 2014-09-14

Ask HN: Has anyone ever tried building an OS that doesn’t use files?

[…] I’m working on a project I’m calling “Library Transfer Protocol”, which is aiming to replace the concept of “file” and replacing it with ‘Library item’. […]

Hey, it’s that guy[#] again! I knew this sounded familiar.

Props that he’s still working on this project.

I don’t know about the file metaphor, but the idea of a stream of bytes is timeless and essential. If you have that, you might as well call it a file. “Library items” (what, books?) are practically the same thing. In a filing cabinet or on a shelf, it doesn’t matter.

Realizing that EarthFS should use files (rather than “entries”) was a big step forward in its development. Credit to my friend Dan for his facial micro-expression (best term ever, lol) getting me on the right path.


So, you’re not building a filesystem (because that’s hard, and requires concrete engineering skills), but instead a glorified file metadata search?

So, MongoDB with a file URL? (Hint: that’s you could implement the MVP of this, and if you use a URL you can even reference user files they don’t store locally)

And file permissions are dead? Because nobody has kids that use the same desktop they do?

This is (from a technical standpoint) the silliest goddamn thing I’ve ever heard.

From a product standpoint, you could probably pitch and get a few M. Why the fuck not.

Yeah, this particularly vehement dismissal pissed me off too. Even if it’s not me or the LTP guy, I hope someone can prove him wrong.

I’ve said that content addressing could be done in 30 lines of Python, but for some reason I’ve been working on it for two years, and I know this guy has been working on his project for a long time too.
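For what it’s worth, here is roughly what the “30 lines of Python” version might look like. This is only a minimal sketch, not EarthFS: the `ContentStore` class, the in-memory dict, and the `hash://sha256/…` URI layout are all assumptions made for illustration. The two years go into everything this leaves out (persistence, meta-files, search, sync).

```python
import hashlib


class ContentStore:
    """Toy in-memory content-addressed store (hypothetical, for illustration)."""

    PREFIX = "hash://sha256/"  # assumed URI layout, not taken from EarthFS

    def __init__(self):
        self._blobs = {}  # hex digest -> bytes

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        # Storing the same bytes twice is a no-op; deduplication falls out for free.
        self._blobs.setdefault(digest, data)
        return self.PREFIX + digest

    def get(self, uri: str) -> bytes:
        if not uri.startswith(self.PREFIX):
            raise ValueError(f"unsupported URI: {uri}")
        digest = uri[len(self.PREFIX):]
        data = self._blobs[digest]
        # Re-hash on read so corruption is detected instead of silently returned.
        if hashlib.sha256(data).hexdigest() != digest:
            raise IOError("stored bytes do not match their address")
        return data
```

The address depends only on the bytes, so two people who store the same file independently end up with the same URI.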

Incidentally, I’ve given up on file permissions in EarthFS too. There are user permissions but they are repo-wide.

So, you’re not building a filesystem (because that’s hard, and requires concrete engineering skills)

I want to respond to this but I don’t know how.

We don’t need another file system that’s the same as all the existing ones.

I hope I never become this jaded. And I know I’ve come close[#] at times. Sorry.


For your project make sure you plan what to do if one application created the item, but another program wants to open it.

Don’t try to do it by applications registering types they can open - this never succeeds, there are simply too many file types in the world.

Also think about how to send data to someone else.

And finally think about how to integrate with existing devices that still use files.

Wow. Let’s consider each of these.

EarthFS doesn’t have the concept of type handlers at all. So I don’t think this applies.

Sending data to someone else: check.

Integrating with existing devices (and software): it’s not ideal, but it should work.

In conclusion, that wasn’t as useful as I expected. I guess the answer to all three is “uses HTTP.”


There is an argument to be made for having better querying capabilities or permissions or whatever, but what is to be gained from throwing a commonly-accepted idiom away?

This response is much more measured and constructive.


EarthFS 2014-09-14


Oh shi–

I was a bit worried about how we were doing partial pulls, even after I got our “meta-file filters” working. It turns out that, at present, we have no way of knowing when to pull bare files (files without meta-files) at all.

In order to do that, we need to track the dependencies of each file. Luckily, we can just store that as a list of URIs in each meta-file, and there is no need to handle recursive dependencies between meta-files. Each meta-file must store all of its direct and indirect dependencies, and meta-files may not be dependency targets.

Then we have to enhance our pull system in order to guarantee the dependencies are stored before the meta-file that declares them. That is a little bit complicated if we’re still shooting for high throughput with lots of queuing, but it shouldn’t be too bad.

The end result is that meta-files have a well-defined order within a repository, but bare files do not. Seems reasonable.
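To convince myself the ordering rule is simple when queuing is ignored, here is a rough sketch. It assumes each meta-file arrives as a `(meta_uri, dependency_uris)` pair in the remote’s order; `pull_order` and the URI strings are made up for the example. Because every meta-file lists all of its direct and indirect dependencies, and meta-files can’t be dependency targets, no recursion is needed.

```python
def pull_order(meta_files, already_stored=()):
    """Return URIs in an order where every dependency precedes the
    meta-file that declares it (hypothetical helper, not the real pull code).

    meta_files: list of (meta_uri, [dependency_uri, ...]) in the remote's order.
    already_stored: URIs the local repo already has.
    """
    stored = set(already_stored)
    order = []
    for meta_uri, deps in meta_files:
        for dep in deps:
            if dep not in stored:  # each bare file is fetched at most once
                stored.add(dep)
                order.append(dep)
        if meta_uri not in stored:
            stored.add(meta_uri)
            order.append(meta_uri)
    return order
```

Meta-files keep their relative order from the remote, while a bare file simply lands wherever its first dependent pulls it in.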

I don’t like that we’re slowly slipping from linear order to a dependency graph, but partial pulls are too important to give up on (just like every other feature).

We’re probably going to blow our “end of September” self-imposed deadline, but that’s alright I guess.

One minor problem is that if a file has dependencies it doesn’t declare, it can appear to work until the file is pulled, and then it breaks. But dependencies can be updated after the fact, and broken dependencies can be automatically detected and fixed (if you have a checker for the file format).


EarthFS 2014-09-14


So I’m back to this Firefox extension.

After spending like a week banging my head against the wall, I’ve finally figured out how to do it… But now I’ve lost sight of what to do.

I can see three basic approaches for how to store web pages in EarthFS:

I thought the Internet Archive’s WARC format would be perfect for the container, but after looking at it I’m not so sure.

The fundamental question is what type of archives are we trying to produce? Do we want to archive what the server sends, or what the user sees?

Any browser-based site archiver is going to be limited by what the browser sees. If the user has a proxy that blocks ads, there is no way for the extension to store them. And even changes by other extensions can apply. So there are definite limits to what a tool like this can do.

Does that mean we should go all the way to the other side, and just record what the user sees? Or should we try to strike a balance?

And we have to parse the DOM either way in order to get the full text content.

The problem with WARC is that it stores its own headers and HTTP headers in the same file as the resource. We’d want to split them out so that raw resources get their own content address.

But at that point if we’re not 100% compatible with WARC then the address of the container won’t be consistent anyway.
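A minimal sketch of the split, assuming we somehow had the raw HTTP/1.x response bytes in hand. The function names, the record layout, and the `sha256:` address prefix are all hypothetical, and it ignores chunked transfer encoding and content encoding entirely.

```python
import hashlib


def split_response(raw: bytes):
    """Split a raw HTTP/1.x response into (header_block, body) at the first blank line."""
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("no header/body separator found")
    return head, body


def archive_record(url: str, raw: bytes) -> dict:
    """Keep the headers as metadata and address the body by its own hash."""
    head, body = split_response(raw)
    return {
        "url": url,
        "headers": head.decode("iso-8859-1").split("\r\n"),
        # Only the body bytes feed the address, so volatile headers
        # (dates, cookies) don't change it.
        "body_address": "sha256:" + hashlib.sha256(body).hexdigest(),
    }
```

Two fetches of the same resource with different headers then share one body address, which is exactly what a single combined WARC record can’t give us.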

Incidentally I didn’t find any existing Firefox extensions that support WARC. I suspect in part because there’s no way to get the completely raw HTTP headers.

Archiving “the page you see” made more sense when we had a button that did it at any given instant in time. But when we automatically archive any page as soon as the page is loaded, it makes less sense. We could still have a button to take snapshots.

I was thinking we were going to have to figure out compressed archive support, including replacing our simple URI list format with something more complex that could track meta-data of files within the archive without using meta-files, but now it looks like we can leave it for later.

So after yet another episode of decision panic, we’re back to the way we were doing it before…?


EarthFS 2014-09-11

Tarsnap: No heartbleed here

Man I wish we could use stunnel. It would solve the problem of supporting HTTPS in like two seconds, without a single possible chance of messing it up.

But we went through so much effort to bring the database in-process because we care about deployability.

Also there’s the question of the scalability of stunnel’s server architecture. We didn’t go through all the trouble of writing our own async library for show. We did it because EarthFS opens one persistent connection and many short-lived connections for every pull configured.

Bi-directional pulls use twice as many connections. Even if we optimized that (which we might have to since it sounds like it’s going to kill home routers), one connection per remote repo is the minimum. If you think of a pull like subscribing to an RSS feed or following someone on Twitter, it’s clear that one person could easily have a thousand pulls at once. It’d be nice if that were possible without requiring servers in the middle to conglomerate pulls from different sources.

I wonder if simple HTTP proxies would work for handling all these pulls. But I think the answer is “not really” because proxies still use one upstream connection per client connection?

The long term solution is SPDY and HTTP2. It’s funny how we keep running into situations like this. We’re on the bleeding edge of what’s possible using off-the-shelf parts.

So then… Where is a library that handles both HTTPS and HTTP2? Can we embed nginx?

stud is a network proxy that terminates TLS/SSL connections and forwards the unencrypted traffic to some backend. It’s designed to handle 10s of thousands of connections efficiently on multicore machines. stud has very few features – it’s designed to be paired with an intelligent backend like haproxy or nginx.
Homepage: https://github.com/bumptech/stud


Maybe it wouldn’t actually be hard to bring something like that in-process even if it wasn’t designed for it.

Anyway, even with the additional memory overhead of TLS (like 2KB per connection, IIRC?), maybe we don’t need to worry about running out of memory with fibers. Everybody’s home router is going to keel over long before that.

It’s funny how everyone is flipping out about decentralized this and that, but deep down there are simple technical problems that make things a lot harder if not impossible. If you can’t open direct connections to everyone you want to talk to, complexity goes way, way up and privacy goes down (if you want real onion routing, run on top of Tor, don’t do some half-assed version mixed with your application protocol).

The real problem is no one understands or cares about latency?

Or we could try to replace pulls with pushes, so that the servers only open connections when they have something to send. But that has considerable problems too (all of a sudden notifying clients becomes your responsibility, and you have to ensure no one is causing reflection attacks or whatever).

We need pushes anyway to deal with firewalls, NAT, mobile connections, etc. So we’ll have plenty of time to think about all the implications.


EarthFS 2014-09-05


Now I’m working on our query (formerly, “push”) system. After working out all of the logistics, I narrowed it down to being the perfect use case for POSIX condition variables. They are apparently the only synchronization option that supports timeouts, and we need broadcast, and we even need the mutex around it to ensure we don’t miss any events. Who knew this stuff was so well designed?

Unfortunately, we’re using fibers, so we can’t use pthread_cond_*.
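For reference, this is the shape of the pattern with real threads. It is a sketch in Python’s `threading` module rather than our fibers, and `SubmissionFeed` and its methods are hypothetical names. It shows the three things we need: the mutex that prevents missed events, the broadcast, and the timed wait.

```python
import threading


class SubmissionFeed:
    """Hypothetical long-poll helper: waiters block until something newer is published."""

    def __init__(self):
        self._cond = threading.Condition()  # condition variable plus its mutex
        self._latest = 0                    # ID of the newest submission

    def publish(self) -> int:
        with self._cond:
            self._latest += 1
            self._cond.notify_all()         # broadcast: wake every waiting query
            return self._latest

    def wait_newer(self, seen: int, timeout: float) -> int:
        # The check and the wait happen under the same lock, so a publish
        # cannot slip in between them and be missed.
        with self._cond:
            self._cond.wait_for(lambda: self._latest > seen, timeout=timeout)
            return self._latest             # unchanged if the wait timed out
```

A cooperative version would need the same three pieces, just with fiber switches in place of blocking.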

Monitor (synchronization)

Despite being extremely long and providing sample code in excess (which normally I’m not a fan of) this is the perfect article for us. Even the parts that aren’t gratuitous detail seem like enough for us to build our own cooperative version.

In fact, it provides very simple code for implementing both conditions out of semaphores, and semaphores out of conditions. Then it says this:

Implementing monitors using semaphores is a bit more “roundabout” and somewhat less efficient than implementing semaphores using monitors. It is more common to implement semaphores either from monitors or perhaps from synchronization primitives like monitors.

Surprising because that’s not what I was expecting at all. I thought semaphores are so simple they should be the lowest level, and more complex things should be built out of them. But it makes sense that more complex things implemented at a lower level have more room for optimization.
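To see how little it takes in that direction, here is a counting semaphore built on top of a condition variable. This is a sketch using Python’s `threading.Condition`, not the article’s code and not our fiber library; `MonitorSemaphore` is a made-up name.

```python
import threading


class MonitorSemaphore:
    """Counting semaphore implemented with a condition variable (illustrative only)."""

    def __init__(self, value: int = 1):
        if value < 0:
            raise ValueError("initial value must be >= 0")
        self._cond = threading.Condition()
        self._value = value

    def acquire(self, timeout=None) -> bool:
        with self._cond:
            # wait_for re-checks the predicate after every wakeup, which
            # handles spurious wakeups and races with other acquirers.
            ok = self._cond.wait_for(lambda: self._value > 0, timeout=timeout)
            if ok:
                self._value -= 1
            return ok

    def release(self) -> None:
        with self._cond:
            self._value += 1
            self._cond.notify()  # one new permit, so waking one waiter is enough
```

Note that the timed acquire comes along for free once the condition variable itself supports timed waits.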

I’m still not exactly happy that we built simple mutexes out of semaphores either, since we use mutexes all over the place and the overhead seems wasteful.

The only thing this article is missing is timed waits, but that’s more dependent on the timer system and should be an easy enough addition.

The other thing we could do is find an existing implementation. I know GNU Pth has one, although it’s probably GPL. I’m also curious whether our stack-based queues have been done before, and whether our performance is comparable to the state of the art or really bad.

Now the only synchronization primitive left that I’ve never needed to use is the barrier. I see what it’s good for, but it seems a little bit limiting… I don’t know. It’s the only one that you’re expected to use once and throw away?

I’ve only used the “once” check briefly before ripping it out. I don’t know how fast it is in the common case, but it seems like it’s never much effort to avoid it entirely.

I also figured out how to do dump and restore[#]. Basically, we have simple, separate tools that operate as clients over the EarthFS API. That way there’s no question of whether they’re saving files relative to the server or relative to the client, etc. They’d save the URI list as-is, and then store necessary extra information in some reasonable way (possibly just in the file system).

There are still some remaining problems though:

We also have to come up with a coherent plan for these external tools. It’s a bit of a pain to get our build system set up so we can use C for everything, unfortunately. Plus dump/restore would be useful for testing and benchmarking, so it’d be good to have them known to be fast and stable (compared to our custom HTTP library). They don’t have to talk to MDB directly or anything, so just about any language would work.

For the combination of performance and portability, Node.js actually seems like a good option in this case. Otherwise I’d do a shell script with curl.

Funny how all of the technologies I happen to know are coming in so handy (Objective-C, now Node). Partly I’m just working with the tools I have, but partly I think I’ve made some good choices and lucked into some good skillsets. But I’m worried using all of these skills in one project will result in something no one else can understand…

We could use nginx for the restore half, which would also be very good for benchmarking, but it doesn’t help with dumping. Plus I think Node is easier to install/set up than a server daemon.

I told my mom I was going to try to have something to release by the end of September. That’s actually conservative because I was really thinking two weeks. We’ll see though. This is the point where I start slacking off.


EarthFS 2014-09-04


Lately I’ve been coming across some Notational Velocity stuff again.

The first time I heard about NV, I was like probably 14. People were talking about it like it was the best thing ever, so I checked it out, but I didn’t get it at all. Of course.


I came up with the idea of making the search field huge at the top of the page, like on a Google results page or maybe even bigger. Several reasons, like search being a major feature of our system, but a major one is people are supposed to put in hash URIs, which are often long, so they need a big search field to do it.

I felt like a genius when I came up with this idea, but I had forgotten that NV had this user interface for ages before us. So that convinced me of its extreme relevance and now I’m looking into it again.

An interview with Notational Velocity developer Zachary Schneirov

Wow, remember this?

All existing apps either tried to place a multi-document drawer on TextEdit with a slew of buttons, made you file information into discrete fields or categories, had very imprecise searching, […]

This comment reflects the strength of NV’s underlying model or ideology. EarthFS has a slightly different model, which IMHO is even stronger (as you’d hope for a project 10 years later). We consider notes to be just files, and we have the best form of linking (I’ve come a long way to be able to assert that now), and we dropped the concept of “modeless,” which produced a very nice UI but didn’t really pull its weight. We added Dave Winer’s concept of River of News, which I think fits very well with search.

[…] or weren’t secure enough for storing passwords (the encrypted database was mandatory in the first version).

NV seems to have moved away from this over time, and EarthFS doesn’t try to support it at all. Either use a dedicated password manager, or encrypt individual files in EarthFS at the application layer.

I guess given some bloggers’ early promotion and consequent direction of its development (I’m very grateful to Giles Turnbull as well as Merlin), it makes sense that they would continue to see value in NV as a general-purpose writing app. So the fact that a note-taking program / password manager also happened to appeal to bloggers almost certainly helped spread the word.

EarthFS fits this niche very well and actually goes further by turning notetaking+sync and blogging into the same problem.

I’d love to see a program like one of those that accepts tons of filetypes, but still keeps everything organized in folders, not some giant database.


  • Syndie and other apps built on I2P
  • Freenet
  • GNUNet
  • RetroShare
  • Camlistore
  • TeleHash
  • MogileFS
  • CouchDB (Currently used on the desktop as part of Ubuntu One)
  • YaCy

Why isn’t EarthFS built on any of these? I guess I have to take a look at all of them.

EarthFS isn’t built on any of them because content addressing is simple and fundamental, and an application that implements it should be simple and fundamental too.

Apple’s SearchKit and Spotlight frameworks have high latency, search on per-word boundaries, and are unreliable at returning and ranking results. Unfortunately most mainstream note-taking applications seem to rely on frameworks like these for finding information across notes. NV, however, has its own incremental filtering algorithm which never searches the same part of a note more than once and can find text at any location relative to a word. And its brute-force approach makes it quite predictable — there’s no “intelligence” to second-guess the user. I tuned it for faster than realtime performance with 1000 notes on a 500MHz G3 and it hasn’t been a problem since.

This is very interesting given we just finally got our completely custom search system basically working and extremely fast. We’re still limited to word boundaries (no prefix/suffix matching, although we have stemming). Phrase search is hopefully coming.

I’m curious about what “incremental filtering” and “never searches the same part of a note more than once” mean.

I wonder how long his average note is? He keeps them small, so maybe 100 bytes? So 1000 notes is 100KB, which is trivial to keep in memory and probably no trouble for linear search.
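Here is one possible reading of “incremental filtering”, as a sketch. This is my guess and not NV’s actual algorithm: when the query is extended by typing more characters, only the notes that matched the previous query need to be searched again. `IncrementalFilter` is a hypothetical name.

```python
class IncrementalFilter:
    """Brute-force substring filter that narrows its candidates as the query grows.

    A guess at what "incremental filtering" could mean, not NV's implementation.
    """

    def __init__(self, notes):
        self._notes = list(notes)
        self._lowered = [n.lower() for n in self._notes]
        self._query = ""
        self._candidates = list(range(len(self._notes)))

    def update(self, query: str):
        query = query.lower()
        if not query.startswith(self._query):
            # The query was edited rather than extended: start over from all notes.
            self._candidates = list(range(len(self._notes)))
        # Any note containing the longer query also contained the shorter one,
        # so scanning only the previous matches is safe.
        self._candidates = [i for i in self._candidates if query in self._lowered[i]]
        self._query = query
        return [self._notes[i] for i in self._candidates]
```

With 1000 notes of about 100 bytes each, even the full rescan is only around 100KB of linear search, so the narrowing mostly matters on slow hardware.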

When I was reading this interview originally, I was at 5000 notes. Now we’re up to 8000. Slowing down a bit, but that’ll change when we start storing other files and (hopefully) pulling from other people.

I should check how many files are in my collection of saved web pages.

$ find . -maxdepth 2 -type f | wc -l
$ find . | wc -l

That’s just the pages on my laptop, going back to 2014-06-02, so three months. Only around 1000 pages per month seems like less than I expected, actually.

Still, dumping 50,000 files on EarthFS is going to be fun. Hopefully we’re up for the task.

Remember though, we currently have around 16,000 files when counting meta-files. And most of the files in a web page wouldn’t need meta-files. So it’s not actually as bad as it sounds.

Maintaining an encrypted database of notes was one of the main reasons I built Notational Velocity. And though CoreData makes it incredibly easy to persist an “object-graph” to disk, to this day there’s no way add a layer of encryption beneath it. So NV serializes all note-data to memory, compresses it, and then encrypts it before writing it all out in a single atomic operation that’s protected by the HFS+ metadata journal. And to handle incremental updates (i.e., auto-saving every few seconds), it uses its own incrementally compressed, encrypted write-ahead log.

I wonder if SQLite could’ve worked.

Yes, it could’ve, although the encrypted SQLite VFS isn’t free.

NV is open source, so we can examine its database format if we care enough. It sounds interesting at least.

Though I’ve tried to clarify the app’s purpose and design as much as possible on the web site, I seem to have managed to cultivate quite a varied user-base. So I now find myself spending most of my NV-time maintaining features that just aren’t as relevant to me personally. And those things I do want to work on are continually getting pushed back.

I think we’re in a better position on this. We have a broader and more practical core purpose, and we’re leaving a lot of room for extensibility.

I overviewed all of the notetaking projects I could find on GitHub too… That was a really good idea. A ton of them were inspired by NV. But the main thing I got out of it was that content addressing for notetaking is badly needed.


EarthFS 2014-09-03


So… That went smoothly.

We now have an extremely fast filter system based on MDB and written in Objective-C.


Yes, Objective-C. I know this is going to alienate my audience, and it’s hard to overstate how much of a drawback that is. However, we are careful to keep it completely isolated (one single file). We use the bare minimum, the libobjc runtime, but none of the frameworks or “standard library.” It’s literally “C with objects.” It should compile with GCC and Clang on pretty much every platform including Windows. And it was the only practical approach to managing the complexity of our filter hierarchy.

It’s funny, everyone uses SQL but complains about how slow Objective-C is. At least in this scenario, SQL was a bottleneck but the method dispatch overhead is trivial.

Of course, we’ll have to figure out a more CYA-appropriate long term solution. Possible options:

Now that we’re using ObjC, switching off it is going to be a little regrettable because I think Objective-C is an amazing, under-appreciated language[#]. However, EarthFS is not the venue for pushing my personal agenda and I care more about this project than I do language flamewars.

None of the options above seem like much of an improvement, so we’ll probably stick with ObjC for as long as possible, until the social pressure comes down on us like a ton of bricks, or at least until we’re almost ready to release and I start having second thoughts.

I think I’ve mostly figured out the permission system. File ownership will be a component of file identity, so multiple copies belonging to different users will get unique file IDs. They’ll still be deduplicated by hash, of course (and hopefully that won’t open up any timing attacks, but I don’t think it will). All of our indexes (full-text, meta-data, etc) will be prefixed by user ID. Basically the whole database will be sharded by user with very little overlap.
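A rough model of that layout, with dicts standing in for the MDB tables. `ShardedRepo`, the integer file IDs, and the key shapes are assumptions for illustration only.

```python
import hashlib


class ShardedRepo:
    """Toy model: blobs are shared by hash, everything else is keyed by user."""

    def __init__(self):
        self.blobs = {}   # content hash -> bytes, deduplicated across users
        self.files = {}   # file ID -> (user ID, content hash)
        self.index = {}   # (user ID, term) -> set of file IDs
        self._next_id = 1

    def add(self, user_id: int, data: bytes, terms) -> int:
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)
        # Ownership is part of identity: the same bytes get a new file ID per user.
        file_id = self._next_id
        self._next_id += 1
        self.files[file_id] = (user_id, digest)
        for term in terms:
            # Prefixing index keys with the user ID keeps each user's entries together.
            self.index.setdefault((user_id, term), set()).add(file_id)
        return file_id

    def search(self, user_id: int, term: str):
        return sorted(self.index.get((user_id, term), ()))
```

Since every lookup starts from a user ID, one user’s query never touches another user’s index entries, even when the underlying blob is shared.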

Public access is an open question.

We ended up encountering more “bugs” (basically API oversights, especially regarding MDB_DUPSORT) in MDB. This is what people mean by “stability” of a platform (e.g. compared with SQLite). The documentation often left details unspecified and I had to examine the code quite a bit (which is ugly but utilitarian). Thankfully it was fairly trivial to wrap its API and work around the inconveniences.

Next steps:


EarthFS 2014-08-30


  1. Pre-computing these denormalized mappings is pointless and unnecessary. It’d be faster and easier to do a union of just the matching meta-files as the filter is run. We need to check the “age” of each match either way, which is arguably the slow part.
  2. Right now, all of our filters are pretty much always worst case because we have the special visibility filter that gets added in automatically. We can simplify the whole system by making visibility determined by the presence of meta-files, which should also make all common filters much faster. That gets rid of the special efs://user link.
  3. We have to decide once and for all whether we’re going with SQLite or MDB. My mom was kind enough to spend like an hour and a half on the phone with me working through the pros and cons of each. She gave the edge to MDB for its performance, and I’m inclined to agree. If it turns out we have to compute the “match age” of each (potential) result, it seems like the overhead of point queries in SQL is unacceptable. Once SQLite4 is out, we can adopt its low-level interface so that we can easily switch to any database back-end that SQLite supports.

EarthFS 2014-08-30



Your takeaway reminds me of this quote from John Carmack:

“Focused, hard work is the real key to success. Keep your eyes on the goal, and just keep taking the next step towards completing it. If you aren’t sure which way to do something, do it both ways and see which works better.”

So I tried switching back to SQLite. I ended up concluding that further optimization of this filter system is extremely hairy if not impossible.

In order for the simple sort-merge intersection to work, we’d perform the following set of steps:

  1. Start with a particular query term
  2. Get the list of meta-files that define that term
  3. Get the list of files those meta-files point at
  4. Get the list of all meta-files targeting those files

The expanded list of meta-files is the sum of “merge points” where a file can appear, depending on other terms in the query.
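The four steps can be sketched in memory like this. It assumes meta-file IDs are integers that increase in submission order and that each meta-file targets exactly one file; `merge_points`, `intersect_sorted`, and the two dict arguments are hypothetical names.

```python
def merge_points(term, term_index, meta_targets):
    """Expand one query term into the sorted list of meta-files where a match can appear.

    term_index:   term -> set of meta-file IDs that define the term
    meta_targets: meta-file ID -> ID of the file it points at
    """
    defining = term_index.get(term, set())                           # step 2
    files = {meta_targets[m] for m in defining}                      # step 3
    return sorted(m for m, f in meta_targets.items() if f in files)  # step 4


def intersect_sorted(a, b):
    """Plain sort-merge intersection of two sorted ID lists."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out
```

Step 4 here is a full scan of `meta_targets`, which is fine for a toy. The hard part in a real database is having that expanded list already available in sorted order.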

The problem is that getting this denormalized list of meta-files in sorted order is difficult. When a new meta-file is submitted for an existing file, we have to copy over all of the file’s existing attributes to the new {meta-file, file} pair. And I don’t think there’s any reasonable way to do that with SQLite FTS.

So we might be switching back to MDB.


EarthFS 2014-08-29


Then finally I realized I should import everything into a regular relational database and play with the SQL until I figure it out.

I think I’m going to go back to SQLite.

There’s no way to dump our current ad-hoc storage system into a relational database because there is no clear mapping between the two. We just have a bunch of floating b-trees, some of which might be considered tables and some which might be indices.

In order to get our data in SQL format, I checked out the last version of the codebase before we switched to MDB. It turns out that its pulls start out somewhat slower, but don’t decay as much. I’m guessing that’s because of the work FTS goes through to batch its writes. That isn’t a knock on MDB, and we might end up using SQLightning to get the best of both worlds.

The other problem is that every key-value store has its own special snowflake API. Porting to MDB was fine, and its API is mostly very good, but I could never bring myself to port it again. On the other hand:

Log-structured merge-tree

LSM trees are used in database management systems such as HBase, LevelDB, SQLite4, RocksDB and Apache Cassandra.

SQLite 4 is going to have modular back-ends, and there are already several chop-shop versions of SQLite 3.

Now, that’s a bit frustrating, because if I had to choose a unified key-value store interface, I wouldn’t choose SQL. But we gotta work with what we got.

Although we could stick with MDB for now, and then pick up the raw SQLite 4 back-end interface once back-ends start being written for it. But that’s a long way off, and we still wouldn’t get compatibility with SQLite tools (like the command line interface).

A big factor in our switch to MDB was the overhead of running zillions of tiny queries. When each query returns one row (or zero), the overhead of the VDBE becomes apparent. However, our current filter system takes maybe 15 ms in MDB, and only 30-40 ms in SQLite. That’s not as much of an improvement as I would’ve hoped for. Plus, we don’t have to cache queries, but we do have to cache cursors, which is even worse.

And finally, we’re trying to rewrite our filters to avoid doing all of these point lookups, which should tip the scales further.

In conclusion, I have now relived the entire history of the database up to the modern day, and I have a much better understanding of why things work the way they do. Sort of a waste of time, but I never could’ve learned without learning it the hard way.