



To the Real Person on the Other Side of This Screen

One day I was tooling along at the bottom of the sea, overturning small rocks in search of interesting articles. In one article I hovered over a link and saw a familiar URL appear. A very familiar URL–my own blog, in fact. I was mildly alarmed.

In this case, things seem to have worked out alright. Let me say thank you to angersock, friendlysock, and everyone else whose posts have taught me so much over the decades.

However, this little episode could’ve been a drama if not a tragedy, if whatever thing I had written 3 years ago had been a little bit ruder, or Chris (whom I’ve never directly talked to) had been a little bit less reflective. In fact, let me also do a little bit of pre-emptive damage control by saying my blog is a record of how stupid I was at any given point in the past. That includes as recently as late this morning.

Perhaps the reason things worked out this time is that when I wrote some personal notes 3 years ago, which I later ended up posting on my blog, I managed to remember that the person I was writing about was a real person who, however unlikely, might actually end up reading what I wrote.

It turns out that the internet is extremely small, and if you write about someone they’re fairly likely to actually see it, sooner or later. In this case, it’s happened three times already: I saw angersock’s post about a project merely similar to the one I was working on; he saw mine about his comments; I saw his about mine. And now I assume he’ll see this one.

I have a theory that people often become jerks on the internet (which I have done myself) because of the sense of powerlessness that stems from feeling like nobody reads or listens to what one writes. It can become a vicious cycle: no one listens, so we become ruder. We become ruder, so no one listens. This sense can be reinforced not because the specific person we replied to didn’t listen, but because it’s all too easy to see the whole internet as a single person who never listens to us no matter how much we shout. We get into a pattern of saying the same things, over and over, because the internet seems to never learn, when in fact we’re getting ruder and ruder to a steady stream of new people.

Now, this theory may or may not be true. But nobody tries to become a Bastard Operator From Hell. It’s something we just slip into, usually without noticing. We must remember, or merely hope, that someone is listening: maybe the author, maybe a lurker, maybe someone in the distant future.

From the aforementioned article, there is something I’d like to highlight:

One of the other rather odd things I noticed was that, as I started putting those principles into practice, I became sensitive to posts by other users that didn’t follow these practices.

I found myself keying off immediately on people being overly negative or starting a reply off with some kind of grumpiness or just plain being impolite. Part of me wondered (wonders?) if that’s how folks read my own work.

Lately my thoughts about posting on Snacker News (sometimes called Yakker News) have been about making the site itself as respectable and welcoming to experts as possible.

HN is remarkable for the famous and successful (IRL) people in computing who sometimes deign to grace its pages with their comments. Often these comments add a big dose of real-world experience from someone who’s been there. Sometimes they’re grumpy or low effort, just like anyone else’s.

What concerns me is that people who are famous and successful (IRL) have no reason to put up with “internet randos” questioning their experience or giving them shit. In fact, no one has a reason to put up with that, but with some notable people, the loss is more obvious (if not necessarily greater).

It made me think of developmental stages of internet posting. Motivations for posting might be broken down into these categories:

  1. “what I think”
  2. “what will convince people I’m right”
  3. “what will make me look good”
  4. “what will make the site look attractive (intelligent and balanced) to other smart people”

I want to be clear that motivations 1-3 are not necessarily bad. However, I think they should all be tempered with a healthy helping of motivation #4. After all, if you post on an online forum, you probably also read it. Making the forum appealing to other smart people is in your best interest (unless you’re insecure, in which case work on dropping the ego first).

When you post a comment online, imagine that someone you respect will read it (which you should want!). Because they might.

I have some small suggestions to smooth out common interaction problems, particularly on threaded forums with voting (like Flobsters, HN or Reddit):

Comments being cynical or negative is actually normal and fine, up to a point. But keep in mind that there is a real person who wrote the article or comment that you are replying to, and who is very likely to see your reply. If you’re going to publicly reject something someone else posted, please give them something constructive to go on.

Just to be clear, no, the internet does not need to be a giant hugbox. Personally I have a relatively thick skin (against rude individuals, not thermonuclear hate mobs), so I have the privilege of being able to tolerate assholes. I actually consider it part of my competitive strategy: I will make an effort to put up with almost anyone as long as I can learn from them. But I also recognize that others can’t or won’t do that, so it’s in my interest as much as everyone else’s if we can all be a little bit nicer to each other. (This can also be thought of as an example of Postel’s law.)

As the internet gets bigger and older, the culture seems to be fracturing into multiple groups who don’t like each other very much. I’m not sure we can do much about that, but I have a dream of permissionless collaboration, which to me means that even people who hate each other should be able to build on each other’s work. Sometimes I worry that if Einstein had been a bigger asshole, we’d still be stuck on Newtonian mechanics.

As an aside, I would highly recommend everyone read the n-gate.com Hacker News digest. It’s a little bit jaded (ha), but it can be helpful to have a mirror held up to one’s community every once in a while.

I’d like to leave you with a concerning comment from a very interesting thread I saw recently:


It feels like intelligent and mentally healthy individuals have been opting to abstain from online discussions in general over the last few years in droves after realizing their time and energy and better spent elsewhere. Especially with the trend of comments being taken out of context to attack individuals employers/livelihood. That leaves the young, socially broken and depressed (myself included).

The incentives for posting online are very small. Most of us do it to have fun, or learn things, or even make friends. If someone lashes out at you, or downvotes you when you were trying to be constructive, the ROI can easily go negative. I’m afraid of what it would be like if we entered a “cultural ice age.” Perhaps in the future we’ll have to go back and read old archives from the golden age of online forums.

I think that would be really unfortunate.

Previously, a guide to posting on internet forums.

Keywords: internet forum culture



Content-Addressable Storage versus Eventually Consistent Databases

Content addressing is, as far as I know, the best way to build an eventually consistent database. But it’s become apparent to me that there is actually not a lot of overlap between the two concepts beyond that.

My project StrongLink tries to be both. It provides content-addressable storage for files, and lets you find files by hash URI. It also tries to track file meta-data using what I called “meta-files”, which are files that store meta-data about other files.
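As a rough illustration of the content-addressing half (this is not StrongLink’s actual code), a store can simply key each file by the hash of its bytes. The `ContentStore` class and the `hash://sha256/...` URI layout below are assumptions made for the sketch.

```python
import hashlib


class ContentStore:
    """Toy in-memory content-addressable store (hypothetical, for illustration)."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        # The address is derived purely from the content itself.
        digest = hashlib.sha256(data).hexdigest()
        uri = f"hash://sha256/{digest}"  # assumed URI layout
        # Storing the same bytes twice is harmless: same content, same key.
        self._blobs[uri] = data
        return uri

    def get(self, uri: str) -> bytes:
        return self._blobs[uri]

    def __len__(self) -> int:
        return len(self._blobs)


store = ContentStore()
uri_a = store.put(b"hello world")
uri_b = store.put(b"hello world")  # redundant write, deduplicated by address
```

Note that nothing here tracks meta-data. That would have to live in a separate layer, which is where the split described next comes from.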

This split personality ended up majorly over-complicating everything. Per-file meta-data ended up being a per-file eventually consistent database, with more complexity and less generality than a single large database would’ve been. Syncing was especially confusing, because you can use the meta-data in order to decide what files to sync (you basically end up needing two separate sync algorithms).

There are other differences and tradeoffs between content-addressable storage and eventually consistent databases. Content-addressable storage deals in files, which are likely to be large and mostly or entirely redundant, even during normal use. An eventually consistent database deals in commits or transactions, which are more likely to be small and rarely if ever redundant, except when healing after a network partition. A database needs to parse transactions into indexes; a storage system may need to break files into chunks (although I still maintain this is bad if your users rely on your hashes as part of your public interface, especially if they must be compatible between different storage systems).

If you want an eventually consistent database, building it on top of a general-purpose content-addressing system might not be the best fit, unfortunately. And if you want a pure content-addressing system, especially for performance, an eventually consistent database will be both overkill and slow.

It’s quite possible that this split simply mirrors the traditional dichotomy between file systems and databases. One is fast, dumb, and wide; the other is slow, clever, and deep. Don’t mix them up like WinFS did.

Keywords: tradeoffs



Richard Feynman and the Isle of Maths (Plural)

It was 1952, still only a few years after the War, when Dr. Richard Feynman, esteemed professor of theoretical physics at Caltech, boarded a small passenger plane in Brazil, ultimately bound for the United States. His penultimate destination was some patch of ocean in the Gulf of Mexico, where the plane crashed.

The good doctor woke up on a beach. He found his clothes in tatters, and himself somewhat dehydrated (he could use a stiff drink), but he was otherwise no worse for wear. He took off what was left of his shirt, revealing his upper body, toned from years of intense physical calculations. (Unfortunately, his trousers were in one piece.)

Taking immediate stock of the situation, he decided to start exploring the island (which he knew it to be from the title). After walking along the beach for less than 15 minutes (judging by the sun), he almost tripped over a bottle of vodka, conveniently chilled by some damp, shaded sand. Picking it up, his spirit was further lifted when he spotted a grove of coconut trees, bearing countless ripe coconuts. A mixed drink with coconut water would make exploration of this island quite pleasant indeed.

Approaching the grove, he rolled up his pant legs and quickly scaled a coconut tree, like any skilled professor would. He easily dislodged two large coconuts. Looking down to see where they had landed, he noticed three coconuts in a triangle (for no other shape was possible) on the ground. He went about harvesting two more, and again checked his work. This time he was surprised to see six coconuts on the ground (in what he knew technically as a jumble).

He climbed down and examined the pile of coconuts. Taking two from the pile of six, he noticed that there were only three left. Putting them back, there were six total. Dividing the pile in half, there were two piles of three. However, picking up a coconut from one of the piles, the pile only had one coconut left. In other words, by his best professional assessment of these coconuts, it appeared that 1 + 1 = 3.

Leaving aside the coconuts for a moment, he downed a shot of vodka. (However, trust me, your author, when I insist this did not impair his impeccable judgment in the slightest.)

He decided he would come back to the coconuts later. Suddenly he spotted a stream emerging from the forest, further down the beach, and running into the ocean. Upon reaching it, he saw that there was a clear path along it into the woods, and wondered if he might find other inhabitants of this island.

Following the stream for about half an hour (judging by the number of paces at his standard walking speed, because the sun was hidden behind the trees), he eventually came to a large, round pool. His keen sense of vision as a theoretical physicist told him the pool was perfectly round and precisely 100 meters across. He also noticed that the surface of the pool was as smooth as glass (atomically).

Thinking nothing of it, he decided to semi-circumnavigate the pool and continue up the stream, which resumed on the exact opposite side. He walked precisely along the shore, with one foot in the water (because it was a hot day). However, upon reaching the other end of the 1-dimensional stream, he was surprised to notice that his path along half the pool’s perimeter, which should have been 157.08 meters long (approximately), had actually been 150.00 meters long (precisely).

To confirm his hypothesis, he walked around the other half of the pool, and firmly established that its total circumference was precisely 300 meters. In other words, it seemed that at least when pertaining to this pool, the value of pi was exactly 3.

He downed another shot from the bottle of vodka he was still carrying. It was starting to get dark, so he decided to head back to the beach and try to start a fire to signal for help. On the way back to the beach, he found some logs conveniently cut for that purpose.

Dropping about 100 kilograms of logs in a suitable spot on the beach, he set about building a fire. First, he took some dry logs and arranged them in an optimal conflagration configuration. Then he added some dry leaves as kindling. Then he pulled some flint out of his pocket (a physicist is always prepared) and struck it with a rock (from another pocket). Sparks flew onto the dry leaves but nothing happened.

Dismayed, he tried again. More and bigger sparks flew onto the leaves, but disappeared without leaving the slightest burns. He struck the flint again and again, shooting giant balls of fire at the leaves and logs. But there was no hint of ignition.

After hundreds or even thousands of attempts, he finally messed up once. The rock weakly glanced off the flint, making no sparks. Then, slowly, the leaves began to burn. He stepped back as the fire grew.

Surprised at this occurrence, and mindful of the previous events in his short time on this curious isle, he devised some simple experiments. He quickly concluded that there were three basic conditions to start a fire: dry logs, dry kindling, and striking the flint to produce sparks. However, the fire would only start when either the logs were wet, the kindling was wet, or no sparks were produced. In other words, true and true and true was false, but true and true and false was true.

Richard Feynman swallowed a third shot of vodka and cast the empty bottle into the water. The thought of rescue faded from his mind. Sitting down on the beach, he made his plans for the next day. He would investigate the Arithmetic of the Coconuts, the Geometry of the Pool, and the Logic of the Fire, and get to the bottom of the mysteries of this island.

The End (because I’m not smart enough to write the rest)

Author’s notes

I’ve long been curious about the idea of whether math and logic could vary. If we can’t rely on logic, then Descartes’ “cogito ergo sum” is wrong. Without logic, we can’t conclude “cogito,” much less “cogito ergo” anything.

It’s easy to imagine alternate universes with different physics, for example a different strength of gravity, or different subatomic particles. However it’s hard to imagine a universe with different math. Already in this story, there are inconsistencies: two groups of three coconuts should add to become three groups of three coconuts; in reality, the conditions to start a fire are not firmly defined, making it hard to know when exactly one of them is false.

My interpretation is that unlike physics, math as we know it is true in every possible universe. The evidence for this is four-fold:

  1. The difficulty in working out a counter-factual but internally consistent form of math
  2. The ease with which such a counter-factual math might let you solve hard problems (imagine a universe where the halting problem is trivial to decide in constant time)
  3. The difficulty of building computer simulations within our universe that don’t inherit our math (despite the ease of not inheriting our physics)
  4. Not all mathematical operations are local (if you own a house, and buy a vacation home in the Bahamas, do you suddenly get a third house? where?)

My conclusion is that physics isn’t physics, math is physics. Beyond that I’m at a loss.

Thank you for reading!

Keywords: fiction, math



Can and should an x86 sandbox run unmodified x86 code?

Well, it’s complicated, and there are more tradeoffs and unsolved problems than one might expect.

The first argument is that yes, running unmodified x86 (and x86-64) is a very good idea because it ensures that the sandbox is modular with respect to the software running in it. In other words x86 is a very solid interface, and conforming to it makes your sandbox a drop-in replacement for anything else (including no sandbox at all).

The trouble, of course, comes from the fact that x86 is notoriously hard to virtualize. High performance and low overhead are necessary for making sandboxing practical and popular.

There are basically four approaches that I know of in use so far:

(Technically, there are a couple more options, like a sandboxed language such as Javascript, or syscall blocking like seccomp, but I think they are too limited so they won’t be discussed here.)

The first two run unmodified x86 code; the second two basically need a special compiler target.

Hardware virtualization is obviously the fastest. It can theoretically be full speed, depending on how powerful the feature is. (Of course in the extreme case, you have two separate, air-gapped computers. But let’s put that aside. We’re focusing on solutions for a single computer.)

The problem with using hardware virtualization is that we want our sandbox to be robust against CPU bugs. What does that mean? It means two things:

  1. It means that if a CPU bug is found in some instruction (somewhat poor example: Rowhammer being exploitable via CLFLUSH), we want to be able to rapidly and easily modify our sandbox to block/avoid that instruction.
  2. It means that if you are very paranoid, you should be able to choose a Turing-complete subset of instructions to trust, rather than trusting all of them, based on the security-vs-performance tradeoffs you are willing to make.

Thus, relying too much on hardware virtualization is risky, because you have no defense/recourse in case a hardware bug is found. (Note that I am not talking about malicious hardware backdoors; those are a separate problem that sandboxing cannot hope to defend against.)

For this reason, something like VT-x isn’t very useful for a secure sandbox. The real, hard problem of doing it in software has to be addressed.

Now, I want to explain two different ways of doing secure sandboxing in software:

In my opinion, the choice here is clear. A formally proven compiler is very difficult and expensive to build, and then difficult and expensive to maintain, forever. The chance of error, no matter how low (thanks to formal proofs), still grows linearly with the amount of maintenance or new development over time. In other words, this option is completely undesirable.

The other option is a secure verifier. If you come up with simple rules for your secure x86 subset, then the verifier can be very simple. There are NaCl verifiers that are 500 lines long and formally proven. This simplicity also lets you easily change them to work around CPU bugs. Besides that they never really need to change at all.
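To make the idea concrete, here is a deliberately tiny sketch of an allowlist-style check. It is not how NaCl’s verifier works (a real one has to decode raw bytes and enforce alignment and control-flow rules); it assumes the code has already been decoded into mnemonics, and the `ALLOWED` set and `verify` function are hypothetical.

```python
# Hypothetical allowlist of mnemonics; a real subset would be chosen with far more care.
ALLOWED = frozenset({"mov", "add", "sub", "cmp", "jmp", "ret", "clflush"})


def verify(instructions, allowed=ALLOWED):
    """Return the index of the first disallowed instruction, or None if all pass."""
    for index, mnemonic in enumerate(instructions):
        if mnemonic.lower() not in allowed:
            return index
    return None


program = ["mov", "add", "clflush", "ret"]
before_patch = verify(program)

# Reacting to a CPU bug: drop the affected instruction from the allowlist.
patched = ALLOWED - {"clflush"}
after_patch = verify(program, allowed=patched)
```

The point of the sketch is only that the rule set is data: working around a bad instruction means shrinking a set, not rewriting the checker.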

What I am saying here is that an x86 sandbox will need a secure verifier for x86 code regardless of what program format it accepts. This verifier should be the absolute last stage of sandbox execution, before the sandboxed code is run on hardware.

(This is trickier if you want to run self-modifying code. In theory, if your sandbox uses W^X (write xor execute) memory pages, then you can run the verifier when pages are about to become executable. Of course you want to be sure that the verifier is run in all necessary cases, and perhaps you don’t want to trust the hardware memory protection.)

At this point, the security portion of the sandbox is done. The only question left is the interface (execution format).

The simplest option is just to leave it as an instruction subset. This is what NaCl does. However, the fact that NaCl isn’t wildly popular suggests why this might be a bad option.

The biggest problem with this approach is that you are effectively exposing the internal implementation of the sandbox. If there’s a different sandbox that uses a slightly different subset, or if you need to change which instructions are permitted, then all applications are affected. (On the other hand, this option has the best possible performance.)

A new instruction set is theoretically even worse for running existing software, but in practice it can be better. The main selling point of WebAssembly is cross-platform portability, which is unrelated to its security properties.

The final option is x86 dynamic recompilation. As I understand it, VMware basically became a $40 billion business just by doing this one thing first and well. Then hardware support came along and ate their lunch (to some extent).

As far as I know, VMware’s recompiler was never “highly secure” (i.e. it was written using conventional development techniques), nor did it have a formally proven output verifier. (To be clear, you don’t need both.) In other words, x86 support for a secure sandbox is a (somewhat) harder problem than the one VMware was founded to solve, and which it moved off of as quickly as an alternative was found. There is also an implicit performance hit on top of the instruction subset cost.

In conclusion, no, a secure sandbox cannot (directly) run unmodified x86 code. However, it probably should, for the sake of adoption. A WebAssembly front-end might be a viable compromise thanks to an easier implementation and reduced performance expectations relative to x86.

Keywords: sandboxing, security



Sandboxing is not Sanitization

I’m a big proponent of sandboxing. I think it will come close to solving computer security, and prove that all the people who laughed at Neil deGrasse Tyson (for saying “just build secure computers”) are fools.

However, like all sufficiently advanced technology, it isn’t magic. I want to share a lesson about it that recently crystallized for me, thanks to a certain HN thread.

Microsoft didn’t sandbox Windows Defender, so I did (trailofbits.com)

Wow, someone isolating large, complex and buggy parsers with sandboxing! Great!

Then I get down near the bottom of the thread, and I see this:


If the sandboxed process is compromised, all you can do is read a file that you already had access to (because it’s your exploit), and lie about the scan result. That is not terribly exciting.

I already know how it works, so why am I reading this? Oh, wait… Huh.

This is an anti-virus program. It’s designed to protect your PC from viruses. If a virus can trigger a bug in the virus scanner (and remember, it’s large, complex and buggy – that’s why we sandboxed it), it can lie about whether a file is infected.

Then, we can only assume, the virus gets parsed or executed in a privileged context and takes over your PC.

Hmm. That didn’t quite work as planned, did it?

All else being equal, a sandbox with an untrusted input will have an untrusted output. That’s just the way it works. A sandbox can constrain the output, but it can’t guarantee any specific qualities about the output’s nature.

The classic cases for sandboxes are things like PDF viewers, image decoders and Flash player. These are all things with predominantly human-centric I/O. In other words, a bad JPEG can produce an ugly, misleading (or large) bitmap image, but from that point all you can really do is social engineering (basilisks aside).

On the other hand, generic file parsers that we all want to sandbox typically have output that is read by another program. That output might be in a structured format that the second (possibly trusted) program then parses. If that second parser has a vulnerability, you’ll notice we’re back to square one.

A real-world example of this is the QubesOS firewall VM. Qubes comes configured to run a separate instance of Linux as a firewall in front of your other VMs. However, both the firewall and the other VMs are probably running from the same Linux image (template), and are almost certainly running the same TCP stack. In other words, the firewall itself isn’t much safer than the VMs it’s supposed to protect, and once the firewall is compromised, the same exploit can probably be used to compromise the inner VMs. (Disclaimer: this configuration might’ve changed in the year or so since I last checked. One easy improvement would be to use a different, smaller OS like OpenBSD as the firewall.)

For lack of a better term, let’s call this the dirty dataflow problem. Dirty (untrusted) data flows into your sandbox. Then it flows out, into another sandbox (or worse, not a sandbox). As long as the trust level of the target is at least as low as the sandbox itself, this is fine. However, if you are expecting sandboxing to help you get data from a low-trust area to a higher-trust area, you’re fooling yourself. Shit runs downhill.
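The rule can be written down as a one-line check. This is only a sketch: the `Trust` levels and the `flow_is_safe` helper are made up for illustration, with higher numbers meaning more trusted.

```python
from enum import IntEnum


class Trust(IntEnum):
    # Hypothetical levels; higher means more trusted.
    UNTRUSTED = 0
    SANDBOXED = 1
    PRIVILEGED = 2


def flow_is_safe(sandbox: Trust, target: Trust) -> bool:
    """Output of a sandbox may only flow to targets no more trusted than it."""
    return target <= sandbox


# A sandboxed scanner feeding another sandboxed component is fine...
scanner_to_display = flow_is_safe(Trust.SANDBOXED, Trust.SANDBOXED)
# ...but feeding its verdict to something privileged is the dirty dataflow problem.
scanner_to_kernel = flow_is_safe(Trust.SANDBOXED, Trust.PRIVILEGED)
```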

Again, don’t get me wrong, I think the coarse-grained security that sandboxing provides is just what the doctor ordered for making most software mostly secure quickly and cheaply. However, when you actually want to sanitize your inputs, fine-grained security (through secure languages, tooling, runtimes, or formal proofs) is necessary.



The Paradox of World Domination

The paradox of world domination goes like this: suppose that there are some people hell-bent on ruling the world with an iron fist. Also suppose there are other people who don’t think the world should be ruled by such a dictator, and that instead rights and liberties should be passed down to countries and individuals.

The trouble is this: in order to prevent the pro-world domination faction from installing their world dictator, doesn’t the anti-world domination faction have to install their own dictator? After all, don’t we need to execute all of those potential dictators and prevent them from taking control… with our own iron fist?

Now I submit that this idea is analogous to the paradox of tolerance. Wikipedia describes it thus:

Paradox of tolerance

The paradox states that if a society is tolerant without limit, their ability to be tolerant will eventually be seized or destroyed by the intolerant. [Karl] Popper came to the seemingly paradoxical conclusion that in order to maintain a tolerant society, the society must be intolerant of intolerance.

The reason I think these two ideas are analogous (for my purposes) is that intolerance can be considered a subset of world domination. For example, imagine a world dictator whose only “absolute decree” is that we must not tolerate some particular group. The logic here is that I generalize the problem, show that the general case isn’t true, and thus the specific case must not be true either.

With that informally established, let’s consider the present state of the world. There is no world dictator (yet, perhaps), thus empirically there does not have to be one.

Specifically, this is possible because, even if in fact most people would like to have their own world dictator installed (themselves if no one else), there are actually many such factions which don’t agree with each other. The people who don’t want any dictator, and the people who don’t want the wrong dictator, can collectively form a coalition to keep dictators out, without having a dictator of their own (and without necessarily even agreeing with each other on the specifics).

This is my solution to the “paradox” of tolerance. I may not agree with what you have to say, but I will defend to the death (if necessary) your right to say it. I will defend you even if you wouldn’t defend me on some or all topics. This is stable as long as enough people would defend each other most of the time (which is a state I hope to maintain, in part simply by showing that it is possible!).

Now please stop trying to take over the world in the name of good, because I don’t believe such a thing is possible.

Keywords: politics



Why be a hypocrite?

In the society I grew up in, hypocrisy was basically considered one of the biggest sins, because it’s one of the few things that can be “objective.” We can’t agree on what’s good or bad, so we can’t criticize people for doing bad, but we can tell pretty easily if someone is doing something besides what they say should be done, and then we can criticize them freely, and also usually ignore what they say should be done, especially if it was right.

However, by now, I think I understand why being a hypocrite might not be so bad after all. Mature adults tend to learn that the gap between fantasy and reality is a good one, and it is both good and reasonable to occasionally desire things that should not happen, and to be concerned about things happening which one occasionally desires.

A non-hypocrite is forced to eliminate any gap between their thoughts and their actions, even if their thoughts are imperfect, incomplete, idealistic, or most importantly, completely true but in violation of accepted social norms. In other words, society makes us all hypocrites, and to advocate the elimination of hypocrisy is to advocate the breakdown of society.

Unfortunately, I’m not much of a hypocrite, but I am quite a weirdo, and this is a lesson I learned a little bit too late. So here’s my hypocritical suggestion: be more of a hypocrite than I am.



The difficulty of content addressing on the web

Right now when you look up a content address on the Hash Archive, it returns a web page with a bunch of links to places where that hash has been found in the past. At that point, you have to manually try the links and verify the response hashes until you get one that works. Wouldn’t it be nice if this process could be automated in the browser?

Unfortunately, the original Subresource Integrity draft got scaled way back by the time it was standardized. Basically, modern browsers can only verify the hashes of scripts and style sheets. They can’t verify the hashes of embedded images, iframes, or links, as cool as that would be.

I think my own ideal solution here would be to support “resource integrity” on HTTP 30x redirects. For example, you try to look up a hash, and the server responds with 302 Found. The headers include the standard Location field which points to some random URL out on the web (potentially from an untrusted server), plus a new Hash field that specifies what the hash of the expected response should be.

The browser follows the redirect, but before presenting the content to the user (inline, as a download, or whatever), it verifies the hash. If the hash is different from what was expected, the redirect fails.
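Here is roughly what that client-side check might look like. No browser does this today; the `Hash` header is the hypothetical field proposed above, and its `sha256:<hex>` value format, the `follow_verified_redirect` function, and the `fetch` callable are all assumptions for the sketch.

```python
import hashlib


class HashMismatch(Exception):
    """Raised when the redirect target does not match the promised hash."""


def follow_verified_redirect(headers, fetch):
    """Follow a redirect, returning the body only if its hash matches.

    `headers` are the 30x response headers; `fetch` maps a URL to bytes.
    The "sha256:<hex>" format of the Hash field is an assumption.
    """
    algo, _, expected = headers["Hash"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported hash algorithm: {algo}")
    body = fetch(headers["Location"])
    if hashlib.sha256(body).hexdigest() != expected.lower():
        raise HashMismatch(headers["Location"])
    return body


good = b"expected content"
redirect = {
    "Location": "https://example.com/file",
    "Hash": "sha256:" + hashlib.sha256(good).hexdigest(),
}
body = follow_verified_redirect(redirect, lambda url: good)
```

If the untrusted server returns anything else, the call raises `HashMismatch` and nothing is handed to the user.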

Now if you’ll allow me to really dream, imagine if this worked with the (basically vestigial) 300 Multiple Choices response status. The server could provide a list of URLs that are all supposed to contain the same resource, and the client would try each one in turn until the hash matches. In the case of the Hash Archive, which can’t be certain about what 3rd party servers are doing, this would make resolving hashes more reliable.
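The fallback loop is simple enough to sketch. The function name, the `fetch` callable, and the stand-in “web” below are all hypothetical; a real client would be making HTTP requests and dealing with timeouts.

```python
import hashlib


def first_matching(urls, expected_sha256, fetch):
    """Try each candidate URL in turn; return (url, body) for the first match."""
    for url in urls:
        try:
            body = fetch(url)
        except OSError:
            continue  # unreachable source: move on to the next candidate
        if hashlib.sha256(body).hexdigest() == expected_sha256:
            return url, body
    return None  # no candidate produced the expected content


wanted = b"the real file"
fake_web = {  # stand-in for third-party servers
    "https://a.example/f": b"something else now",
    "https://b.example/f": wanted,
}


def fake_fetch(url):
    if url not in fake_web:
        raise OSError("unreachable")
    return fake_web[url]


result = first_matching(
    ["https://dead.example/f", "https://a.example/f", "https://b.example/f"],
    hashlib.sha256(wanted).hexdigest(),
    fake_fetch,
)
```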

Okay, great. What about alternatives and workarounds that work right now?

Option 1: Server-side hash validation

With this idea, instead of a redirect, the server loads the resource itself, verifies it, and then proxies it to the client. Obviously proxying data is unappealing in and of itself, but the bigger problem is that the entire resource needs to be buffered on the server before any of it can be sent to the client. That’s because HTTP has no “undo” function. Anything that gets sent to the client is trusted, and there’s no way for a server to say, “oh shit, the last 99% of that file I just sent you was wrong.” Closing the connection doesn’t help because browsers happily render broken content. Chunked/tree hashing doesn’t help because you’ve still sent an incomplete file.

Buffering is completely nonviable for large files, because it means the server has to download the file before it can respond, during which the client will probably time out after ~60 seconds. It’s difficult (if not impossible) for the server to stall for time because HTTP wasn’t designed for that.

That said, if you limit this technique to small files (say less than 1-4MB) it should actually work pretty well. It’s also nice because you can try multiple sources on the server and pass along the first one to download and verify. For any solution that requires proxying, you will probably want to have a file size limit anyway.
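Here is roughly what the buffer-then-verify step could look like on the server, as a Python sketch. The 4MB cap, the chunk size, and the function name are all my own picks for illustration, and the hash is assumed to be SHA-256. The important part is that nothing is returned (and so nothing is sent to the client) until the whole thing has been hashed.

```python
import hashlib
import io
from typing import BinaryIO, Optional

MAX_BYTES = 4 * 1024 * 1024  # assumed cap, per the "1-4MB" range above


def buffer_and_verify(
    stream: BinaryIO,
    expected_sha256_hex: str,
    max_bytes: int = MAX_BYTES,
    chunk_size: int = 64 * 1024,
) -> Optional[bytes]:
    """Read the upstream response fully; return it only if the hash matches."""
    hasher = hashlib.sha256()
    buffered = bytearray()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        if len(buffered) + len(chunk) > max_bytes:
            return None  # too large to buffer: give up on this source
        hasher.update(chunk)
        buffered.extend(chunk)
    if hasher.hexdigest() != expected_sha256_hex.lower():
        return None  # wrong content: try the next source instead
    return bytes(buffered)


if __name__ == "__main__":
    data = b"small file"
    ok = buffer_and_verify(io.BytesIO(data), hashlib.sha256(data).hexdigest())
    print(ok is not None)
```

Returning None for both "too big" and "wrong hash" keeps the caller simple: either way, it moves on to the next source.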

BTW, this approach also fails to catch any errors introduced between the server and the client after validation.

Option 2: Client-side validation with JavaScript

One version of this idea was proposed by Jesse Weinstein. Another version, using Service Workers, was proposed by Substack.

The server responds with a full web page that includes a hash validation script. The script downloads the file via AJAX, hashes it, and then writes it out via a data: URI. This is pretty much how the file hosting service Mega’s client-side encryption works, except with hashing instead of encryption.
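In a browser this would of course be JavaScript, but the verify-then-encode step itself is tiny. Here it is sketched in Python just to show the shape of it; the function name and the SHA-256 hex format are assumptions of mine.

```python
import base64
import hashlib


def verified_data_uri(
    body: bytes,
    expected_sha256_hex: str,
    mime_type: str = "application/octet-stream",
) -> str:
    """Hash the downloaded bytes and, only if they match, wrap them in a data: URI."""
    if hashlib.sha256(body).hexdigest() != expected_sha256_hex.lower():
        raise ValueError("hash mismatch: refusing to hand the file to the user")
    encoded = base64.b64encode(body).decode("ascii")
    return "data:" + mime_type + ";base64," + encoded
```

The whole file has to be in memory before the URI can be built, which is where the size cap comes from.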

Mega actually works pretty well, but it’s purely for downloads. When you start thinking about embedded content, it becomes messier and messier to the point of not being worth it, IMHO. The thought of telling users to copy and paste JavaScript just to hotlink images gives me the heebie-jeebies.

This still involves buffering the data in memory (client side), which puts a cap on the file size, and proxying content through my server (to dodge the Same Origin Policy), which implies another cap and makes it less appealing.

Basically I don’t think this approach would be in good taste.

Option 3: Browser extension!

No, just no. I will write a mobile app before I pin my hopes on a browser extension. Plus, the Chrome extension API is extremely limited and probably doesn’t even let you do this.


I’m ready to throw in the towel, at least for now. Server-side validation might be good for a party trick at some point. Trying to make content addressing on the web work (well) without the support of at least one major browser vendor doesn’t seem feasible.

I think problems like this have interesting implications for the web, the current approach to sandboxing, and solutionism in general.

P.S. Thanks to the great Archive Labs folks who discussed all these ideas with me at length!



I’ve been exploring the idea of processes, users, containers and VMs as various expressions of an underlying isolation mechanism.

I also recently wrote about the idea of a file system that implemented directories as recursive mounts[#].

Combining these ideas with unikernels, I’ve arrived at “ProcOS.”

A unikernel is the minimum possible operating system that gets you from bare metal to a single running process. Unlike a traditional operating system, it doesn’t have any multiplexing features (scheduler, etc.), so it can only run one program at a time.

“ProcOS” is the unikernel’s complement: it is just a process that runs other processes. It’s a pure multiplexer, with no hardware support. It’s an OS that runs as a regular process on a traditional OS, a unikernel, or–get this–itself.

All ProcOS can do is run processes. Users and containers are just recursive instances of ProcOS.

So for example, let’s say you’re running ProcOS on a unikernel. You turn on the computer, and the unikernel loads. It starts running the root ProcOS, which is configured to spawn a login process that lets users log into the system.

When a user logs in, the login process uses a special message to tell the root process to spawn a user process for that user. The user process contains the desktop environment or whatever. Anything the user does runs within their process.
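The boot-and-login example can be modeled as a toy tree in Python. To be clear, this is not an implementation of anything: the class names are made up, nothing is actually isolated or scheduled, and the "special message" from the login process is reduced to a direct method call. It only shows the recursive shape, where a user is just another ProcOS instance.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Process:
    """An ordinary process: it can run, but it cannot spawn anything."""
    name: str


@dataclass
class ProcOS(Process):
    """A process whose only ability is running other processes."""
    children: List[Process] = field(default_factory=list)

    def spawn(self, child: Process) -> None:
        self.children.append(child)


def tree(proc: Process, depth: int = 0) -> List[str]:
    """Render the process hierarchy as indented lines."""
    lines = ["  " * depth + proc.name]
    if isinstance(proc, ProcOS):
        for child in proc.children:
            lines.extend(tree(child, depth + 1))
    return lines


# Boot: the root ProcOS spawns a login process.
root = ProcOS("root")
root.spawn(Process("login"))

# Login: the root is asked to spawn a user process, itself a ProcOS.
alice = ProcOS("user:alice")
root.spawn(alice)
alice.spawn(Process("desktop"))

if __name__ == "__main__":
    print("\n".join(tree(root)))
```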

One cool thing about the ProcOS idea is that it can be quite portable. You can run it in a VM with a unikernel, or you can run it directly on top of another OS as a regular process. This makes it very clear what overhead is being removed: the VM and the unikernel (including all drivers) just evaporate.

Ultimately, how practical is this idea? Well, if you were to build a secure multiplexer, this is probably the way you’d want to build it, so that it could be as simple (meaning reliable) and versatile as possible.



What’s up with fashion?

Things you want to get in on before they become fashionable:

Things you want to get in on that will remain (more or less) unfashionable:

Things you want to get in on when they’re fashionable:

Things you want to get in on after they’ve stopped being fashionable:

It seems to me like following fashion is going to get more and more dangerous, because if there is a bubble, that means there is a large incentive to pop it. To be fair, there’s also an enormous incentive to prop it up.

On the other hand, if you’re just doing your own thing, even if you’re completely vulnerable, you’re a small target and it might not be worth anyone’s effort to eat your lunch. But, then again, no one is going to defend you or help you out either.

Personally, I welcome any moderating forces on fashion, so that the highs may be a little less stratospheric and the lows may be a little less bottomless.

Keywords: fashion, herd behavior, hype