File systems are necessary because files need to move between applications and devices. File systems are already an abstraction, and they will continue to be useful until someone comes up with a better one. Which, at this point, seems unlikely.
I’d like EarthFS to fill that role. But it can be a better file system by sitting on top of existing file systems, just like the web can sit on top of native apps (web browsers).
A way to effortlessly tag any content, anywhere and on any device in a predictive and n-dimensional way so that content is broken down to its most granular, atomic and consumable unit (a quote of a book, a lyric of a song, a smirk or a smile in a photo, a highlight clip of a video, the exact anniversary gift you intend to buy), each having its own ‘GPS coordinate’ on the Web, or in some emergent meta-layer fabric that hovers above and interweaves our now-Web.
Everyone seems to think that tagging is the way to go… But I’ve compared it with full-text search as best I can, and it doesn’t seem to have a chance. Where are the people tagging web sites the way Google indexes them automatically?
But as for this “GPS coordinates for everything,” does that mean content addressing? Because I think it means content addressing.
Search engines are useful when you’ve got a haystack of stuff and you’re looking for a needle. They’re particularly useful when you’re looking for something novel, usually an answer to a question. Often you don’t know where the content is and you probably didn’t create it either. They have two benefits: removing the tedium of hunting intelligently through an unknown haystack for some content, and doing it faster than could be done manually. They also come with the benefit of returning alternate results.
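Part of why full-text search wins is that the index builds itself. A minimal sketch of an inverted index, the basic structure behind full-text search, assuming naive lowercase word tokenization. It's a toy, not how Google or any production engine works; the function names and sample documents are made up for illustration.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    # Naive tokenizer: lowercase, then keep runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    # Inverted index: each word maps to the IDs of documents containing it.
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return dict(index)


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    # AND semantics: a document must contain every word in the query.
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


docs = {  # made-up sample documents
    "a.txt": "Notes on content addressing and hashes",
    "b.txt": "Tagging files by hand is tedious",
    "c.txt": "Full-text search indexes content automatically",
}
index = build_index(docs)
print(sorted(search(index, "content")))  # ['a.txt', 'c.txt']
```

Nobody tagged anything here; adding a document and rebuilding the index is the whole maintenance cost.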
The file/directory model is useful when you know roughly (or exactly) where something is. It’s easier and often quicker to clicky clicky all the way to your content than it is to describe what you’re looking for, waiting, wading through possibly irrelevant results, and then refining your query if you didn’t succeed first time around. This model also has the benefit of returning related content (other files in the same directory), as opposed to alternate results. It’s for this reason the file system is going to live on, simply because there are many daily situations in which clicking or tapping on directories to get to a file is easier and perceptually quicker.
Both filesystems and search engines are useful in different scenarios, which is why we use both depending on which is easier in a given scenario.
Interesting argument. However, file systems have completely lost out to per-app data silos. This app data has many of the features he lists as being useful for search engines:
- Data you didn’t create (e.g. songs)
- Automatic organization
I feel like if the choice is app silos or something better than the file system, these people would accept a file system replacement.
Keywords: arguing, models/equivalence/distinctions, scaling/efficiency
It’s funny how the people who so quickly accepted automatic window management can simultaneously get so hung up on automatic file management.
Keywords: bias, psychology, change, learning
My parents both organize pictures into folders very well and organize them by date and tag them with metadata. […]
What a waste of time! If only we had machines that could free them from such drudgery!
Colleagues of mine at a top-tier university are already seeing this in students. They have “computing” experience, but no real sense of the file system as an organizational and navigational tool. This poses a question in my mind:
Are we creating a world of perpetual intermediate users?
How can we have such smart people who still don’t understand the basics of abstraction?
Yes, abstracting over things means that people don’t have to learn about them. In some cases, that means those things eventually go away entirely. In other cases, they just become professional knowledge (like how to fix cars).
I’ve been working on a non-hierarchical filesystem (if you want to call it a “filesystem”) called Library Transfer Protocol. It created a new type of file called a “library item” which is different from a file in the sense that library items are immutable. An image, a video, or a finished blog post can be a library item. Since all library items are immutable, we don’t have to worry about the CAP theorem, and therefore the “filesystem” can be distributed. You can see the code here: https://github.com/priestc/Library-Transfer-Protocol
Let’s get this guy on the phone pronto.
LTP is a protocol for publishing, archiving, and organizing media files. The protocol is conceptually very similar to email, except backwards. With SMTP, you send messages to other people’s inboxes, and other people send messages to your inbox.
With LTP, you publish media items (photos, videos, blog entries, other personal data) to your library, other people publish data to their library, and these libraries can talk to each other to allow data to flow between them, as configured by the user who created the library.
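The immutability argument can be sketched in a few lines. This is a hypothetical model, not LTP's actual code: it assumes items are keyed by the SHA-256 of their bytes (LTP may identify items differently), and `accept` is a stand-in for whatever per-library rules the user configures. Because an item never changes, two libraries can only differ in which items they hold, so syncing reduces to copying what's missing.

```python
import hashlib
from typing import Callable, Dict


class Library:
    """Hypothetical library of immutable items keyed by content hash.

    Assumption for illustration only: LTP may not use SHA-256 keys.
    """

    def __init__(self) -> None:
        self.items: Dict[str, bytes] = {}

    def publish(self, data: bytes) -> str:
        # The key is derived from the bytes, so an existing key can never
        # end up pointing at different content: there is nothing to overwrite.
        key = hashlib.sha256(data).hexdigest()
        self.items[key] = data
        return key

    def pull_from(
        self,
        other: "Library",
        accept: Callable[[bytes], bool] = lambda data: True,
    ) -> int:
        # Copy every item we lack that passes the (hypothetical) user rule.
        # Returns how many items were copied.
        copied = 0
        for key, data in other.items.items():
            if key not in self.items and accept(data):
                self.items[key] = data
                copied += 1
        return copied


alice, bob = Library(), Library()
alice.publish(b"photo")
bob.publish(b"blog post")
alice.pull_from(bob)
bob.pull_from(alice)
assert alice.items == bob.items  # both libraries converge, no conflict resolution needed
```

This removes update conflicts, but a library can still be temporarily missing items, so strictly speaking you get eventual consistency rather than an exemption from the CAP theorem.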
Library Transfer Protocol: A Description with Pictures
Eerily similar to WinFS
Okay, but that sounds different from EarthFS…
Solves a few big internet problems
The “webscale problem”
Long term archives
He uses a screen shot of Spectral Layers to represent audio.
He’s big on meta-data, whereas EarthFS says NO META-DATA.
- Standard [Public]
The “academic” type is interesting but I doubt it could work.
What goes in a Library?
Good: […] Basically anything where metadata makes sense
Bad: Software, documents, relational data
Anything that is mutable in nature
EarthFS has no meta-data, so it’s arguably suitable for a wider range of files. It would probably be good for storing software (executables and related files), and for anything else that could benefit from content addressing.
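The hash URIs used in these notes have the form hash://sha1/ followed by a hex digest. A minimal sketch of content addressing in that format, assuming the digest is plain SHA-1 over the raw file bytes (an assumption; the exact EarthFS scheme isn't spelled out here). The function names are hypothetical.

```python
import hashlib


def content_uri(data: bytes, algo: str = "sha1") -> str:
    # The address comes purely from the bytes: no name, no path, no metadata.
    # Identical files (say, the same executable on two machines) get identical addresses.
    digest = hashlib.new(algo, data).hexdigest()
    return f"hash://{algo}/{digest}"


def verify(uri: str, data: bytes) -> bool:
    # Anyone holding the bytes can check them against the address,
    # no matter which peer or mirror they came from.
    prefix = "hash://"
    if not uri.startswith(prefix):
        return False
    algo, _, expected = uri[len(prefix):].partition("/")
    return hashlib.new(algo, data).hexdigest() == expected


blob = b"#!/bin/sh\necho hello\n"
uri = content_uri(blob)
assert verify(uri, blob)
assert not verify(uri, blob + b"tampered")
```

SHA-1 is no longer collision-resistant, which is why the algorithm is a parameter: `content_uri(blob, "sha256")` yields a hash://sha256/ address instead.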
Okay, in conclusion, I think I’m actually significantly ahead of this guy.
I was going to make a list of our advantages, but I guess I should withhold judgment. We should download his code and try it out.
I don’t think I’m going to try talking to him, because I’m not sure how much I would get out of it (and I’m antisocial). But I did star his repo, so if he’s like me, he’ll notice and check out my profile. In which case there’s a small chance he’ll find my “notetaking” repo, that includes my old manifesto as its readme.
I hate this concept - the app doesn’t own the data.
The app developers would like the app to own the data.
That this isn’t in the users’ interests should be self-evident. Vertical data silos with doors owned by gatekeepers who want to charge admission: just say no!
Charlie Stross weighing in on incentives and capitalism.
2.1 People lie [incentives]
2.2 People are lazy [incentives, scaling/efficiency]
2.3 People are stupid [not sure how to file this one… bugs?]
2.4 Mission: Impossible – know thyself [incentives, not sure what else]
2.5 Schemas aren’t neutral [incentives, type systems, code versus data]
2.6 Metrics influence results [incentives, Campbell’s law, metrics/benchmarks]
2.7 There’s more than one way to describe something [ambiguity]
Long story short: incentives.
That’s the biggest reason why tagging is worse than full-text search. At least full-text search has a shot. It’s also a pretty good argument against meta-data fields in Library Transfer Protocol.
EarthFS could suffer from this problem too if we offer meta-entries. Hopefully they aren’t too “powerful.” It’s very important for systems to keep authors relatively weak. hash://sha1/85c7eb924930b4f99cff928ce6e6ed1b7d5a0fef[#]
hierarchy is the only mechanism that the human brain has for dealing with complexity. A few people balked, they don’t like ‘trees’ or what have you but nobody could come up with an alternative.
None of you could think of anything better because you’re engineers, not psychologists or philosophers.
The “human brain” deals with complexity through: […]
Keywords is an option, but then again if I send someone else my files what happens if there is a keyword conflict on their system? So we need namespaces or a way to track file origin/ownership across the entire planet? urgh.
When I use words a lot, I tend to assign them a clear, defined meaning. Unfortunately, that meaning isn’t inherent in the words themselves. This is the fundamental problem of tagging, IMHO.
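One reading of the “namespaces” idea is to qualify every keyword with its origin. A minimal sketch with hypothetical names and identifiers: it stops two people’s tags from colliding when files are exchanged, but, as the point about meaning above suggests, it can’t tell you whether both people meant the same thing by “project”.

```python
from typing import NamedTuple, Set


class Tag(NamedTuple):
    # Hypothetical: qualify each keyword with whoever assigned it.
    origin: str  # illustrative identifier, e.g. "alice@laptop"
    word: str


def merge_tags(mine: Set[Tag], theirs: Set[Tag]) -> Set[Tag]:
    # Merging is a plain union: the same word from two origins stays two tags,
    # so nothing on the receiving system gets silently overwritten.
    return mine | theirs


def tags_with_word(tags: Set[Tag], word: str) -> Set[Tag]:
    # A bare keyword lookup returns every origin's version; the namespace
    # records who said it, not what they meant by it.
    return {t for t in tags if t.word == word}


mine = {Tag("alice@laptop", "project")}
theirs = {Tag("bob@desktop", "project")}
merged = merge_tags(mine, theirs)
assert len(tags_with_word(merged, "project")) == 2
```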
I make one folder and keep all my manually created content there. […] It’s inspired by Camera Roll. Just a single folder with folders for events, sorted by time.
Even “cool URLs” are recommended to be organized by time. I think this is pretty much the definitive one true way.
Cool URIs don’t change
I didn’t think URLs have to be persistent - that was URNs.
This is probably one of the worst side-effects of the URN discussions. Some seem to think that because there is research about namespaces which will be more persistent, that they can be as lax about dangling links as they like as “URNs will fix all that”. If you are one of these folks, then allow me to disillusion you.
Most URN schemes I have seen look something like an authority ID followed by either a date and a string you choose, or just a string you choose. This looks very like an HTTP URI. In other words, if you think your organization will be capable of creating URNs which will last, then prove it by doing it now and using them for your HTTP URIs. There is nothing about HTTP which makes your URIs unstable. It is your organization. Make a database which maps document URN to current filename, and let the web server use that to actually retrieve files.
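A minimal sketch of the “database which maps document URN to current filename”, with a dict standing in for the database. The class name, URIs and file paths are hypothetical; a real server would persist the table and consult it on every request.

```python
from typing import Dict, Optional


class Resolver:
    """Hypothetical stable-URI -> current-filename table (dict as the database)."""

    def __init__(self) -> None:
        self._table: Dict[str, str] = {}

    def register(self, uri: str, filename: str) -> None:
        # Issued URIs are never reassigned to a different document.
        if uri in self._table:
            raise ValueError(f"URI already issued: {uri}")
        self._table[uri] = filename

    def move(self, uri: str, new_filename: str) -> None:
        # Reorganizing storage updates only the table; the public URI is unchanged.
        if uri not in self._table:
            raise KeyError(uri)
        self._table[uri] = new_filename

    def resolve(self, uri: str) -> Optional[str]:
        # What the web server would call to find the file for a request.
        return self._table.get(uri)


r = Resolver()
r.register("/2014/notes/tagging", "drafts/tagging-v3.html")  # hypothetical paths
r.move("/2014/notes/tagging", "archive/2014/tagging.html")
assert r.resolve("/2014/notes/tagging") == "archive/2014/tagging.html"
```

Nothing here is specific to URNs: the stable identifier is just an HTTP path, which is the quote’s point.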
I quoted the whole thing because it’s important. I see this as basically proof that URNs were the “right” approach, and URLs were the “practical” one. Of course, worse is better, and by now URLs are thoroughly entrenched. But maybe there’s an opportunity for URNs to catch up.
If cool URIs don’t change, then give me a domain name that will never expire, please.
The creation date of the document - the date the URI is issued - is one thing which will not change. It is very useful for separating requests which use a new system from those which use an old system. That is one thing with which it is good to start a URI. If a document is in any way dated, even though it will be of interest for generations, then the date is a good starter.
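A minimal sketch of date-first URIs, assuming a /YYYY/MM/DD/name layout and a lowercase short name chosen once at issue time (both are assumptions; the quote only says the date is a good starter). The second function shows the benefit the quote mentions: because the date leads the path, a server can separate URIs issued under an old system from new ones by the prefix alone. The `cutover` date and backend labels are hypothetical.

```python
import re
from datetime import date


def dated_uri(created: date, name: str) -> str:
    # Lead with the creation date, which never changes; the name is chosen
    # once when the URI is issued and stored, never recomputed from a title.
    if not re.fullmatch(r"[a-z0-9][a-z0-9-]*", name):
        raise ValueError(f"bad name: {name!r}")
    return f"/{created.year}/{created.month:02d}/{created.day:02d}/{name}"


def pick_backend(uri: str, cutover: date) -> str:
    # Route by issue date: URIs issued before the cutover go to the old system.
    m = re.match(r"^/(\d{4})/(\d{2})/(\d{2})/", uri)
    if m is None:
        return "unknown"
    issued = date(*map(int, m.groups()))
    return "legacy" if issued < cutover else "current"


uri = dated_uri(date(2014, 3, 9), "chairs")
assert uri == "/2014/03/09/chairs"
assert pick_backend(uri, cutover=date(2015, 1, 1)) == "legacy"
```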
Finite depth - we all have a very good idea of what the ideal solution is, even when we come from many different directions and ways of thinking about it.