HomeLog InSource

notes.public

[View] [Short] [Hash] [Raw]

EarthFS 2014-08-16

hash://sha1/08a1c083ae002c5826c51533e4d3b90b21188af7[#]

After checking online, it seems like our concept of optimized symbolic temporary tables sort of already exists in the form of “derived tables.” Loosely, that includes subqueries, common table expressions, and views.

After coming to that point, I finally realized what I think is the truth about why SQL sucks. SQL “doesn’t compose” because the query optimizer doesn’t compose. The limits of SQL are the limits of the implementation.

I tested that, although probably not thoroughly enough, by looking at how the optimizer handles queries referencing multiple views. It doesn’t seem to be able to optimize across them at all.

That seems like the final nail in the coffin, not for SQL as a whole, but at least for getting the query optimizer to do anything useful for us. From here, we can either do everything from scratch in MDB, or keep using SQLite as a convenient (but slow and sometimes inflexible) b-tree front-end.

The only thing making me hesitate is SQLite’s FTS extension. A lot of work has been done on it to make insertions fast.

However, I’m not even sure we could get decent performance by using the SQL FTS interface at such a low level. Right now, we walk down the list of file IDs and do a query against each one. For a regular filter that’s slow (2 seconds), but for a full text filter it’s really slow (20 seconds). Now obviously if we were smarter about our join ordering, we could turn that into a regular FTS query, which would be fast… But what about multiple FTS filters? The first one would be fast, and the rest would be slow?

I’m very tired of SQL. We’ve spent so much time just trying to figure out whether it can even do what we want. It probably really would’ve been faster to do everything ourselves from scratch. Not to mention less frustrating.