How Stacks Are Handled in Go (cloudflare.com)
While we can allocate large amounts of memory for 64-bit systems, it relies on overcommitting memory. Overcommit is when you allocate more memory than you physically have and rely on the operating system to make sure that physical memory is allocated when it is needed. However, enabling overcommit carries some risk. Since processes can allocate more memory than the machine has, it has to make up memory somehow if the processes start actually using more memory than is available. It can do this by putting sections of memory onto disk, but this adds latency that is unpredictable and often, systems are run with overcommit turned off for this reason.
I see. That sucks.
Since malloc already “can’t fail” on Linux, I think it would make sense to have overcommit turned on and swap turned off. If you touch a new page and the machine is out of memory, the kernel’s OOM killer picks a process and kills it.
Stack copying is pretty much impossible in C, plus there’s the extra overhead of constantly checking whether you have enough stack space left.
If C had proper exceptions, you could catch the segfault and unwind, then start over on a bigger stack. Although that would have plenty of problems too.
8KB, which is what Go uses by default, happens to be smaller than what libco will let us create on some platforms.
They say “millions” of goroutines are common, but that can’t be true, right? A million 8KB allocations (roughly 8GB of stack) is going to be slow by itself, plus you have the overhead of running all of them (switching in Go can’t be faster than switching in libco, can it?).
Right now our stack size in EarthFS is 48KB (on 32-bit) and we’ve never segfaulted aside from some bugs. I’ve been thinking we could probably cut it down to 32KB without issue.
We’re still limited in when we can use fibers, which sucks. It’d be so nice to have two or three fibers per connection sometimes, but it just isn’t worth it.
It seems like the next step should be compiling to state machines, somehow.
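To make “compiling to state machines” concrete, here is roughly what the output of such a compiler might look like if you wrote it by hand. This is only a sketch under my own assumptions: the protocol (a 1-byte length prefix followed by that many payload bytes) and the names decoder and Feed are made up for illustration. In a fiber you would write “read the length, then read the body” as straight-line blocking code. Here the same logic keeps its position in a struct instead of on a stack.

```go
package main

import "fmt"

// state names the points where a fiber would otherwise be blocked.
type state int

const (
	wantLen  state = iota // waiting for the 1-byte length prefix
	wantBody              // waiting for the rest of the payload
)

// decoder holds everything a fiber would have kept on its stack:
// the current position in the protocol and the partially read message.
type decoder struct {
	st   state
	need int
	buf  []byte
}

// Feed consumes whatever bytes have arrived and returns any messages
// completed so far. It never blocks, so it needs no stack of its own.
func (d *decoder) Feed(chunk []byte) [][]byte {
	var out [][]byte
	for _, b := range chunk {
		switch d.st {
		case wantLen:
			d.need = int(b)
			d.buf = d.buf[:0]
			if d.need == 0 {
				out = append(out, []byte{}) // empty message, stay in wantLen
				continue
			}
			d.st = wantBody
		case wantBody:
			d.buf = append(d.buf, b)
			if len(d.buf) == d.need {
				// Copy the payload out so the buffer can be reused.
				out = append(out, append([]byte(nil), d.buf...))
				d.st = wantLen
			}
		}
	}
	return out
}

func main() {
	var d decoder
	// The message "hi" arrives split across two reads.
	fmt.Printf("%q\n", d.Feed([]byte{2, 'h'}))
	fmt.Printf("%q\n", d.Feed([]byte{'i', 1, '!'}))
}
```

The per-connection cost is a few words of struct instead of a 32KB or 48KB stack, which is the appeal. The cost is that every blocking point has to become an explicit state, which is exactly the part you would want a compiler to do for you.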
Why not leave the segmented stacks, and simply refuse to shrink them?
It’s not just the allocation—just the cost of switching between stack segments is really expensive. Function calls are 2 ns on most architectures; any overhead can easily make that 5x or 10x slower.
If this is the justification, I think it’s a bad one. In the unlikely event that segmented stacks work with libco, that might actually be worth trying too.
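To separate the two costs being argued about here, I find a toy model helps. The sketch below is not Go’s runtime and measures nothing real. It just counts events for a call that repeatedly crosses a segment boundary in a loop. The segment size of 4 frames and all the names (segStack, hotLoop) are assumptions of mine.

```go
package main

import "fmt"

// segStack is a toy model of a segmented stack. Sizes are in abstract
// "frames"; it only counts allocations and boundary crossings.
type segStack struct {
	segSize   int  // frames per segment (assumed)
	depth     int  // current call depth in frames
	segments  int  // segments currently allocated
	keep      bool // true = never free a segment once allocated
	allocs    int  // segment allocations performed
	crossings int  // calls/returns that crossed a segment boundary
}

func newSegStack(segSize int, keep bool) *segStack {
	return &segStack{segSize: segSize, segments: 1, keep: keep}
}

// call pushes one frame, moving to a new segment if the current one is full.
func (s *segStack) call() {
	if s.depth > 0 && s.depth%s.segSize == 0 {
		s.crossings++
		needed := s.depth/s.segSize + 1
		if needed > s.segments {
			s.segments++
			s.allocs++
		}
	}
	s.depth++
}

// ret pops one frame, leaving the top segment if it just became empty.
func (s *segStack) ret() {
	s.depth--
	if s.depth > 0 && s.depth%s.segSize == 0 {
		s.crossings++
		if !s.keep {
			s.segments-- // free-on-shrink policy
		}
	}
}

// hotLoop fills the first segment exactly, then calls and returns
// iters times right at the boundary.
func hotLoop(keep bool, iters int) (allocs, crossings int) {
	s := newSegStack(4, keep)
	for i := 0; i < 4; i++ {
		s.call()
	}
	for i := 0; i < iters; i++ {
		s.call()
		s.ret()
	}
	return s.allocs, s.crossings
}

func main() {
	a, c := hotLoop(false, 1000)
	fmt.Printf("free on shrink: allocs=%d crossings=%d\n", a, c)
	a, c = hotLoop(true, 1000)
	fmt.Printf("never shrink:   allocs=%d crossings=%d\n", a, c)
}
```

In this model, refusing to shrink takes the allocations from one per iteration down to one total, but the number of boundary crossings stays the same. So the whole disagreement comes down to how expensive a crossing really is, and the model can’t answer that.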
Another idea: if you grow to 3 segments, and later shrink down to 1, then you can replace segments 2 and 3 with a single larger segment, getting rid of one of the jumps. But you have to wait until you’re out of that code before you can combine them, at which point it may no longer matter.