RRB-Trees: Efficient Immutable Vectors (2012) [pdf]

(infoscience.epfl.ch)

45 points by azhenley 4 days ago|13 comments

•

xeubie 2 days ago

I used this data structure in my immutable database (see profile) but eventually switched to a B-tree because I believe RRB trees are inherently flawed. If you do a large number of slices and concats, it is possible for the tree to contain so many gaps it that the tree grows so deep it can't be modified anymore. At first I thought it was a bug in my own implementation but I eventually found the same bug in rrb-vector, the clojure implementation (see CRRBV-14). In fact, the maintainer of that library reached the same conclusion I did and switched to B-trees: https://github.com/jafingerhut/core.btree-vector

Still, I have huge respect for Phil Bagwell and I make heavy use of his hash array mapped trie. But this issue with RRB trees makes it impossible for me to use them, especially for an on-disk data structure whose long lifespan makes it way more likely that the problem will eventually happen.

•

bjoli 23 hours ago

Hmmm. This comment made me think. If you would do a radix tree that degrades to a b+tree, you would get a couple of things. The b+tree concat and subsequently insertions and removal, is faster than the rrb tree one, but yields more relaxed nodes.

If you have a relaxed root, you know the size of every subtree. If the the subtree is dense you can easily calculate the shift, meaning you get dense radix tree subnodes with fast lookup.

This gets you the benefits of a relaxed b tree, but no issues with accidentally having too deep shifts since they are relative to where the dense subtree starts.

This could be done to a rrb tree as well, which would mitigate the issues you are seeing, but would still lead to an arbitrarily deep tree if the concatenation isn't done properly.

The only tradeoff would be slower indexing on relaxed trees because more nodes would end up unbalanced, but at the same time concatenations, insertions and removals would probably be faster.

I will try to implement this.

Edit: no, seriously. This is a splendid idea. The b tree concat is faster than the rrb tree "full" merge. And simpler. And storing the tree subtree height in a byte somewhere means we can go from relaxed to dense and then do a proper radix tree search. (I have one of those rrb tree implementations that probably fails, btw. It has a proper concat, but the improper one is so much faster. I know. I am ashamed.)

•

xeubie 22 hours ago

Yeah you should try it. I place a very big value on simplicity and code size but the combo you describe might have benefits.

•

bjoli 20 hours ago

Edit: Sorry for repeating myself. Still thinking out loud.

I have already started. Not only is the concat faster (which means the derived operations in the rrb tree will be faster), but the derived operations can be done directly since the relaxation is going to be between 16 and 32 elements. A remove will just be a copy 16 out of 17 times.

This will mean that indexing might be slower in the b tree (it will be higher since the fill ratio is lower) part of the tree, and there might be slightly more of the relaxed nodes, but sweet jeebus this is a simplification.

A simplification that makes correctness easier and speeds up all other operations.setting the minimal amount of elements to 24 will still make concat faster, and will make memory waste almost the same (I can use c# inline arrays without feeling too bad).

I have both a fast b+tree and am Rrb tree. Why didn't I think of this before?!

•

bjoli 2 days ago

that is because many implementations bypass strict rebalancing to squeeze out extra concat performance. And because the strict rebalancing algorithm is awful to implement.

•

HexDecOctBin 2 days ago

Did you use some resource on how to layer immutability on top of B-trees? This is finicky enough that I don't feel comfortable YOLOing.

•

bjoli 2 days ago

This is a persistent ordered map on b+trees.

https://rikspucko.koketteriet.se/bjoli/PersistentMap/src/bra...

•

erichocean 3 days ago

If you like this kind of thing, Bifurcan [0] is a Java library with implementations of RBB-trees and related (fast) immutable data structures.

[0] https://github.com/lacuna/bifurcan

•

ashton314 2 days ago

This is Rhombus’ native data type! Such a nice data structure.

•

azhenley 2 days ago

There was some interesting discussion which data structure to use for Rhombus: https://github.com/racket/rhombus/discussions/221

•

wasting_time 4 days ago

A refreshing break from Molt News. Now I want to check how vectors are implemented in my favorite languages.

•

inhumantsar 4 days ago

the `im` rust crate provides immutable data structures, one of them being an RRB-based Vec. don't remember what the stdlib Vec uses.

•

oniony 4 days ago

I believe Vec is a straight array underneath, which is reallocated at a larger size when full. And Vector in the `im` crate you mentioned looks very interesting indeed.