nicity (nicity) wrote,

B+HTree, now powering IntelliJ IDEA 11 indices

Last summer I implemented mine own flavor of B+Tree aka B+HTree, H for Hash. Usually O characteristics for B trees are calculated on height of the tree, however insertion of the data to leaf block can be costly. E.g. if key / value pairs are stored as array, one needs to shift at least half of the pairs at insertion. To avoid the problem, the data storage itself is organized as binary tree or nested b+tree, still having complexity of logarithm of the size of leaf block (usually the size is at least 4K, or 100x of key/value pairs). B+HTree leaf node is organized as hash table with open addressing, achieving O(1) complexity for insertion / removal of elements, at the expense of some extra bits.

The data structure is used to store every index in IntelliJ IDEA 11, e.g. inverted index of indexed source code for IDEA project is 1,8M keys. For such size, it turned out that B+tree based index is twice as compact comparing to previous hash based representation. Index compactness + limited leaf page accesses contributed to significantly better indexing performance of IntelliJ IDEA 11 for large code bases.

Tags: b+htree

  • Post a new comment


    default userpic

    Your IP address will be recorded 

    When you submit the form an invisible reCAPTCHA check will be performed.
    You must follow the Privacy Policy and Google Terms of use.