nicity (nicity) wrote,

B+HTree, now powering IntelliJ IDEA 11 indices

Last summer I implemented mine own flavor of B+Tree aka B+HTree, H for Hash. Usually O characteristics for B trees are calculated on height of the tree, however insertion of the data to leaf block can be costly. E.g. if key / value pairs are stored as array, one needs to shift at least half of the pairs at insertion. To avoid the problem, the data storage itself is organized as binary tree or nested b+tree, still having complexity of logarithm of the size of leaf block (usually the size is at least 4K, or 100x of key/value pairs). B+HTree leaf node is organized as hash table with open addressing, achieving O(1) complexity for insertion / removal of elements, at the expense of some extra bits.

The data structure is used to store every index in IntelliJ IDEA 11, e.g. inverted index of indexed source code for IDEA project is 1,8M keys. For such size, it turned out that B+tree based index is twice as compact comparing to previous hash based representation. Index compactness + limited leaf page accesses contributed to significantly better indexing performance of IntelliJ IDEA 11 for large code bases.

Tags: b+htree

  • C/C++ plugin v. 0.8.1 for IntelliJ IDEA 8/9/X

    Change notes: * Major feature: initial support for handling Cpp and C files without switching settings. * Completion of member names in…

  • Удивительное рядом

    В очередной раз смотрю как быстро индексирует исходники Идея, на этот раз открываю ещё таск-мэнеджер, включаю опцию Read I/O bytes , Write I/O bytes…

  • Зачинили баг

    которому исполнилось примерно 5 (!) лет (6 мая 2003 года он уже был, дальше cvs историю не просмотреть)

  • Post a new comment


    default userpic

    Your IP address will be recorded 

    When you submit the form an invisible reCAPTCHA check will be performed.
    You must follow the Privacy Policy and Google Terms of use.