Grammar-Compressed Indexes with Logarithmic Search Time

1 Apr 2020  ·  Francisco Claude, Gonzalo Navarro, Alejandro Pacheco ·

Let a text $T[1..n]$ be the only string generated by a context-free grammar with $g$ (terminal and nonterminal) symbols, and of size $G$ (measured as the sum of the lengths of the right-hand sides of the rules). Such a grammar, called a grammar-compressed representation of $T$, can be encoded using essentially $G\lg g$ bits. We introduce the first grammar-compressed index that uses $O(G\lg n)$ bits and can find the $occ$ occurrences of patterns $P[1..m]$ in time $O((m^2+occ)\lg G)$. We implement the index and demonstrate its practicality in comparison with the state of the art, on highly repetitive text collections.

PDF Abstract


  Add Datasets introduced or used in this paper