Fully dynamic data structure for LCE queries in compressed space

26 Jun 2016  ·  Nishimoto Takaaki, I Tomohiro, Inenaga Shunsuke, Bannai Hideo, Takeda Masayuki ·

A Longest Common Extension (LCE) query on a text $T$ of length $N$ asks for the length of the longest common prefix of suffixes starting at given two positions. We show that the signature encoding $\mathcal{G}$ of size $w = O(\min(z \log N \log^* M, N))$ [Mehlhorn et al., Algorithmica 17(2):183-198, 1997] of $T$, which can be seen as a compressed representation of $T$, has a capability to support LCE queries in $O(\log N + \log \ell \log^* M)$ time, where $\ell$ is the answer to the query, $z$ is the size of the Lempel-Ziv77 (LZ77) factorization of $T$, and $M \geq 4N$ is an integer that can be handled in constant time under word RAM model... In compressed space, this is the fastest deterministic LCE data structure in many cases. Moreover, $\mathcal{G}$ can be enhanced to support efficient update operations: After processing $\mathcal{G}$ in $O(w f_{\mathcal{A}})$ time, we can insert/delete any (sub)string of length $y$ into/from an arbitrary position of $T$ in $O((y+ \log N\log^* M) f_{\mathcal{A}})$ time, where $f_{\mathcal{A}} = O(\min \{ \frac{\log\log M \log\log w}{\log\log\log M}, \sqrt{\frac{\log w}{\log\log w}} \})$. This yields the first fully dynamic LCE data structure. We also present efficient construction algorithms from various types of inputs: We can construct $\mathcal{G}$ in $O(N f_{\mathcal{A}})$ time from uncompressed string $T$; in $O(n \log\log n \log N \log^* M)$ time from grammar-compressed string $T$ represented by a straight-line program of size $n$; and in $O(z f_{\mathcal{A}} \log N \log^* M)$ time from LZ77-compressed string $T$ with $z$ factors. On top of the above contributions, we show several applications of our data structures which improve previous best known results on grammar-compressed string processing. read more

PDF Abstract
No code implementations yet. Submit your code now


Data Structures and Algorithms


  Add Datasets introduced or used in this paper