24 Dec 2018
•
Bender Michael A.
•
Berry Jonathan W.
•
Farach-Colton Martin
•
Johnson Rob
•
Kroeger Thomas M.
•
Pandey Prashant
•
Phillips Cynthia A.
•
Singh Shikha
Given a stream $S = (s_1, s_2, ..., s_N)$, a $\phi$-heavy hitter is an item
$s_i$ that occurs at least $\phi N$ times in $S$. The problem of finding
heavy hitters has been extensively studied in the database literature... In this
paper, we study a related problem. We say that there is a $\phi$-event at time
$t$ if $s_t$ occurs exactly $\phi N$ times in $(s_1, s_2, ..., s_t)$. Thus, for
each $\phi$-heavy hitter there is a single $\phi$-event which occurs when its
count reaches the reporting threshold $\phi N$. We define the online
event-detection problem (OEDP) as: given $\phi$ and a stream $S$, report all
$\phi$-events as soon as they occur.

Many real-world monitoring systems demand event detection where all events
must be reported (no false negatives), in a timely manner, with no non-events
reported (no false positives), and a low reporting threshold. As a result, the
OEDP requires a large amount of space ($\Omega(N)$ words) and is not solvable in
the streaming model or via standard sampling-based approaches.

Since the OEDP requires large space, we focus on cache-efficient algorithms in
the external-memory model. We provide algorithms for the OEDP that are within a
log factor of optimal.

Our algorithms are tunable: their parameters can be set to allow for a bounded
number of false positives and a bounded delay in reporting. None of our relaxations allow
false negatives since reporting all events is a strict requirement of our
applications. Finally, we show improved results when the count of items in the
input stream follows a power-law distribution.
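
To make the definition of a $\phi$-event concrete, the following is a minimal in-memory sketch of exact online event detection. It is not one of the paper's external-memory algorithms; it is only a baseline that keeps one counter per distinct item, which illustrates why exact reporting can need space proportional to the stream. The function name `report_events` and the parameter `threshold` are hypothetical, and the sketch assumes the caller supplies the reporting threshold $\phi N$ as a positive integer.

```python
from collections import Counter
from typing import Hashable, Iterable, Iterator, Tuple


def report_events(
    stream: Iterable[Hashable], threshold: int
) -> Iterator[Tuple[int, Hashable]]:
    """Yield (t, item) the moment item's count first reaches `threshold`.

    `threshold` stands in for phi * N and is assumed to be a positive
    integer chosen by the caller (an assumption of this sketch).
    Times t are 1-indexed, matching s_1, ..., s_N in the abstract.
    """
    if threshold < 1:
        raise ValueError("threshold must be a positive integer")
    counts = Counter()  # one exact counter per distinct item seen so far
    for t, item in enumerate(stream, start=1):
        counts[item] += 1
        # A count equals the threshold exactly once, so each heavy
        # hitter triggers a single event, with no false positives,
        # no false negatives, and no reporting delay.
        if counts[item] == threshold:
            yield t, item
```

For example, with the stream `a, b, a, c, a, b, a` and a threshold of 3, the only event is reported at time 5, when `a` occurs for the third time; the fourth occurrence at time 7 does not trigger a second report.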