Piecewise testable languages are a subclass of the regular languages. There
are many equivalent ways of defining them; Simon's congruence $\sim_k$ is one
of the most classical approaches. Two words are $\sim_k$-equivalent if they
have the same set of (scattered) subwords of length at most $k$. A language $L$ is
piecewise testable if there exists some $k$ such that $L$ is a union of
$\sim_k$-classes. For each equivalence class of $\sim_k$, one can define a
canonical representative in shortlex normal form, that is, the minimal word
with respect to the lexicographic order among the shortest words in the class.
We present an algorithm for computing the canonical representative of the
$\sim_k$-class of a given word $w \in A^*$ of length $n$. The running time of our
algorithm is in $O(|A|n)$ even if $k \le n$ is part of the input. This is
surprising since the number of possible subwords grows exponentially in $k$. The
case $k > n$ is not interesting since then the equivalence class of $w$ is a
singleton. If the alphabet is fixed, the running time of our algorithm is
linear in the size of the input word. Moreover, for fixed alphabet, we show
that the computation of shortlex normal forms for $\sim_k$ is possible in
deterministic logarithmic space. One of the consequences of our algorithm is
that one can check with the same complexity whether two words are
$\sim_k$-equivalent (with $k$ being part of the input).
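
To make the definitions concrete, the following is a minimal brute-force sketch in Python. It is not the $O(|A|n)$ algorithm of the paper: it enumerates subwords explicitly and therefore takes exponential time in general. The function names `subwords_up_to`, `simon_equivalent`, and `shortlex_normal_form` are hypothetical and chosen only for illustration.

```python
from itertools import product


def subwords_up_to(word, k):
    """Return the set of scattered subwords of `word` of length at most k."""
    found = {""}
    for letter in word:
        # Extend every subword found so far by the current letter,
        # as long as the result does not exceed length k.
        found |= {s + letter for s in found if len(s) < k}
    return found


def simon_equivalent(u, v, k):
    """Check u ~_k v directly from the definition (brute force)."""
    return subwords_up_to(u, k) == subwords_up_to(v, k)


def shortlex_normal_form(word, k):
    """Brute-force shortlex normal form of the ~_k-class of `word`.

    Candidates are tried by increasing length and, within one length,
    in lexicographic order, so the first match is the shortlex minimum.
    Assumption: letters are ordered by Python's default character order.
    """
    target = subwords_up_to(word, k)
    letters = sorted(set(word))
    for length in range(len(word) + 1):
        for candidate in product(letters, repeat=length):
            cand = "".join(candidate)
            if subwords_up_to(cand, k) == target:
                return cand
    return word  # not reached: `word` itself is always a candidate
```

For instance, `simon_equivalent("ab", "ba", 1)` holds because both words contain exactly the letters a and b, whereas the two words are separated at $k = 2$ by the subword ab. The search in `shortlex_normal_form` stops at length $n$ at the latest, since the input word belongs to its own class.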