Binary Jumbled Pattern Matching On Trees and Tree-Like Structures


Algorithmica (2015) 73:571–588

DOI 10.1007/s00453-014-9957-6

Binary Jumbled Pattern Matching on Trees and Tree-Like Structures

Travis Gagie · Danny Hermelin · Gad M. Landau · Oren Weimann

Received: 12 September 2013 / Accepted: 11 November 2014 / Published online: 27 November 2014
© Springer Science+Business Media New York 2014

Abstract Binary jumbled pattern matching asks to preprocess a binary string S in order to answer queries (i, j) which ask for a substring of S that is of length i and has exactly j 1-bits. This problem naturally generalizes to vertex-labeled trees and graphs by replacing “substring” with “connected subgraph”. In this paper, we give an O(n^2/log^2 n)-time solution for trees, matching the currently best bound for (the simpler problem of) strings. We also give an O(g^{2/3} n^{4/3}/(log n)^{4/3})-time solution for strings that are compressed by a context-free grammar of size g in Chomsky normal form. This solution improves the known bounds when the string is compressible under many popular compression schemes. Finally, we prove that on graphs the problem is fixed-parameter tractable with respect to the treewidth w of the graph, even for a

Gad M. Landau: Supported in part by the National Science Foundation (NSF) Grant 0904246, the Israel
Science Foundation (ISF) Grant 347/09, and the United States-Israel Binational Science Foundation
(BSF) Grant 2008217. Oren Weimann: Supported in part by the Israel Science Foundation Grant 794/13.

Preliminary version of this paper appeared in the 21st Annual European Symposium on Algorithms (ESA
2013).

T. Gagie
University of Helsinki, Helsinki, Finland
e-mail: travis.gagie@aalto.fi

D. Hermelin
Ben-Gurion University, Beersheba, Israel
e-mail: hermelin@bgu.ac.il

G. M. Landau · O. Weimann (B)


University of Haifa, Haifa, Israel
e-mail: oren@cs.haifa.ac.il
G. M. Landau
e-mail: landau@cs.haifa.ac.il


constant number of different vertex-labels, thus improving the previous best n^{O(w)} algorithm.

Keywords Pattern matching · Tree pattern matching · Permutation pattern matching · Grammar compression · Graph motifs

1 Introduction

Jumbled pattern matching is an important variant of classical pattern matching with several applications in computational biology, ranging from alignment [5] and SNP discovery [7], to the interpretation of mass spectrometry data [10] and metabolic
network analysis [26]. In the most basic case of strings, the problem asks to determine
whether a given pattern P can be rearranged so that it appears in a given text T . That is,
whether T contains a substring of length |P| where each letter of the alphabet occurs
the same number of times as in P. Using a straightforward sliding window algorithm,
such a jumbled occurrence can be found optimally in O(n) time on a text of length n.
While jumbled pattern matching has a simple efficient solution, its indexing problem
is much more challenging. In the indexing problem, we preprocess a given text T so
that on queries P we can determine quickly whether T has a jumbled occurrence of
P. Very little is known about this problem besides the trivial naive solution.
Most of the interesting results on indexing for jumbled pattern matching relate to
binary strings (where a query pattern (i, j) asks for a substring of T that is of length i
and has j 1s). Given a binary string of length n, Cicalese, Fici and Lipták [14] showed
how one can build in O(n^2) time an O(n)-space index that answers jumbled pattern matching queries in O(1) time. Their key observation was that if one substring of length i contains fewer than j 1s, and another substring of length i contains more than j 1s, then there must be a substring of length i with exactly j 1s. Using this observation, they construct an index that stores the maximum and minimum number of 1s over all i-length substrings, for each possible i. Burcsi et al. [10] (see also [11,12]) and Moosa and Rahman [27] independently improved the construction time to O(n^2/log n), and Moosa and Rahman [28] further improved it to O(n^2/log^2 n) in the word RAM model. Currently, algorithms faster than O(n^2/log^2 n) exist only when the string compresses well under run-length encoding [4,23] or when we are willing to settle for approximate indices [16]. Regarding non-binary alphabets, the recent solution of Kociumaka et al.
[25] for constant alphabets requires o(n^2) space and o(n) query time. For general alphabets, expected sublinear query time was achieved by Burcsi et al. [11] for large query patterns, but in the worst case a query takes superlinear time. In fact, a recent result of Amir et al. [3] shows that under a 3SUM-hardness assumption, jumbled indexing for alphabets of size ω(1) requires either Ω(n^{2−ε}) preprocessing time or Ω(n^{1−δ}) query time, for any ε, δ > 0.
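The min/max index of Cicalese, Fici and Lipták can be sketched with a naive sliding-window construction (a minimal O(n^2) illustration; the function names are ours, and the word-RAM speedups of [27,28] replace the inner loop):

```python
def build_index(s):
    # For each window length i, record the minimum and maximum number of
    # 1-bits over all length-i substrings of the binary string s.
    n = len(s)
    lo, hi = [0] * (n + 1), [0] * (n + 1)
    for i in range(1, n + 1):
        ones = s[:i].count('1')
        mn = mx = ones
        for start in range(1, n - i + 1):
            # Slide the window one position to the right.
            ones += (s[start + i - 1] == '1') - (s[start - 1] == '1')
            mn, mx = min(mn, ones), max(mx, ones)
        lo[i], hi[i] = mn, mx
    return lo, hi

def query(lo, hi, i, j):
    # By the interval observation, (i, j) appears iff lo[i] <= j <= hi[i].
    return 1 <= i < len(lo) and lo[i] <= j <= hi[i]
```

Only the construction is expensive; each query then takes O(1) time.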
The natural extension of jumbled pattern matching from strings to trees is much
harder. In this extension, we are asked to determine whether a vertex-labeled input
tree has a connected subgraph where each label occurs the same number of times as
specified by the input query. The difficulty here stems from the fact that, unlike a string, a tree can have an exponential number of connected subgraphs. Hence, a sliding-window approach becomes intractable. Indeed, the problem is NP-hard [26], even
if our query contains at most one occurrence of each letter [21]. It is not even fixed-
parameter tractable when parameterized by the alphabet size [21]. The fixed-parameter
tractability of the problem was further studied when extending the problem from
trees to graphs [2,6,17,18]. In particular, the problem (also known as the graph motif
problem) was recently shown by Fellows et al. [21] to be polynomial-time solvable
when the number of letters in the alphabet as well as the treewidth of the graph are
both fixed. They also gave a fixed-parameter algorithm when the size of the pattern
is taken as a parameter, and showed that no such algorithm is likely to exist when the
problem is parameterized by the alphabet size, even when the input graph is a tree.
Our results In this paper we extend the currently known state-of-the-art for binary
jumbled pattern matching. Our results focus on trees, and tree-like structures such as
grammars and bounded treewidth graphs. We use the word RAM model of computation
with the standard assumption that the word-length is at least log n.
• Trees For a tree T of size n, we present an index of size O(n) bits that is constructed in O(n^2/log^2 n) time and answers binary jumbled pattern matching queries in O(1) time. This matches the performance of the best known index for binary strings. In fact, our index for trees is obtained by multiple applications of an efficient algorithm for strings [28] under a more careful analysis. This is combined with both a micro–macro [1] and a centroid decomposition of the input tree. Our index can also be used as an O(ni/log^2 n)-time algorithm for the pattern matching (as opposed to the indexing) problem, where i denotes the size of the pattern. Finally, by increasing the space of our index to O(n log n) bits, we can output in O(log n) time a node of T that is part of the pattern occurrence.
• Grammars For a binary string S of length n derived by a context-free grammar (CFG) of size g in Chomsky normal form, we show how to construct in O(g^{2/3} n^{4/3}/(log n)^{4/3}) time an index of size O(n) bits that answers jumbled pattern matching queries on S in O(1) time. The size g of the grammar can be exponentially smaller than n and is always at most O(n/log n). This means that our time bound is O(n^2/log^2 n) even when S is not compressible. If S is compressible, but with other compression schemes such as the LZ family, then we can transform it into a grammar-based compression with little or no expansion [13,30].
• Bounded Treewidth Graphs For a graph G with treewidth bounded by w, we show how to improve on the n^{O(w)}-time algorithm of Fellows et al. [21] with an algorithm that runs in 2^{O(w^3)} n + w^{O(w)} n^{O(1)} time. Thus, we show that for a binary alphabet, jumbled pattern matching is fixed-parameter tractable when parameterized only by the treewidth. This result extends easily to alphabets of constant size.
We present our results for trees, grammars, and bounded treewidth graphs in Sects. 2, 3
and 4 respectively.

2 Jumbled Pattern Matching on Trees

In this section we consider the natural extension of binary jumbled pattern matching to trees. Recall that in this extension we are given a tree T with n nodes, where each node is labeled by either 1 or 0. We will refer to the nodes labeled 1 as black nodes, and to the nodes labeled 0 as white nodes. Our goal is to construct a data structure that on query (i, j) determines whether T contains a connected subgraph with exactly i nodes, j of which are black. Such a subgraph of T is referred to as a pattern, and (i, j) is said to appear in T. The main result of this section is stated below.

Theorem 1 Given a tree T with n nodes that are colored black or white, we can construct in O(n^2/log^2 n) time a data structure of size O(n) bits that given a query (i, j) determines in O(1) time if (i, j) appears in T.

Notice that the bounds of Theorem 1 match the currently best bounds for the case where T is a string [27,28]. This is despite the fact that a string has only O(n^2) substrings while a tree can have Ω(2^n) connected subgraphs. The following lemma indicates an important property of string jumbled pattern matching that carries over to trees. It gives rise to a simple index described below.

Lemma 1 If (i, j1) and (i, j2) both appear in T, then for every j1 ≤ j ≤ j2, (i, j) appears in T.

Proof Let j be an arbitrary integer with j1 ≤ j ≤ j2, and let T1 and T2 be two patterns in T corresponding to (i, j1) and (i, j2) respectively. The lemma follows from the fact that there exists a sequence of patterns starting with T1 and ending with T2 such that every pattern has exactly i nodes and two consecutive patterns differ by removing a leaf from the first pattern and adding a different node instead. This means that the numbers of black nodes in two consecutive patterns differ by at most 1. □
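As a sanity check, Lemma 1 can be verified by exhaustive enumeration on a toy tree (exponential in the tree size, so only for tiny inputs; the encoding and names below are ours):

```python
from itertools import combinations

def occurring_pairs(edges, colors):
    # Enumerate every connected subgraph of a small tree by brute force and
    # return the set of (size, number-of-black-nodes) pairs that appear.
    n = len(colors)
    adj = {v: set() for v in range(n)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    pairs = set()
    for size in range(1, n + 1):
        for sub in combinations(range(n), size):
            nodes = set(sub)
            seen, stack = {sub[0]}, [sub[0]]
            while stack:                      # connectivity check by DFS
                v = stack.pop()
                for u in (adj[v] & nodes) - seen:
                    seen.add(u)
                    stack.append(u)
            if seen == nodes:
                pairs.add((size, sum(colors[v] for v in sub)))
    return pairs

# Path 0-1-2 colored black, white, black: for every size i, the achievable
# black-counts j form a contiguous interval, exactly as Lemma 1 predicts.
pairs = occurring_pairs([(0, 1), (1, 2)], [1, 0, 1])
for i in range(1, 4):
    js = sorted(j for (sz, j) in pairs if sz == i)
    assert js == list(range(js[0], js[-1] + 1))
```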


2.1 A Simple Index

As in the case of strings, the above lemma suggests an O(n)-size data structure: For every i = 1, . . . , n, store the minimum and maximum values i_min and i_max such that (i, i_min) and (i, i_max) appear in T. This way, upon query (i, j), we can report in constant time whether (i, j) appears in T by checking whether i_min ≤ j ≤ i_max. However, while O(n^2) construction time is trivial for strings (for every i = 1, . . . , n, slide a window of length i through the text in O(n) time), it is harder on trees.
To obtain O(n^2) construction time, we begin by converting our tree into a rooted ordered binary tree. We arbitrarily root the tree T. To convert it to a binary tree, we duplicate each node with more than two children as follows: Let v be a node with children u1, . . . , uk, k ≥ 3. We replace v with k − 1 new nodes v1, . . . , vk−1, make u1 and u2 the left and right children of v1 respectively, and for each ℓ = 2, . . . , k − 1 we make vℓ−1 and uℓ+1 the left and right children of vℓ respectively. If v is not the root then we set the parent of vk−1 to be the parent of v (otherwise, vk−1 is the root). The node v1 gets the same color as the node v. The other nodes v2, . . . , vk−1 are called dummy nodes and have no color. This procedure at most doubles the size of T. To avoid cumbersome notation, we henceforth use T and n to denote the resulting rooted binary tree and its number of nodes respectively. For a node v, we let Tv denote the subtree of T rooted at v (i.e., the connected subgraph induced by v and all its descendants).


Next, in a bottom-up fashion, we compute for each node v of T an array Av of size |Tv| + 1. For i > 0, Av[i] will store the maximum number of black nodes that appear in a connected subgraph of size i that (1) includes i non-dummy nodes in Tv, (2) includes v, and (3) may include dummy nodes, provided that for every included dummy node its corresponding non-dummy node is also included. The entry Av[0] is set to zero. Computing the minimum (rather than maximum) number of black nodes is done similarly.
Throughout the execution, we also maintain a global array A such that A[i] stores the maximum Av[i] over all nodes v considered so far. Notice that at the end of the execution, A[i] holds the desired value i_max, since every connected subgraph of T of size i includes some node v and i − 1 nodes in Tv.
We now show how to compute Av[i] for a node v and a specific value i ∈ {1, . . . , |Tv|}. If v has a single child u, then v is necessarily not a dummy node and we set Av[i] = col(v) + Au[i − 1], where col(v) = 1 if v is black and col(v) = 0 otherwise. If v has two children u and w, then any pattern of size i that appears in Tv and includes v is composed of v, a pattern of size ℓ in Tu that includes u, and a pattern of size i − 1 − ℓ in Tw that includes w. Notice that we allow ℓ = 0, which corresponds to a pattern that does not include u, and we allow i − 1 − ℓ = 0, which corresponds to a pattern that does not include w. We therefore set Av[i] = col(v) + max_{0≤ℓ≤i−1} {Au[ℓ] + Aw[i − 1 − ℓ]}, and Av[i] = max_{1≤ℓ≤i−1} {Au[ℓ] + Aw[i − 1 − ℓ]} when v is a dummy node. Observe that in the latter case the index ℓ starts at 1 to reflect that the non-dummy copy of v (i.e., v1) must be included in the pattern.
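Assuming a rooted binary tree with no dummy nodes, the recurrence above can be sketched as a direct bottom-up DP (a naive O(n^2) sketch; the children-dictionary encoding and the function name are ours):

```python
def max_ones_arrays(children, color, root):
    # A[v][i] = maximum number of black nodes over connected subgraphs of
    # size i that contain v and lie inside v's subtree; A[v][0] = 0.
    A = {}

    def solve(v):
        kids = [solve(u) for u in children.get(v, [])]
        if not kids:                      # leaf: the only pattern is {v}
            A[v] = [0, color[v]]
        elif len(kids) == 1:              # single child u
            Au = kids[0]
            A[v] = [0] + [color[v] + Au[i - 1] for i in range(1, len(Au) + 1)]
        else:                             # two children u and w
            Au, Aw = kids
            size = len(Au) + len(Aw) - 1  # |Tv| = |Tu| + |Tw| + 1
            A[v] = [0] + [
                color[v] + max(Au[l] + Aw[i - 1 - l]
                               for l in range(i)
                               if l < len(Au) and i - 1 - l < len(Aw))
                for i in range(1, size + 1)
            ]
        return A[v]

    solve(root)
    return A
```

The global array A[i] of the text is then the entry-wise maximum of A[v][i] over all v; computing minima is symmetric.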
We next analyze and then improve the above algorithm (first by one log factor and then by another). In the rest of this section, all of our algorithms will, like the algorithm above, compute the Av arrays for each v in T in a bottom-up fashion. In all of them, special attention must be given to the case where v is a dummy node, which is handled similarly to the above. To simplify the presentation, we will assume from here on that there are no dummy nodes at all.

Lemma 2 The above algorithm runs in O(n^2) time.

Proof The computation done on nodes with one child requires O(n) time per node, hence the total time required to compute all arrays Av for such nodes is O(n^2). The time required to compute all arrays for nodes with two children is asymptotically bounded by the sum Σ_v α(v)β(v), where α(v) and β(v) denote the sizes of the two subtrees rooted at the children of v, and the sum is taken over all nodes v with two children. For a tree rooted at r, we let cost(r) denote this sum over all nodes in Tr, and argue by induction that cost(r) is bounded by |Tr|^2 = O(n^2).

Let r be the root of a tree with n nodes, and let u and v denote the two children of r. Let x denote the size of the subtree rooted at u. Then x < n, and the size of the subtree rooted at v is n − 1 − x. By induction, we have cost(u) ≤ x^2 and cost(v) ≤ (n − 1 − x)^2. Thus, cost(r) = x(n − 1 − x) + cost(u) + cost(v) < n^2 − x(n − x) ≤ n^2. □

Note that if at any time the algorithm only stores those arrays Av that are necessary for future computations, then the total space used by the algorithm is O(n) words. The space can be reduced to O(n) bits by storing the Av arrays in a succinct fashion (this will also prove useful later for improving the running time): Observe that Av[i + 1] is either equal to Av[i] or to Av[i] + 1. This is because any pattern of size i with b black nodes can be turned into a pattern of size i − 1 with at least b − 1 black nodes by removing a leaf. We can therefore represent Av as a binary string Bv of n + 1 bits, where Bv[0] = 0 and Bv[i] = Av[i] − Av[i − 1] for all i = 1, . . . , n. Notice that since Av[i] = Σ_{ℓ=0}^{i} Bv[ℓ], each entry of Av can be retrieved from Bv in O(1) time using rank queries [24,29].
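The bit encoding can be illustrated as follows; the naive prefix sum stands in for a succinct O(1)-time rank structure [24,29], and the names are ours:

```python
def to_bits(A):
    # Since A is non-decreasing with unit steps, the difference string
    # B[i] = A[i] - A[i-1] consists only of bits (and B[0] = 0).
    return [0] + [A[i] - A[i - 1] for i in range(1, len(A))]

def rank1(B, i):
    # A[i] is recovered as the number of 1s in B[0..i]; a succinct rank
    # structure answers this in O(1) time, here we simply scan.
    return sum(B[:i + 1])
```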

2.2 Pattern Matching

Before improving the above algorithm, we show that it can already be analyzed more carefully to get a bound of O(n · i) when the pattern size is known to be at most i. This means that in O(n) space and O(n · i) construction time we can build an index that answers queries in O(1) time, provided the pattern size is bounded by i. It is also useful for the pattern matching problem: without preprocessing, decide whether a given pattern (i, j) appears in T.

In the case of strings, this problem can trivially be solved in O(n) time by sliding a window of length i through the string, thus effectively considering every substring of length i. This sliding-window approach, however, does not extend to trees, since we cannot afford to examine all connected subgraphs of T. We next show that, in trees, searching for a pattern of size i can be done in O(n · i) time by using our above indexing algorithm. This is useful when the pattern is small (i.e., when i = o(n)). Obtaining O(n) time remains our main open problem.

Lemma 3 Given a tree T with n nodes that are colored black or white and a query
pattern (i, j), we can check in O(n · i) time and O(n) space if T contains the pattern
(i, j).

Proof In our indexing algorithm, every node v computes an array Av of size |Tv| + 1. When the pattern size is known to be at most i, we can settle for an array Av of size min{|Tv|, i} + 1. Recall from the above discussion that we can assume T is a binary tree. Consider some node v that has only one child u. We can compute Av from Au in time O(min{|Tv|, i}) = O(i). Summing over all such nodes v gives at most O(n · i). If, on the other hand, node v has two children u and w, then Av is computed from Au and Aw in O(min{|Tu|, i} · min{|Tw|, i}) time. We claim that summing this term over all nodes in T that have two children gives O(n · i).

To see this, first consider the subset of nodes V = {v ∈ T : |Tv| < i and |T_parent(v)| ≥ i}, where parent(v) denotes the parent of v in T. Notice that each subtree Tv ∈ {Tv : v ∈ V} is of size less than i and that these subtrees are disjoint. By the proof of Lemma 2 we know that computing Av (along with every Au for vertices u ∈ Tv) is done in O(|Tv|^2) time. The total time to compute Av for all nodes v ∈ V and their descendants is therefore Σ_{v∈V} O(|Tv|^2). Since every |Tv| < i and Σ_{v∈V} |Tv| ≤ n, this sum is upper bounded by O(n · i), which is attained when all the |Tv|s are equal to i and |V| = n/i.

The remaining set of nodes S consists of all nodes v such that v has two children u, w and |Tv| ≥ i. We partition these nodes into S1 = {v ∈ S : |Tu| ≥ i and |Tw| ≥ i} and S2 = S \ S1. Notice that |S1| = O(n/i). Therefore, computing Av for all nodes v ∈ S1 can be done in O(|S1| · i^2) = O(n · i) time. We are left only with the vertices of S2. These are all vertices v such that at least one of their children is in V. Denote this child by d(v). Computing Av for all nodes in S2 can therefore be done in time

Σ_{v∈S2} O(|Td(v)| · i) = i · Σ_{v∈S2} O(|Td(v)|) ≤ i · Σ_{u∈V} O(|Tu|) = O(i · n). □




2.3 An Improved Index

In this subsection, we will gradually improve the construction time from O(n^2) to O(n^2/log^2 n). For simplicity of presentation, we will assume the input tree T is a rooted binary tree. This extends to arbitrary trees using a dummy-nodes trick similar to the one above.

2.3.1 From Trees to Strings

Recall that we can represent every Av by a binary string Bv of n + 1 bits, where Bv[0] is always zero and, for i = 1, . . . , n, Bv[i] = Av[i] − Av[i − 1]. We begin by showing that if v has two children u, w, then the computation of Bv can be done by solving a variant of jumbled pattern matching on the string Sv = Xv ◦ col(v) ◦ Yv (here ◦ denotes concatenation) of length |Sv| = |Tu| + |Tw| + 1, where Xv is obtained from Bu by reversing it and removing its last bit, and Yv is obtained from Bw by removing its first bit. We call the position of col(v) in Sv the split position of Sv. Recall that Av[i] = col(v) + max_{0≤ℓ≤i−1} {Au[ℓ] + Aw[i − 1 − ℓ]}. This is equal to the maximum number of 1s in a window of Sv that is of length i and includes the split position of Sv. We are therefore interested only in windows including the split position, and this is the important distinction from the standard jumbled pattern matching problem on strings. Clearly, using the fastest O(n^2/log^2 n)-time algorithm [28] for the standard string problem we could also solve our problem and compute Av in O(|Sv|^2/log^2 n) time. However, recall that for our total analysis (over all nodes v) to give O(n^2/log^2 n), we need the time to be O(|Xv| · |Yv|/log^2 n) and not O((|Xv| + |Yv|)^2/log^2 n).
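The construction of Sv and the restriction to windows containing the split position can be sketched as follows (a naive scan; in the paper this step is performed with the table-based word-RAM algorithm of [28] instead, and the function name is ours):

```python
def split_window_max(Bu, col_v, Bw, i):
    # Maximum number of 1s over length-i windows of Sv = Xv ∘ col(v) ∘ Yv
    # that contain the split position; this value equals Av[i].
    Xv = Bu[::-1][:-1]   # reverse Bu and drop its last bit (the old Bu[0])
    Yv = Bw[1:]          # drop the first bit of Bw
    S = Xv + [col_v] + Yv
    p = len(Xv)          # index of the split position col(v)
    best = 0
    # Only window starts that cover position p are considered.
    for start in range(max(0, p - i + 1), min(p, len(S) - i) + 1):
        best = max(best, sum(S[start:start + i]))
    return best
```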

2.3.2 First Speedup

The O(log^2 n)-factor speedup for jumbled pattern matching on strings [28] is achieved by a clever combination of lookup tables. One log factor is gained by computing the maximum number of 1s in a window of length i only when i is a multiple of s = (log n)/6. Using a lookup table over all possible pairs of length-s windows and all values between 1 and s, a sliding window of size i can be extended in O(1) time to all windows of sizes i + 1, . . . , i + s − 1 that start at the same location (see [28] for details). Their algorithm can output in O(n^2/log n) time an array of O(n/log n) words. For each i that is a multiple of s, the array keeps one word storing the maximum number of 1s over all windows of length i, and another word storing the binary increment vector for the maximum number of 1s in all windows of lengths i + 1, . . . , i + s − 1.
By only considering windows that include the split position of Sv, this idea easily translates into an O(|Xv| · |Yv|/log n)-time algorithm that computes Av and implicitly stores it in O((|Xv| + |Yv|)/log n) words. From this it is also easy to obtain an O((|Xv| + |Yv|)/log n)-word representation of Bv. Notice that if v has a single child, then the same procedure works with |Xv| = 0 in time O(|Yv|/log n) = O(n/log n). Summing over all nodes v, we get an O(n^2/log n)-time solution for binary jumbled indexing on trees.

2.3.3 Second Speedup

In strings, an additional logarithmic improvement shown in [28] can be obtained as follows: When sliding a window of length i (where i is a multiple of s), the window is shifted s locations in O(1) time using a lookup table over all pairs of binary substrings of length s (representing the leftmost and rightmost bits in all these s shifts). This further improvement yields an O(n^2/log^2 n)-time algorithm for strings. In trees, however, this is not the case. While we can compute Av in O((|Xv| + |Yv|)^2/log^2 n) time, we can guarantee O(|Xv| · |Yv|/log^2 n) time only if both |Xv| and |Yv| are greater than s. Otherwise, say |Xv| < s and |Yv| ≥ s, we get O(|Xv| · |Yv|/(|Xv| log n)) = O(|Yv|/log n) time. This is because our windows must include the col(v) position, and so we never shift a window by more than |Xv| locations. Overcoming this obstacle is the main challenge of this subsection. It is achieved by carefully ensuring that the O(|Yv|/log n) = O(n/log n) costly constructions will be done only O(n/log n) times.

2.3.4 A Micro–Macro Decomposition

A micro–macro decomposition [1] is a partition of T into O(n/log n) disjoint connected subgraphs called micro trees. Each micro tree is of size at most log n, and at most two nodes in a micro tree are adjacent to nodes in other micro trees. These nodes are referred to as top and bottom boundary nodes. The top boundary node is chosen as the root of the micro tree. The macro tree is a rooted tree of size O(n/log n) whose nodes correspond to micro trees as follows (see Fig. 1): The top boundary node t(C) of a micro tree C is connected to a boundary node in the parent micro tree (apart from the root). The boundary node t(C) might also be connected to the top boundary node of a child micro tree child(C).¹ The bottom boundary node b(C) of C is connected to the top boundary nodes of at most two child micro trees ℓ(C) and r(C) of C. Such a micro–macro decomposition can be found in linear time [1].

2.3.5 A Bottom-Up Traversal of the Macro Tree

With each micro tree C we associate an array AC . Let TC denote the union of micro
tree C and all its descendant micro trees (including the edges between them). The array

¹ The root of the macro tree is an exception, as it might have a top boundary node connected to two (rather than one) child micro trees. We focus on the other nodes. Handling the root is done in a very similar way.


Fig. 1 A micro tree C and its neighboring micro trees in the macro tree. Inside each micro tree, the solid nodes correspond to boundary nodes and the hollow nodes to non-boundary nodes. [The figure shows C with its boundary nodes t(C) and b(C), the child micro tree child(C) attached at t(C), and the child micro trees ℓ(C) and r(C) attached at b(C).]

AC stores the maximum number of 1s (black nodes) in every pattern that includes the boundary node t(C) and other nodes of TC. We also associate three auxiliary arrays Ab, At, and Atb, and set Ab[0] = At[0] = Atb[0] = Atb[1] = 0. The array Ab stores the maximum number of 1s in every pattern that includes the boundary node b(C) and possibly other nodes of C, Tℓ(C), and Tr(C). The array At stores the maximum number of 1s in every pattern that includes the boundary node t(C) and possibly other nodes of C and Tchild(C). Finally, the array Atb stores the maximum number of 1s in every pattern that includes both boundary nodes t(C) and b(C) and possibly other nodes of C, Tℓ(C), and Tr(C).
We initialize for every micro tree C its O(|C|) = O(log n)-sized arrays. The arrays AC and At are initialized to hold the maximum number of 1s in every pattern that includes t(C) and nodes of C only. For each C, the required values are calculated in O(|C|^2) time as a by-product of running the algorithm from the previous subsection after rooting C at t(C). Similarly, we initialize the array Ab to hold the maximum number of 1s in every pattern that includes b(C) and nodes of C. The array Atb is initialized as follows: First we check how many nodes are 1s and how many are 0s on the unique path between t(C) and b(C). If there are i 1s and j 0s, we set Atb[k] = 0 for every k < i + j, and we set Atb[i + j] = i. We compute Atb[k] for all k > i + j in total O(|C|^2) time by contracting the b(C)-to-t(C) path into a single node and running the previous algorithm after rooting C at this contracted node. The total running time of the initialization step is therefore O(n · |C|^2/log n) = O(n log n), which is negligible. Notice that during this computation we have computed the maximum number of 1s in all patterns that lie completely inside a micro tree. We initialize the array A (that is, only the first O(log n) entries of A) with these values. In particular, this takes care of all patterns that do not contain any boundary node. We are now done with the leaf nodes of the macro tree.
We next describe how to compute the arrays of an internal node C of the macro tree given the arrays of ℓ(C), r(C), and child(C). We first compute the maximum number of 1s in all patterns that include b(C) and possibly other vertices of Tℓ(C) and Tr(C). This can be done using the aforementioned string speedups in O(|Tℓ(C)| · |Tr(C)|/log^2 n) time when both |Tℓ(C)| > log n and |Tr(C)| > log n, and in O(n/log n) time otherwise. Using this and the initialized array Ab of C (which is of size |C| ≤ log n), we can compute the final array Ab of C. This is done by using the aforementioned string algorithm (on a string S of length |Tℓ(C)| + |Tr(C)| + 1 + |C|) restricted to the case where windows must include the split position (the split position separates S into a substring of length

|Tℓ(C)| + |Tr(C)| + 1 and a substring of length |C| ≤ log n). Using only the first speedup, this takes O(|S|/log n) = O(n/log n) time. Similarly, using the initialized Atb of C, we can compute the final array Atb of C in O(n/log n) time.

Next, we compute the array At using the initialized array At of C and the array At of child(C) in O(n/log n) time. Finally, we compute AC of C using Atb of C and At of child(C) in O((|Tℓ(C)| + |Tr(C)| + 1 + |C|) · |Tchild(C)|/log^2 n) time if both |Tℓ(C)| + |Tr(C)| + 1 + |C| > log n and |Tchild(C)| > log n, and in O(n/log n) time otherwise. To finalize AC we must then take the entry-wise maximum of the computed AC and At. This is because a pattern in TC may or may not include b(C). Finally, once AC is computed, we update the global array A accordingly (by taking the entry-wise maximum of A and AC).
To bound the total time complexity over all micro trees C, notice that some computations required O(α(v) · β(v)/log^2 n) time, where α(v) > log n and β(v) > log n are the subtree sizes of the two children of some node v ∈ T. We have already seen that the sum of all these terms over all nodes of T is O(n^2/log^2 n). The computations of the other type each require O(n/log n) time, but there are at most O(n/log n) such computations (O(1) for each micro tree), for a total of O(n^2/log^2 n). This completes the proof of Theorem 1.

2.4 Finding the Query Pattern

In this subsection we extend the index so that on top of identifying in O(1) time if
a pattern (i, j) appears in T , it can also locate in O(log n) time a node v ∈ T that
is part of such a pattern appearance. We call this node an anchor of the appearance.
This extension increases the space of the index from O(n) bits to O(n log n) bits (i.e.,
O(n) words).
Recall that given a tree T we build in O(n^2/log^2 n) time an array A of size n = |T|, where A[i] stores the minimum and maximum values i_min and i_max such that (i, i_min) and (i, i_max) appear in T. Now consider a centroid decomposition² of T: A centroid node c in T is a node whose removal leaves no connected component with more than n/2 nodes. We first construct the array A of T in O(n^2/log^2 n) time and store it in the node c. We then recurse on each remaining connected component. This way, every node v ∈ T will compute the array corresponding to the connected component whose centroid was v. Notice that this array is not the array Av, since we do not insist that the pattern uses v. Observe that since each array A is implicitly stored in an n-sized bit array B, and since the recursion tree is balanced, the total space complexity is O(n log n) bits. Furthermore, since every node in T has degree at most three, removing the centroid leaves at most three connected components, and so the time to construct all the arrays is bounded by T(n) = T(n1) + T(n2) + T(n3) + O(n^2/log^2 n), where n1 + n2 + n3 = n and every ni ≤ n/2. This yields the time complexity T(n) = O(n^2/log^2 n).
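A centroid can be found in linear time by computing subtree sizes and scanning for a node all of whose incident components have at most n/2 nodes (a sketch; the adjacency-list encoding and function name are ours):

```python
def find_centroid(adj):
    # adj[v] lists the neighbors of v; returns a node whose removal leaves
    # no connected component with more than n/2 nodes.
    n = len(adj)
    parent = [-1] * n
    parent[0] = 0                      # sentinel so the DFS skips the root
    order, stack = [], [0]
    while stack:
        v = stack.pop()
        order.append(v)
        for u in adj[v]:
            if parent[u] == -1:
                parent[u] = v
                stack.append(u)
    parent[0] = -1
    size = [1] * n
    for v in reversed(order[1:]):      # accumulate subtree sizes bottom-up
        size[parent[v]] += size[v]
    for v in range(n):
        comps = [size[u] for u in adj[v] if parent[u] == v]
        if parent[v] != -1:
            comps.append(n - size[v])  # the component on the parent side
        if all(c <= n // 2 for c in comps):
            return v
```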
Let c denote the centroid of T, whose removal leaves at most three connected components T1, T2, and T3 (recall we assume degree at most three). Upon query (i, j), we first check in the array of c whether the pattern (i, j) appears in T (i.e., whether i_min ≤ j ≤ i_max). If it does, then we check the centroids of T1, T2, and T3. If (i, j) appears in any of them, then

² A centroid decomposition can be found in linear time.


we continue the search there. This way, after at most O(log n) steps we reach the first node v whose connected component includes (i, j) while none of its child components do. We return v as the anchor node, since such a pattern must include v. We note that the above can be extended so that for every occurrence of (i, j), one node that is part of this occurrence is reported. Finally, we note that it was recently observed in [15] that if we are willing to settle for an index of size O(n^2), then we can locate the entire match (not just an anchor) in time proportional to the size of the match.

3 Jumbled Pattern Matching on Grammars

In grammar-based compression, a binary string S of length n is compressed using a context-free grammar G(S) in Chomsky normal form that generates S and only S. Such a grammar has a unique parse tree that generates S. Identical subtrees of this parse tree indicate substring repeats in S. The size of the grammar, g = |G(S)|, is defined as the total number of variables and production rules in the grammar. Note that g can be exponentially smaller than n = |S|. We show how to solve the jumbled pattern matching problem on S by solving it on the parse tree of G(S), taking advantage of subtree repeats. We obtain the following bounds:
Theorem 2 Given a binary string S of length n compressed by a context-free grammar
G(S) of size g in Chomsky normal form, we can construct in O(g^{2/3} n^{4/3}/(log n)^{4/3})
time a data structure of size O(n) bits that on query (i, j) determines in O(1) time if
S has a substring of length i with exactly j 1s.
Proof We will show how to compute the array A such that A[i] holds the maximum
number of 1s in a substring of S of size i. The minimum is found similarly. We use a
recent result of Gawrychowski [22], who showed that for any ℓ, we can modify G(S)
in O(n) time by adding O(g) new variables such that every new variable generates
a string of length at most ℓ, and S can be written as the concatenation of substrings
generated by these O(g) new variables. Thus, we can write S as the concatenation of
blocks S = B_1 ◦ · · · ◦ B_b with b = O(n/ℓ) and |B_j| ≤ ℓ, such that amongst these
blocks there are only d = O(g) distinct blocks B*_1, ..., B*_d. We refer to these d blocks
as basic blocks. For each basic block B*_k, 1 ≤ k ≤ d, we first build an array A*_k where
A*_k[i] stores the maximum number of 1s over all substrings of B*_k of length i. This is
done in O(ℓ^2/log^2 n) time per block (by using the algorithm of [28] for strings) for
a total of O(g · ℓ^2/log^2 n).
We next handle substrings that span two adjacent blocks. Namely, for each
possible pair of basic blocks B*_k and B*_m, 1 ≤ k ≤ m ≤ d, we build an array A*_{k,m}
where A*_{k,m}[i] stores the maximum number of 1s over all substrings of B*_k ◦ B*_m of
length i that start in B*_k and end in B*_m. This is done in O(ℓ^2/log^2 n) time for each
pair for a total of O(g^2 ℓ^2/log^2 n). Recall that, since we use the algorithm of [28], the
array A*_{k,m} is implicitly represented by an array of O(ℓ/log n) words: for each i that
is a multiple of log n, the array keeps one word storing the maximum number of 1s
over all substrings of length i, and another word storing the binary increment vector
for substrings of length i + 1, ..., i + log n − 1.
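The implicit representation just described can be illustrated by the following simplified, unpacked sketch: a full value is kept only at every w-th index (w = log n), and intermediate entries are recovered from the increment bits. In [28] each run of w increments occupies one machine word and a lookup does a single popcount; here the bits are simply summed, and the function names are hypothetical.

```python
# Implicit array representation: anchors at every w-th index plus a binary
# increment vector. Valid because the max 1-count grows by 0 or 1 per unit
# of substring length.

def compress(A, w):
    """A -> (anchors, increments); assumes A[i+1] - A[i] is 0 or 1."""
    anchors = A[::w]
    increments = [A[i] - A[i - 1] for i in range(1, len(A))]
    return anchors, increments

def lookup(anchors, increments, w, i):
    """Recover A[i] from the nearest anchor plus the following increment bits."""
    start = (i // w) * w
    # In [28] this sum is a popcount over one machine word.
    return anchors[i // w] + sum(increments[start:i])
```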
Fig. 2 A string S partitioned into blocks B_1 ◦ · · · ◦ B_b, each of length at most ℓ. The shaded substring is
one of the substrings considered for A_{k,m}[i], as it includes the substring S_{k,m} of length i_{k,m} and a prefix
and suffix of total length i
Finally, we consider substrings that span more than two blocks. For each pair
of (non-basic) blocks B_k and B_m, 1 ≤ k < m ≤ b, let S_{k,m} = B_{k+1} ◦ · · · ◦ B_{m−1}
be a substring of S that is of length i_{k,m} = |S_{k,m}| and has j_{k,m} 1's. Note that we can
easily compute i_{k,m} and j_{k,m} for all 1 ≤ k < m ≤ b in total time O(n^2/ℓ^2). For
every 1 ≤ k < m ≤ b, we build an array A_{k,m} of size O(ℓ) where A_{k,m}[i] stores the
maximum number of 1s over all substrings of B_k ◦ · · · ◦ B_m of length i + i_{k,m} that start
in B_k and end in B_m. Notice that all such substrings include S_{k,m}, as well as a suffix of
B_k and a prefix of B_m whose total length is i (see Fig. 2). Therefore, for each A_{k,m} we
set A_{k,m}[i] to be j_{k,m} plus the maximal number of 1's in a suffix of B_k and a prefix of
B_m whose total length is i. In other words, we set A_{k,m}[i] = j_{k,m} + A*_{k′,m′}[i], where k′
(resp. m′) is such that the block B_k (resp. B_m) corresponds to the basic block B*_{k′} (resp.
B*_{m′}). The computation of (an implicit representation of) each A_{k,m} can be done in
O(ℓ/log n) time by only setting A_{k,m}[i] for i's that are multiples of log n (the binary
increment vectors of A_{k,m} remain as in A*_{k′,m′}). Since there are O((n/ℓ)^2) pairs of
blocks and each pair requires O(ℓ/log n) time, we get a total of O(n^2/(ℓ log n)) time.
Finally, once we have the implicit representation of all A_{k,m}'s, we can compute the
desired array A from them in O(n^2/(ℓ log n)) time: for each i that is a multiple of
log n and each A_{k,m}, we set A[i] to be the maximum of A[i] and A_{k,m}[i − i_{k,m}] in
O(1) time. The next log n entries of A are computed in O(1) time (as done in [28],
Section 3.3) from the increment vectors of A[i] and A_{k,m}[i − i_{k,m}]. To conclude, we
get a total running time of O(g^2 ℓ^2/log^2 n + n^2/(ℓ log n)) = O(g^{2/3} n^{4/3}/(log n)^{4/3})
when ℓ is chosen to be (n/g)^{2/3}(log n)^{1/3}. □
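To illustrate the block decomposition concretely, here is a simplified, unpacked sketch of the proof's pipeline, assuming the blocks are given explicitly as 0/1 lists. It substitutes brute-force helpers for the bit-parallel algorithm of [28] and does not share work between identical basic blocks, so it demonstrates only the correctness of the decomposition, not the stated running time; all function names are hypothetical.

```python
# Simplified sketch of the proof's block scheme: per-block arrays, boundary
# (straddling) arrays, and a combine step over all block pairs.

def max_ones_within(B):
    """M[i] = max number of 1s over substrings of B of length i (brute force)."""
    M = [0] * (len(B) + 1)
    for s in range(len(B)):
        ones = 0
        for e in range(s, len(B)):
            ones += B[e]
            M[e - s + 1] = max(M[e - s + 1], ones)
    return M

def max_ones_straddling(Bk, Bm):
    """S[i] = max 1s over a nonempty suffix of Bk plus a nonempty prefix of Bm
    of total length i; -1 where no such split exists."""
    suf = [0]
    for bit in reversed(Bk):
        suf.append(suf[-1] + bit)
    pre = [0]
    for bit in Bm:
        pre.append(pre[-1] + bit)
    S = [-1] * (len(Bk) + len(Bm) + 1)
    for a in range(1, len(Bk) + 1):
        for c in range(1, len(Bm) + 1):
            S[a + c] = max(S[a + c], suf[a] + pre[c])
    return S

def jumbled_max_array(blocks):
    """A[i] = max number of 1s over all substrings of B_1 ... B_b of length i."""
    n = sum(len(B) for B in blocks)
    A = [0] * (n + 1)
    for B in blocks:                          # substrings inside a single block
        M = max_ones_within(B)
        for i in range(1, len(M)):
            A[i] = max(A[i], M[i])
    pl, po = [0], [0]                         # prefix lengths / prefix 1-counts
    for B in blocks:
        pl.append(pl[-1] + len(B))
        po.append(po[-1] + sum(B))
    for k in range(len(blocks)):              # substrings starting in B_k ...
        for m in range(k + 1, len(blocks)):   # ... and ending in B_m
            i_km = pl[m] - pl[k + 1]          # length of S_{k,m} = B_{k+1}..B_{m-1}
            j_km = po[m] - po[k + 1]          # number of 1s in S_{k,m}
            S = max_ones_straddling(blocks[k], blocks[m])
            for i, s in enumerate(S):
                if s >= 0:
                    A[i + i_km] = max(A[i + i_km], j_km + s)
    return A
```

Running it on S = 1011010 split into blocks of length 2 reproduces the array obtained by brute force on the whole string.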
We also note that, similarly to the case of trees (Subsect. 2.4), if we are willing to
increase our index space to O(n log n) bits, then it is not difficult to turn indexes for
detecting jumbled pattern matches in grammars into indexes for locating them. To
obtain this, we build an index for S and recurse (i.e., build indexes) on S_1 = B_1 ◦ · · · ◦ B_k
and S_2 = B_{k+1} ◦ · · · ◦ B_b, where |S_1| and |S_2| are roughly n/2. This way, as in the
centroid decomposition for trees, we can find in O(log n) time an anchor index of S,
that is, an index of S that is part of a pattern appearance. Furthermore, as opposed to
trees, we can then find the actual appearance (not just the anchor) in additional O(i)
time by sliding a window of size i that includes the anchor.
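The final sliding-window step can be sketched as follows (a hypothetical function; the string is taken here as an explicit 0/1 list rather than a grammar).

```python
# Slide a length-i window over the O(i) positions that cover the anchor
# index, maintaining the 1-count in O(1) per shift.

def locate_from_anchor(S, anchor, i, j):
    """Return (start, end) of a length-i substring covering `anchor`
    with exactly j 1s, or None."""
    lo = max(0, anchor - i + 1)
    hi = min(len(S) - i, anchor)
    if lo > hi:
        return None
    ones = sum(S[lo:lo + i])
    for s in range(lo, hi + 1):
        if ones == j:
            return (s, s + i)
        if s + i < len(S):
            ones += S[s + i] - S[s]   # slide the window right by one
    return None
```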
4 Jumbled Pattern Matching on Bounded Treewidth Graphs
In this section we consider the extension of binary jumbled pattern matching to
the domain of graphs: given a graph G whose vertices are colored either black
or white, and a query (i, j), determine whether G has a connected subgraph G′
with i white vertices and j black vertices.³ This problem is also known as the
(binary) graph motif problem in the literature. Fellows et al. [21] provided an
n^{O(w)} algorithm for this problem, where w is the treewidth of the input graph.
Here we substantially improve on this result by proving the following theorem,
asserting that the problem is fixed-parameter tractable in the treewidth of the
graph.
Theorem 3 Binary jumbled pattern matching can be solved in f(w) · n^{O(1)} time on
graphs of treewidth w. The function f(w) can be bounded by w^{O(w)} in case a tree
decomposition of width w (see below) is provided with the input graph, and otherwise
f(w) = 2^{O(w^3)}.
Note that the algorithm in the theorem actually computes all queries (i, j) that appear
in G, and can thus be easily converted to an index for the input graph.
4.1 Tree Decompositions
We begin by introducing some necessary notation and terminology. Let G =
(V(G), E(G)) be a graph. A tree decomposition of G is defined by a rooted tree T
whose nodes are subsets of V(G), called bags, with the following two properties: (i)
the union of all subgraphs induced by the bags of T is G, and (ii) for any vertex
x ∈ V(G), the set of all bags including x induces a connected subgraph in T. We use
𝒳 to denote the set of bags in a given tree decomposition. The width of the decom-
position is defined as max_{X∈𝒳} |X| − 1. The treewidth of G is the smallest possible
width of any tree decomposition of G. Given a bag X of a given tree decomposition
T, we let G_X denote the subgraph induced by the union of all bags in T_X, the subtree
of T rooted at X. Bodlaender [8] gave an algorithm for computing a width-w tree
decomposition of a given graph with treewidth w in 2^{O(w^3)} n time. We refer readers
interested in further details to [20].
We will work with a specific kind of tree decomposition, namely nice tree
decompositions [9]. A nice tree decomposition is a binary tree decomposition T with four
types of bags: Leaf, forget, introduce, and join. Leaf bags are the leaves of T and are
singleton sets which include a single vertex of G. A forget bag X has one child Y such
that X = Y \ {x} for some vertex x of G. Thus, X forgets the vertex x. Similarly, an
introduce bag X has one child Y such that X = Y ∪ {x} for some vertex x ∉ Y of G.
In this case, we say X introduces the vertex x. Finally, a join bag X has two children
Y and Z in T with X = Y = Z. It is well known that, given a tree decomposition
of any graph, one can compute in polynomial time a nice tree decomposition of the
same graph with equal width and with an at most linear increase in its number of
nodes [9]. Thus, from this point onwards, we may assume that we are given a nice tree
decomposition T of G of width w.
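The four bag types can be checked mechanically; a small hypothetical sketch, with bags modeled as frozensets of vertices:

```python
# Classify a bag of a nice tree decomposition by comparing it with its
# children. Leaf bags have no children; join bags have two identical
# children; forget/introduce bags differ from their one child by a vertex.

def bag_type(bag, children):
    if not children:
        assert len(bag) == 1, "leaf bags are singletons"
        return "leaf"
    if len(children) == 2:
        assert children[0] == children[1] == bag, "join bags equal both children"
        return "join"
    (child,) = children
    if len(bag) == len(child) - 1 and bag < child:
        return "forget"      # bag = child \ {x}
    if len(bag) == len(child) + 1 and child < bag:
        return "introduce"   # bag = child ∪ {x}
    raise ValueError("not a bag of a nice tree decomposition")
```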
³ The difference between the meaning of the query here and elsewhere in the paper is for ease of
presentation.
Fig. 3 A positive partition for the query (7, 6) in G_X, where X = {1, 2, 3, 4, 5, 6}. The partition is defined
by {{2}, {1, 4}, {5}, {3, 6}}. Notice that vertex 2 is not in any of the connected graphs witnessing the partial
occurrence. Also note that these graphs may or may not have edges between them
4.2 Positive Partitions
We next describe the main data structure computed by our algorithm. Let X be
an arbitrary bag. A partition⁴ Π_X = {X_0, X_1, ..., X_x} of X is positive for a given
query (i, j) in G_X if there are x disjoint connected subgraphs G_1, ..., G_x of G_X such
that (1) the total number of white (resp. black) vertices in G′ = G_1 ∪ · · · ∪ G_x is
i (resp. j), and (2) V(G′) ∩ X_0 = ∅ and V(G_ℓ) ∩ X = X_ℓ for each ℓ = 1, ..., x
(see Fig. 3). Thus, positive partitions capture partial occurrences that intersect X at
exactly X \ X_0. These may not be actual occurrences, as we do not require any edges
between the different G_ℓ's, and so G′ itself may not be connected. We let A_X[i, j]
denote the set of all positive partitions for a query (i, j), and let A_X denote the array
with an entry for each possible query (i, j). We require that the trivial partition
where X_0 = X is positive only for the query (0, 0).
Note that, by definition, a query (i, j) appears in G_X iff there exists some partition
into two sets {X_0, X_1} that is positive for (i, j) in G_X. Since (i, j) appears in G iff
it appears in G_X for some bag X ∈ 𝒳, computing the arrays A_X for all bags X ∈ 𝒳
suffices for solving our problem. We do this by computing all arrays A_X in a bottom-up
fashion from the leaves to the root of T. Note that the size of each array A_X can easily be
bounded by w^{O(w)} n^2, considering that the w-th Bell number is bounded by w^{O(w)}.
Thus, to get a similar term in our running time, we will show that computing the
array A_X from the arrays of the children of X can be done in polynomial time. The
computation on leaf bags is trivial (as they are singletons), and the computation on
forget nodes is almost equally easy: if X is a forget bag with child Y, then computing
⁴ Here we slightly abuse our terminology and allow X_0 to be the empty set.
A_X from A_Y in this case amounts to converting each positive partition Π_Y of Y into a
corresponding positive partition Π_X of X by removing x, the vertex forgotten by X,
from the class it belongs to in Π_Y. We thus focus below on introduce nodes and join
nodes.
4.3 Introduce Nodes
Let X be an introduce bag with child Y in T, and let x be the vertex introduced by
X. Let us assume for ease of presentation that x is colored white (the case where
it is colored black is symmetric). By the properties of a tree decomposition, we
know that x is only adjacent in G_X to vertices y ∈ Y [20]. Let y_1, ..., y_ℓ denote
these neighbors of x, and let G_X^k denote the graph obtained by deleting the edges
{x, y_{k+1}}, ..., {x, y_ℓ} from G_X, for each k = 0, ..., ℓ (so G_X^ℓ = G_X). Similarly, let
A_X^k[i, j] denote the set of all positive partitions of (i, j) in G_X^k. We will compute
A_X^0[i, j] from A_Y, and A_X^k[i, j] from A_X^{k−1}[i, j] for each k > 0. Finally, we will set
A_X[i, j] = A_X^ℓ[i, j].
We begin with k = 0. In this case, x is an isolated vertex in G_X^0. Hence, there are
only two types of positive partitions of (i, j) in G_X^0:
– A partition Π_X obtained by taking Π_X = Π_Y ∪ {{x}} for some Π_Y ∈ A_Y[i − 1, j]
(thus, {x} is a singleton set in Π_X).
– A partition Π_X obtained by taking a partition Π_Y ∈ A_Y[i, j] and adding x to
Y_0 ∈ Π_Y (thus, x is in the set of vertices not included in the partial occurrence
captured by Π_X).
It is easy to see that, since x is an isolated vertex, the above description indeed captures
all types of positive partitions for G_X^0, and so we can compute A_X^0[i, j] from A_Y in
polynomial time.
Assume now that k > 0. Then any positive partition for (i, j) in G_X^{k−1} is also positive
in G_X^k. Moreover, the only new positive partitions for (i, j) in G_X^k that were not positive
in G_X^{k−1} are partitions where x and y_k belong to the same class (although there might
be partitions of this type which were already positive in G_X^{k−1}). Thus, we compute
A_X^k[i, j] by first setting A_X^k[i, j] = A_X^{k−1}[i, j]. Then, for each Π ∈ A_X^{k−1}[i, j] with
x ∈ X_s ∈ Π and y_k ∈ X_t ∈ Π, s ≠ t, we add the partition (Π \ {X_s, X_t}) ∪ {X_s ∪ X_t}
to A_X^k[i, j] (assuming it is not already there). The total amount of computation time
required here is obviously polynomial in the size of A_X^{k−1}[i, j].
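The class-merging step for a single new edge {x, y_k} can be sketched as follows, with a positive partition modeled as a pair (X_0, classes); the representation and function name are hypothetical.

```python
# Update for one new edge {x, yk}: every partition in which x and yk lie in
# two different (non-zero) classes also becomes positive with those classes
# merged. A positive partition is a pair (X0, classes), where classes is a
# frozenset of frozensets.

def introduce_edge(A_prev, x, yk):
    A = set(A_prev)                      # old partitions stay positive
    for X0, classes in A_prev:
        cx = next((C for C in classes if x in C), None)
        cy = next((C for C in classes if yk in C), None)
        if cx is not None and cy is not None and cx != cy:
            merged = (classes - {cx, cy}) | {cx | cy}
            A.add((X0, frozenset(merged)))
    return A
```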
4.4 Join Nodes
Consider a join bag X with two children Y and Z in T, and recall that X = Y = Z.
For a pair of partitions Π_Y = {Y_0, ..., Y_y} and Π_Z = {Z_0, ..., Z_z} of Y and Z, we
define the partition Π_X = Π_Y ⊕ Π_Z (the join of Π_Y and Π_Z) as follows: take
X_0 = Y_0 ∪ Z_0, and consider the two equivalence relations induced by Π_Y \ {Y_0} and
Π_Z \ {Z_0} on the set X \ X_0. The equivalence relation defining the classes of
Π_X \ {X_0} is then taken as the transitive closure of the union of these two equivalence
relations. As an example, suppose that X = Y = Z = {1, ..., 6}, Π_Y =
{{1}, {2}, {3, 4}, {5, 6}}, and Π_Z = {{2}, {1}, {3, 5}, {4, 6}}. Then
Π_X = {X_0, X_1}, where X_0 = {1, 2} and X_1 = {3, 4, 5, 6}.
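The join Π_Y ⊕ Π_Z can be sketched with a union-find over X \ X_0, here taking X_0 = Y_0 ∪ Z_0, which is consistent with the example above; the representation and function name are hypothetical.

```python
# Join of two partitions: X0 = Y0 ∪ Z0, and the remaining classes are the
# transitive closure of the union of the two partitions' relations on
# X \ X0, computed with a union-find.

def join_partitions(pi_y, pi_z, X):
    (Y0, Ycls), (Z0, Zcls) = pi_y, pi_z
    X0 = Y0 | Z0
    parent = {v: v for v in X - X0}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    for C in list(Ycls) + list(Zcls):
        C = [v for v in C if v not in X0]   # restrict each class to X \ X0
        for a, b in zip(C, C[1:]):
            parent[find(a)] = find(b)       # union consecutive members
    groups = {}
    for v in parent:
        groups.setdefault(find(v), set()).add(v)
    return X0, frozenset(frozenset(g) for g in groups.values())
```

On the example above it produces X_0 = {1, 2} and the single class {3, 4, 5, 6}.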
Let i_0 and j_0 respectively denote the number of white and black vertices in X \ X_0.
We claim that if (i_1, j_1) and (i_2, j_2) are two queries for which Π_Y and Π_Z are
respectively positive in G_Y and G_Z, then Π_X = Π_Y ⊕ Π_Z is positive for (i_1 + i_2 −
i_0, j_1 + j_2 − j_0). This can be verified by considering the connected components of the
graph G′ = G_Y^1 ∪ · · · ∪ G_Y^y ∪ G_Z^1 ∪ · · · ∪ G_Z^z, where G_Y^1, ..., G_Y^y and G_Z^1, ..., G_Z^z are
the sets of graphs witnessing that Π_Y and Π_Z are positive for (i_1, j_1) in G_Y and (i_2, j_2)
in G_Z. It is easy to see that the total number of white and black vertices in these
components is i = i_1 + i_2 − i_0 and j = j_1 + j_2 − j_0, where i_0 white vertices and j_0
black vertices are subtracted due to double counting the vertex colors in X. Moreover,
it can be verified that these components intersect X as required by Π_X, due to the fact
that Π_X is the transitive closure of Π_Y ∪ Π_Z. Thus, G′ is a partial occurrence of (i, j)
in G_X, and Π_X ∈ A_X[i, j].
On the other hand, it can be seen along the same lines that if (i, j) is a query
for which Π_X is positive in G_X, then either Π_X is in A_Y[i, j] or A_Z[i, j], or we have
(i, j) = (i_1 + i_2 − i_0, j_1 + j_2 − j_0) for some pair of queries (i_1, j_1) and (i_2, j_2) for
which Π_Y and Π_Z are positive in G_Y and G_Z. We can therefore compute A_X[i, j]
by first setting A_X[i, j] = A_Y[i, j] ∪ A_Z[i, j], and then examining all pairs (i_1, j_1)
and (i_2, j_2) as above. For each such pair, we compute all partitions Π_Y ⊕ Π_Z for
Π_Y ∈ A_Y[i_1, j_1] and Π_Z ∈ A_Z[i_2, j_2]. Note that the entire computation of A_X
requires time which is polynomial in the total sizes of A_Y and A_Z.
4.5 Summary
We have shown above how to compute the arrays A_X (over all queries (i, j)) for
each bag X of T in w^{O(w)} n^{O(1)} total time. As the total number of bags is O(n), we
obtain an algorithm whose total running time is w^{O(w)} n^{O(1)}, excluding the time
required to compute the nice tree decomposition T. This completes the proof of
Theorem 3. We note that our algorithm straightforwardly extends to a w^{O(w)} n^{O(c)}
time algorithm for the case where the vertices of G are colored with c colors.
5 Conclusions and Open Problems
In this paper we considered the binary jumbled pattern matching problem on trees,
bounded treewidth graphs, and strings compressed by context-free grammars in
Chomsky normal form. We gave an Õ(g^{2/3} n^{4/3})-time solution for strings of length
n represented by grammars of size g, an f(w) · n^{O(1)}-time solution for graphs
of treewidth w, and an O(n^2/log^2 n)-time solution for trees. In the latter result,
we showed how to determine in O(1) time if a query pattern appears, and how
to locate in O(log n) time a node of this appearance. With a linear-space solution,
locating the entire appearance remains an open problem. Using Lemma 3,
the construction time for trees can be made O(n · i/log^2 n) if the query patterns
are known to be of size at most i. We also note here that the construction time
can be made faster on trees that have many identical rooted subtrees.⁵ This is
because the bottom-up construction does not need to be applied to the same subtree
twice.
Finally, the main open problems stemming from our work are: (1) to obtain a
faster construction of the linear-space index for strings; our index for trees implies
that any construction speedup for strings yields a construction speedup for trees.
(2) To develop an algorithm for the non-indexing variant of binary jumbled pattern
matching on trees whose performance is closer to that of the corresponding
algorithm on strings (i.e., the O(n) sliding-window algorithm).
Acknowledgments We thank the anonymous reviewers for their helpful comments.
References
1. Alstrup, S., Secher, J., Sporkn, M.: Optimal on-line decremental connectivity in trees. Inf. Process.
Lett. 64(4), 161–164 (1997)
2. Ambalath, A.M., Balasundaram, R., Rao, C.H., Koppula, V., Misra, N., Philip, G., Ramanujan, M.S.:
On the kernelization complexity of colorful motifs. In: Proceedings of the 5th International Symposium
Parameterized and Exact Computation, pp. 14–25, (2010)
3. Amir, A., Chan, T.M., Lewenstein, M., Lewenstein, N.: On hardness of jumbled indexing. In: Proceedings
of the 41st International Colloquium on Automata, Languages and Programming (ICALP), pp.
114–125, (2014)
4. Badkobeh, G., Fici, G., Kroon, S., Lipták, Z.: Binary jumbled string matching for highly run-length
compressible texts. Inf. Process. Lett. 113(17), 604–608 (2013)
5. Benson, G.: Composition alignment. In: Proceedings of the 3rd International Workshop on Algorithms
in Bioinformatics (WABI), pp. 447–461, (2003)
6. Betzler, N., van Bevern, R., Fellows, M.R., Komusiewicz, C., Niedermeier, R.: Parameterized
algorithmics for finding connected motifs in biological networks. IEEE/ACM Trans. Comput. Biol. Bioinform.
8(5), 1296–1308 (2011)
7. Böcker, S.: Simulating multiplexed SNP discovery rates using base-specific cleavage and mass
spectrometry. Bioinformatics 23(2), 5–12 (2007)
8. Bodlaender, H.L.: A linear time algorithm for finding tree-decompositions of small treewidth. SIAM
J. Comput. 25, 1305–1317 (1996)
9. Bodlaender, H.L.: Treewidth. Algorithmic techniques and results. In: Proceedings of the 22nd
International Symposium on Mathematical Foundations of Computer Science (MFCS), pp. 19–36, (1997)
10. Burcsi, P., Cicalese, F., Fici, G., Lipták, Z.: On table arrangement, scrabble freaks, and jumbled pattern
matching. In: Proceedings of the Symposium on Fun with Algorithms, pp. 89–101, (2010)
11. Burcsi, P., Cicalese, F., Fici, G., Lipták, Z.: Algorithms for jumbled pattern matching in strings. Int. J.
Found. Comput. Sci. 23(2), 357–374 (2012)
12. Burcsi, P., Cicalese, F., Fici, G., Lipták, Z.: On approximate jumbled pattern matching in strings.
Theory Comput. Syst. 50(1), 35–51 (2012)
13. Charikar, M., Lehman, E., Liu, D., Panigrahy, R., Prabhakaran, M., Sahai, A., Shelat, A.: The smallest
grammar problem. IEEE Trans. Inf. Theory 51(7), 2554–2576 (2005)
14. Cicalese, F., Fici, G., Lipták, Z.: Searching for jumbled patterns in strings. In: Proceedings of the
Prague Stringology Conference, pp. 105–117, (2009)
15. Cicalese, F., Gagie, T., Giaquinta, E., Laber, E., Liptak, S., Rizzi, R., Tomescu, A.I.: Indexes for jumbled
pattern matching in strings, trees and graphs. In: Proceedings of the 20th International Symposium on
String Processing and Information Retrieval (SPIRE), pp. 56–63, (2013)

⁵ There is a simple linear-time algorithm that, given a tree, finds the best way to share identical rooted
subtrees [19].
16. Cicalese, F., Laber, E.S., Weimann, O., Yuster, R.: Near linear time construction of an approximate
index for all maximum consecutive sub-sums of a sequence. In: Proceedings of the Symposium on
Combinatorial Pattern Matching, pp. 149–158, (2012)
17. Dondi, R., Fertin, G., Vialette, S.: Complexity issues in vertex-colored graph pattern matching. J.
Discrete Algorithms 9(1), 82–99 (2011)
18. Dondi, R., Fertin, G., Vialette, S.: Finding approximate and constrained motifs in graphs. In:
Proceedings of the 22nd Annual Symposium Combinatorial Pattern Matching, pp. 388–401, (2011)
19. Downey, P.J., Sethi, R., Tarjan, R.E.: Variations on the common subexpression problem. J. ACM 27,
758–771 (1980)
20. Downey, R.G., Fellows, M.R.: Parameterized Complexity. Springer, Berlin (1999)
21. Fellows, M.R., Fertin, G., Hermelin, D., Vialette, S.: Upper and lower bounds for finding connected
motifs in vertex-colored graphs. J. Comput. Syst. Sci. 77(4), 799–811 (2011)
22. Gawrychowski, P.: Faster algorithm for computing the edit distance between SLP-compressed strings.
In: Proceedings of the Symposium on String Processing and Information Retrieval, pp. 229–236, (2012)
23. Giaquinta, E., Grabowski, S.: New algorithms for binary jumbled pattern matching. Inf. Process. Lett.
113(14–16), 538–542 (2013)
24. Jacobson, G.: Space-efficient static trees and graphs. In: Proceedings of the 30th Annual Symposium
on Foundations of Computer Science (FOCS), pp. 549–554, (1989)
25. Kociumaka, T., Radoszewski, J., Rytter, W.: Efficient indexes for jumbled pattern matching with
constant-sized alphabet. In: ESA, pp. 625–636, (2013)
26. Lacroix, V., Fernandes, C.G., Sagot, M.-F.: Motif search in graphs: application to metabolic networks.
IEEE/ACM Trans. Comput. Biol. Bioinform. 3(4), 360–368 (2006)
27. Moosa, T.M., Rahman, M.S.: Indexing permutations for binary strings. Inf. Process. Lett. 110(18–19),
795–798 (2010)
28. Moosa, T.M., Rahman, M.S.: Sub-quadratic time and linear space data structures for permutation
matching in binary strings. J. Discrete Algorithms 10, 5–9 (2012)
29. Munro, I.: Tables. In: Proceedings of the 16th Foundations of Software Technology and Theoretical
Computer Science (FSTTCS), pp. 37–42, (1996)
30. Rytter, W.: Application of Lempel–Ziv factorization to the approximation of grammar-based
compression. Theoret. Comput. Sci. 302(1–3), 211–222 (2003)