Table 1: Example of a Markov logic network. Free variables are implicitly universally quantified.
English First-Order Logic Weight
Most people don’t smoke. ¬Smokes(x) 1.4
Most people don’t have cancer. ¬Cancer(x) 2.3
Most people aren’t friends. ¬Friends(x, y) 4.6
Smoking causes cancer. Smokes(x) ⇒ Cancer(x) 1.5
Friends have similar smoking habits. Smokes(x) ∧ Friends(x, y) ⇒ Smokes(y) 1.1
field can have arbitrary factors. As long as P(X = x) > 0 for all x, the distribution can be equivalently represented as a log-linear model: P(X = x) = (1/Z) exp(∑_i w_i g_i(x)), where the features g_i(x) are arbitrary functions of (a subset of) the state.

Graphical models can be represented as factor graphs (Kschischang, Frey, & Loeliger 2001). A factor graph is a bipartite graph with a node for each variable and factor in the model. (For convenience, we will consider one factor f_i(x) = exp(w_i g_i(x)) per feature g_i(x), i.e., we will not aggregate features over the same variables into a single factor.) Variables and the factors they appear in are connected by undirected edges.

The main inference task in graphical models is to compute the conditional probability of some variables (the query) given the values of some others (the evidence), by summing out the remaining variables. This problem is #P-complete, but becomes tractable if the graph is a tree. In this case, the marginal probabilities of the query variables can be computed in polynomial time by belief propagation, which consists of passing messages from variable nodes to the corresponding factor nodes and vice-versa. The message from a variable x to a factor f is

  μ_x→f(x) = ∏_{h ∈ nb(x)\{f}} μ_h→x(x)    (1)

where nb(x) is the set of factors x appears in. The message from a factor to a variable is

  μ_f→x(x) = ∑_{∼{x}} ( f(x) ∏_{y ∈ nb(f)\{x}} μ_y→f(y) )    (2)

where nb(f) are the arguments of f, and the sum is over all of these except x. The messages from leaf variables are initialized to 1, and a pass from the leaves to the root and back to the leaves suffices. The (unnormalized) marginal of each variable x is then given by ∏_{h ∈ nb(x)} μ_h→x(x). Evidence is incorporated by setting f(x) = 0 for states x that are incompatible with it. This algorithm can still be applied when the graph has loops, repeating the message-passing until convergence. Although this loopy belief propagation has no guarantees of convergence or of giving the correct result, in practice it often does, and can be much more efficient than other methods. Different schedules may be used for message-passing. Here we assume flooding, the most widely used and generally best-performing method, in which messages are passed from each variable to each corresponding factor and back at each step (after initializing all variable messages to 1). Belief propagation can also be used for exact inference in arbitrary graphs, by combining nodes until a tree is obtained, but this suffers from the same combinatorial explosion as variable elimination.
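As a concrete illustration of equations (1) and (2) under the flooding schedule, the following is a minimal Python sketch of loopy BP over a factor graph with binary variables. The data structures and the function name are our own illustrative choices; they are not taken from the paper or from Alchemy.

    from itertools import product

    def loopy_bp(variables, factors, iters=100):
        """Flooding-schedule loopy BP for binary variables.
        factors: {name: (scope, table)}, where scope is a tuple of variable names and
        table[assignment] is the factor value for that joint assignment."""
        nb = {v: [f for f, (scope, _) in factors.items() if v in scope] for v in variables}
        msg_vf = {(v, f): [1.0, 1.0] for f, (scope, _) in factors.items() for v in scope}
        msg_fv = {(f, v): [1.0, 1.0] for f, (scope, _) in factors.items() for v in scope}

        for _ in range(iters):
            # Equation (1): variable-to-factor messages.
            for (v, f) in msg_vf:
                m = [1.0, 1.0]
                for h in nb[v]:
                    if h != f:
                        m = [m[0] * msg_fv[(h, v)][0], m[1] * msg_fv[(h, v)][1]]
                s = (m[0] + m[1]) or 1.0
                msg_vf[(v, f)] = [m[0] / s, m[1] / s]   # normalize for numerical stability
            # Equation (2): factor-to-variable messages, summing out the other arguments.
            for (f, v) in msg_fv:
                scope, table = factors[f]
                m = [0.0, 0.0]
                for assignment in product((0, 1), repeat=len(scope)):
                    value = table[assignment]
                    for y, y_val in zip(scope, assignment):
                        if y != v:
                            value *= msg_vf[(y, f)][y_val]
                    m[assignment[scope.index(v)]] += value
                msg_fv[(f, v)] = m

        # Unnormalized marginal of each variable: product of incoming factor messages.
        marginals = {}
        for v in variables:
            m = [1.0, 1.0]
            for f in nb[v]:
                m = [m[0] * msg_fv[(f, v)][0], m[1] * msg_fv[(f, v)][1]]
            marginals[v] = m
        return marginals

For an MLN, the factor table for a ground clause with weight w_i would contain exp(w_i) for assignments that satisfy the clause and 1 otherwise, following f_i(x) = exp(w_i g_i(x)).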
Markov Logic

First-order probabilistic languages combine graphical models with elements of first-order logic, by defining template features that apply to whole classes of objects at once. A simple and powerful such language is Markov logic (Richardson & Domingos 2006). A Markov logic network (MLN) is a set of weighted first-order clauses.¹ Together with a set of constants representing objects in the domain of interest, it defines a Markov network with one node per ground atom and one feature per ground clause. The weight of a feature is the weight of the first-order clause that originated it. The probability of a state x in such a network is given by P(x) = (1/Z) exp(∑_i w_i g_i(x)) = (1/Z) ∏_i f_i(x), where w_i is the weight of the ith clause, g_i = 1 if the ith clause is true, and g_i = 0 otherwise. Table 1 shows an example of a simple MLN representing a standard social network model. In a domain with two objects Anna and Bob, ground atoms will include Smokes(Anna), Cancer(Bob), Friends(Anna, Bob), etc. States of the world where more smokers have cancer, and more pairs of friends have similar smoking habits, are more probable.

¹ In this paper we assume function-free clauses and Herbrand interpretations.

Inference in Markov logic can be carried out by creating the ground network and applying belief propagation to it, but this can be extremely inefficient because the size of the ground network is O(d^c), where d is the number of objects in the domain and c is the highest clause arity. In the next section we introduce a better, lifted algorithm for inference. Although we focus on Markov logic for simplicity, the algorithm is easily generalized to other representations. Alternatively, they can be translated to Markov logic and the algorithm applied directly (Richardson & Domingos 2006).
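To make the grounding process and the probability formula concrete, here is a small Python sketch that grounds the Table 1 MLN over two constants and evaluates the unnormalized probability exp(∑_i w_i g_i(x)) of one world. The clause encoding is our own illustrative format, not Alchemy's input syntax.

    import math
    from itertools import product

    constants = ["Anna", "Bob"]

    # Each clause: (weight, list of literals); a literal is (sign, predicate, variables).
    clauses = [
        (1.4, [(False, "Smokes", ("x",))]),                                  # ¬Smokes(x)
        (2.3, [(False, "Cancer", ("x",))]),                                  # ¬Cancer(x)
        (4.6, [(False, "Friends", ("x", "y"))]),                             # ¬Friends(x,y)
        (1.5, [(False, "Smokes", ("x",)), (True, "Cancer", ("x",))]),        # Smokes(x) ⇒ Cancer(x)
        (1.1, [(False, "Smokes", ("x",)), (False, "Friends", ("x", "y")),
               (True, "Smokes", ("y",))]),                                   # Smokes(x) ∧ Friends(x,y) ⇒ Smokes(y)
    ]

    def groundings(clause_vars):
        """All substitutions of constants for the clause's free variables."""
        vs = sorted(clause_vars)
        for combo in product(constants, repeat=len(vs)):
            yield dict(zip(vs, combo))

    def unnormalized_prob(world):
        """world maps ground atoms like ("Smokes", ("Anna",)) to True/False."""
        total = 0.0
        for weight, literals in clauses:
            clause_vars = {v for _, _, vs in literals for v in vs}
            for theta in groundings(clause_vars):
                # g_i = 1 iff some literal of the ground clause is satisfied.
                satisfied = any(
                    world[(pred, tuple(theta[v] for v in vs))] == sign
                    for sign, pred, vs in literals
                )
                total += weight * satisfied
        return math.exp(total)

    # Example world: everyone smokes, only Anna has cancer, Anna and Bob are mutual friends.
    world = {}
    for c in constants:
        world[("Smokes", (c,))] = True
        world[("Cancer", (c,))] = (c == "Anna")
    for a, b in product(constants, repeat=2):
        world[("Friends", (a, b))] = {a, b} == {"Anna", "Bob"}
    print(unnormalized_prob(world))

Even with only two constants, the ¬Friends(x, y) clause already has d² = 4 groundings; this is the O(d^c) growth of the ground network that motivates the lifted algorithm of the next section.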
Lifted Belief Propagation

We begin with some necessary definitions. These assume the existence of an MLN M, set of constants C, and evidence database E (set of ground literals). For simplicity, our definitions and explanation of the algorithm will assume that each predicate appears at most once in any given MLN clause. We will then describe how to handle multiple occurrences of a predicate in a clause.

Definition 1 A supernode is a set of groundings of a predicate that all send and receive the same messages at each step of belief propagation, given M, C and E. The supernodes of a predicate form a partition of its groundings.
A superfeature is a set of groundings of a clause that all send and receive the same messages at each step of belief propagation, given M, C and E. The superfeatures of a clause form a partition of its groundings.

Definition 2 A lifted network is a factor graph composed of supernodes and superfeatures. The factor corresponding to a superfeature g(x) is exp(wg(x)), where w is the weight of the corresponding first-order clause. A supernode and a superfeature have an edge between them iff some ground atom in the supernode appears in some ground clause in the superfeature. Each edge has a positive integer weight. A minimal lifted network is a lifted network with the smallest possible number of supernodes and superfeatures.

The first step of lifted BP is to construct the minimal lifted network. The size of this network is O(nm), where n is the number of supernodes and m the number of superfeatures. In the best case, the lifted network has the same size as the MLN; in the worst case, as the ground Markov network.

The second and final step in lifted BP is to apply standard BP to the lifted network, with two changes:

1. The message from supernode x to superfeature f becomes μ_f→x(x)^(n(f,x)−1) ∏_{h ∈ nb(x)\{f}} μ_h→x(x)^n(h,x), where n(h, x) is the weight of the edge between h and x.

2. The (unnormalized) marginal of each supernode (and therefore of each ground atom in it) is given by ∏_{h ∈ nb(x)} μ_h→x(x)^n(h,x).

The weight of an edge is the number of identical messages that would be sent from the ground clauses in the superfeature to each ground atom in the supernode if BP was carried out on the ground network. The n(f, x) − 1 exponent reflects the fact that a variable's message to a factor excludes the factor's message to the variable.
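A minimal Python sketch of these two modifications follows, assuming each message is stored as a list with one entry per truth value. The names (msg_fx, nb_x, n) are ours and purely illustrative; they are not part of the paper or of Alchemy.

    def supernode_to_superfeature(x, f, nb_x, msg_fx, n):
        """Modification 1: mu_x->f = mu_f->x^(n(f,x)-1) * prod over other h of mu_h->x^n(h,x)."""
        out = []
        for v in range(len(msg_fx[(f, x)])):           # one entry per truth value of x
            m = msg_fx[(f, x)][v] ** (n[(f, x)] - 1)   # f's own message counted n(f,x)-1 times
            for h in nb_x[x]:
                if h != f:
                    m *= msg_fx[(h, x)][v] ** n[(h, x)]
            out.append(m)
        return out

    def supernode_marginal(x, nb_x, msg_fx, n):
        """Modification 2: unnormalized marginal = prod over h of mu_h->x^n(h,x)."""
        num_values = len(msg_fx[(nb_x[x][0], x)])
        out = []
        for v in range(num_values):
            m = 1.0
            for h in nb_x[x]:
                m *= msg_fx[(h, x)][v] ** n[(h, x)]
            out.append(m)
        return out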
The lifted network is constructed by (essentially) simulating BP and keeping track of which ground atoms and clauses send the same messages. Initially, the groundings of each predicate fall into three groups: known true, known false and unknown. (One or two of these may be empty.) Each such group constitutes an initial supernode. All groundings of a clause whose atoms have the same combination of truth values (true, false or unknown) now send the same messages to the ground atoms in them. In turn, all ground atoms that receive the same number of messages from the superfeatures they appear in send the same messages, and constitute a new supernode. As the effect of the evidence propagates through the network, finer and finer supernodes and superfeatures are created.

If a clause involves predicates R1, . . . , Rk, and N = (N1, . . . , Nk) is a corresponding tuple of supernodes, the groundings of the clause generated by N are found by joining N1, . . . , Nk (i.e., by forming the Cartesian product of the relations N1, . . . , Nk, and selecting the tuples in which the corresponding arguments agree with each other, and with any corresponding constants in the first-order clause). Conversely, the groundings of predicate Ri connected to elements of a superfeature F are obtained by projecting F onto the arguments it shares with Ri. Lifted network construction thus proceeds by alternating between two steps:

1. Form superfeatures by doing joins of their supernodes.

2. Form supernodes by projecting superfeatures down to their predicates, and merging atoms with the same projection counts.

Pseudo-code for the algorithm is shown in Table 2. The projection counts at convergence are the weights associated with the corresponding edges.

Table 2: Lifted network construction.

function LNC(M, C, E)
inputs: M, a Markov logic network
        C, a set of constants
        E, a set of ground literals
output: L, a lifted network
for each predicate P
  for each truth value t in {true, false, unknown}
    form a supernode containing all groundings of P with truth value t
repeat
  for each clause C involving predicates P1, . . . , Pk
    for each tuple of supernodes (N1, . . . , Nk), where Ni is a Pi supernode
      form a superfeature F by joining N1, . . . , Nk
  for each predicate P
    for each superfeature F it appears in
      S(P, F) ← projection of the tuples in F down to the variables in P
      for each tuple s in S(P, F)
        T(s, F) ← number of F's tuples that were projected into s
    S(P) ← ∪_F S(P, F)
    form a new supernode from each set of tuples in S(P) with the same T(s, F) counts for all F
until convergence
add all current supernodes and superfeatures to L
for each supernode N and superfeature F in L
  add to L an edge between N and F with weight T(s, F)
return L
To handle clauses with multiple occurrences of a predicate, we keep a tuple of edge weights, one for each occurrence of the predicate in the clause. A message is passed for each occurrence of the predicate, with the corresponding edge weight. Similarly, when projecting superfeatures into supernodes, a separate count is maintained for each occurrence, and only tuples with the same counts for all occurrences are merged.

Theorem 1 Given an MLN M, set of constants C and set of ground literals E, there exists a unique minimal lifted network L∗, and algorithm LNC(M, C, E) returns it. Belief propagation applied to L∗ produces the same results as belief propagation applied to the ground Markov network generated by M and C.

Proof. We prove each part in turn.
The uniqueness of L∗ is proved by contradiction. Suppose there are two minimal lifted networks L1 and L2. Then there exists a ground atom a that is in supernode N1 in L1 and in supernode N2 in L2, and N1 ≠ N2; or similarly for some superfeature c. Then, by Definition 1, all nodes in N1 send the same messages as a and so do all nodes in N2, and therefore N1 = N2, resulting in a contradiction. A similar argument applies to c. Therefore there is a unique minimal lifted network L∗.

We now show that LNC returns L∗ in two subparts:

1. The network Li obtained by LNC at any iteration i is no finer than L∗ in the sense that, if two ground atoms are in different supernodes in Li, they are in different supernodes in L∗, and similarly for ground clauses.

2. LNC converges in a finite number of iterations to a network L where all ground atoms (ground clauses) in a supernode (superfeature) receive the same messages during ground BP.

The claim follows immediately from these two statements, since if L is no finer than L∗ and no coarser, it must be L∗.

For subpart 1, it is easy to see that if it is satisfied by the atoms at the ith iteration, then it is also satisfied by the clauses at the ith iteration. Now, we will prove subpart 1 by induction. Clearly, it is true at the start of the first iteration. Suppose that a supernode N splits into N1 and N2 at the ith iteration. Let a1 ∈ N1 and a2 ∈ N2. Then there must be a superfeature F in the ith iteration such that T(a1, F) ≠ T(a2, F). Since Li is no finer than L∗, there exist superfeatures Fj in L∗ such that F = ∪_j Fj. Since T(a1, F) ≠ T(a2, F), ∃j T(a1, Fj) ≠ T(a2, Fj), and therefore a1 and a2 are in different supernodes in L∗. Hence Li+1 is no finer than L∗, and by induction this is true at every iteration.

We prove subpart 2 as follows. In the first iteration each supernode either remains unchanged or splits into finer supernodes, because each initial supernode is as large as possible. In any iteration, if each supernode remains unchanged or splits into finer supernodes, each superfeature also remains unchanged or splits into finer superfeatures, because splitting a supernode that is joined into a superfeature necessarily causes the superfeature to be split as well. Similarly, if each superfeature remains unchanged or splits into finer superfeatures, each supernode also remains unchanged or splits into finer supernodes, because (a) if two nodes are in different supernodes they must have different counts from at least one superfeature, and (b) if two nodes have different counts from a superfeature, they must have different counts from at least one of the finer superfeatures that it splits into, and therefore must be assigned to different supernodes. Therefore, throughout the algorithm supernodes and superfeatures can only remain unchanged or split into finer ones. Because there is a maximum possible number of supernodes and superfeatures, this also implies that the algorithm converges in a finite number of iterations. Further, no splits occur iff all atoms in each supernode have the same counts as in the previous iteration, which implies they receive the same messages at every iteration, and so do all clauses in each corresponding superfeature.

The proof that BP applied to L gives the same results as BP applied to the ground network follows from Definitions 1 and 2, the previous parts of the theorem, modifications 1 and 2 to the BP algorithm, and the fact that the number of identical messages sent from the ground atoms in a superfeature to each ground atom in a supernode is the cardinality of the projection of the superfeature onto the supernode. □

Clauses involving evidence atoms can be simplified (false literals and clauses containing true literals can be deleted). As a result, duplicate clauses may appear, and the corresponding superfeatures can be merged. This will typically result in duplicate instances of tuples. Each tuple in the merged superfeature is assigned a weight ∑_i m_i w_i, where m_i is the number of duplicate tuples resulting from the ith superfeature and w_i is the corresponding weight. During the creation of supernodes, T(s, F) is now the number of F tuples projecting into s multiplied by the corresponding weight. This can greatly reduce the size of the lifted network. When no evidence is present, our algorithm reduces to the one proposed by Jaimovich et al. (2007).
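A tiny Python sketch of the clause simplification just described (it does not cover the subsequent merging of duplicate superfeatures); the literal encoding is our own, purely illustrative.

    def simplify(ground_clause, evidence):
        """ground_clause: list of (sign, atom) literals, sign=True for a positive literal.
        Returns the simplified clause, or None if some literal is satisfied by evidence."""
        remaining = []
        for sign, atom in ground_clause:
            if atom in evidence:
                if evidence[atom] == sign:
                    return None          # a true literal: the whole clause can be deleted
                # a false literal: drop it and keep going
            else:
                remaining.append((sign, atom))
        return remaining

    # Example: Smokes(Anna) ⇒ Cancer(Anna) with evidence Smokes(Anna)=True keeps only Cancer(Anna).
    clause = [(False, ("Smokes", ("Anna",))), (True, ("Cancer", ("Anna",)))]
    print(simplify(clause, {("Smokes", ("Anna",)): True}))   # -> [(True, ('Cancer', ('Anna',)))]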
An important question remains: how to represent supernodes and superfeatures. Although this does not affect the space or time cost of belief propagation (where each supernode and superfeature is represented by a single symbol), it can greatly affect the cost of constructing the lifted network. The simplest option is to represent each supernode or superfeature extensionally as a set of tuples (i.e., a relation), in which case joins and projections reduce to standard database operations. However, in this case the cost of constructing the lifted network is similar to the cost of constructing the full ground network, and can easily become the bottleneck. A better option is to use a more compact intensional representation, as done by Poole (2003) and Braz et al. (2005; 2006).²

² Superfeatures are related, but not identical, to the parfactors of Poole and Braz et al. One important difference is that superfeatures correspond to factors in the original graph, while parfactors correspond to factors created during variable elimination. Superfeatures are thus exponentially more compact.

A ground atom can be viewed as a first-order atom with all variables constrained to be equal to constants, and similarly for ground clauses. (For example, R(A, B) is R(x, y) with x = A and y = B.) We represent supernodes by sets of (α, γ) pairs, where α is a first-order atom and γ is a set of constraints, and similarly for superfeatures. Constraints are of the form x = y or x ≠ y, where x is an argument of the atom and y is either a constant or another argument. For example, (S(v, w, x, y, z), {w = x, y = A, z ≠ B, z ≠ C}) compactly represents all groundings of S(v, w, x, y, z) compatible with the constraints. Notice that variables may be left unconstrained, and that infinite sets of atoms can be finitely represented in this way.
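Here is a small Python sketch of this (α, γ) representation, using the example from the text; the class and method names are ours, purely illustrative.

    class ConstrainedAtom:
        """An (alpha, gamma) pair: a first-order atom plus equality/inequality constraints."""
        def __init__(self, pred, variables, eq=None, neq=None):
            self.pred = pred
            self.variables = list(variables)   # argument variables, in order
            self.eq = dict(eq or {})           # e.g. {"w": "x", "y": "A"}: var = var-or-constant
            self.neq = dict(neq or {})         # e.g. {"z": {"B", "C"}}: var != each listed term

        def covers(self, args, constants):
            """True iff the ground atom pred(args) satisfies all constraints."""
            binding = dict(zip(self.variables, args))
            for v, t in self.eq.items():
                target = binding[t] if t in binding else (t if t in constants else None)
                if target is None or binding[v] != target:
                    return False
            for v, terms in self.neq.items():
                for t in terms:
                    if binding[v] == (binding[t] if t in binding else t):
                        return False
            return True

    # (S(v, w, x, y, z), {w = x, y = A, z != B, z != C}) from the text:
    s = ConstrainedAtom("S", ["v", "w", "x", "y", "z"],
                        eq={"w": "x", "y": "A"}, neq={"z": {"B", "C"}})
    consts = {"A", "B", "C", "D", "E"}
    print(s.covers(("D", "E", "E", "A", "D"), consts))   # True
    print(s.covers(("D", "E", "E", "A", "B"), consts))   # False: violates z != B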
Let the default value of a predicate R be its most frequent value given the evidence (true, false or unknown). Let S_{R,i} be the set of constants that appear as the ith argument of R only in groundings with the default value. Supernodes not involving any members of S_{R,i} for any argument i are represented extensionally (i.e., with pairs (α, γ) where γ contains a constraint of the form x = A, where A is a constant, for each argument x).
Initially, supernodes involving members of S_{R,i} are represented using (α, γ) pairs containing constraints of the form x ≠ A for each A ∈ C \ S_{R,i}.³ When two or more supernodes are joined to form a superfeature F, if the kth argument of F's clause is the i(j)th argument of its jth literal, S_k = ∩_j S_{r(j),i}, where r(j) is the predicate symbol in the jth literal. F is now represented analogously to the supernodes, according to whether or not it involves elements of S_k. If F is represented intensionally, each (α, γ) pair is divided into one pair for each possible combination of equality/inequality constraints among the clause's arguments, which are added to γ. When forming a supernode from superfeatures, the constraints in each (α, γ) pair in the supernode are the union of (a) the corresponding constraints in the superfeatures on the variables included in the supernode, and (b) the constraints induced by the excluded variables on the included ones. This process is analogous to the shattering process of Braz et al. (2005).

³ In practice, variables are typed, and C is replaced by the domain of the argument; and the set of constraints is only stored once, and pointed to as needed.

In general, finding the most compact representation for supernodes and superfeatures is an intractable problem. Investigating it further is a direction for future work.
Experiments

We compared the performance of lifted BP with the ground version on three domains. All the domains are loopy (i.e., the graphs have cycles), and the algorithms of Poole (2003) and Braz et al. (2005; 2006) run out of memory, rendering them inapplicable. We implemented lifted BP as an extension of the open-source Alchemy system (Kok et al. 2007). Since our algorithm is guaranteed to produce the same results as the ground version, we do not report solution quality. Diagnosing the convergence of BP is a difficult problem; we ran it for 1000 steps for both algorithms in all experiments. BP did not always converge. Either way, it was marginally less accurate than Gibbs sampling. The experiments were run on a cluster of nodes, each node having 3.46 GB of RAM and two processors running at 3 GHz.

Entity Resolution

Entity resolution is the problem of determining which observations (e.g., records in a database) correspond to the same objects. This problem is of crucial importance to many large scientific projects, businesses, and government agencies, and has received increasing attention in the AI community in recent years. We used the version of McCallum's Cora database available on the Alchemy website (Kok et al. 2007). The inference task was to de-duplicate citations, authors and venues (i.e., to determine which pairs of citations refer to the same underlying paper, and similarly for author fields and venue fields). We used the MLN (formulas and weights) used by Singla and Domingos (2005) in their experiments. This contains 46 first-order clauses stating regularities such as: if two fields have high TF-IDF similarity, they are (probably) the same; if two records are the same, their fields are the same, and vice-versa; etc.
Link Prediction

Link prediction is an important problem with many applications: social network analysis, law enforcement, bibliometrics, identifying metabolic networks in cells, etc. We experimented on the link prediction task of Richardson and Domingos (2006), using the UW-CSE database and MLN publicly available from the Alchemy website (Kok et al. 2007). The database contains a total of 2678 groundings of predicates like: Student(person), Professor(person), AdvisedBy(person1, person2), TaughtBy(course, person, quarter), Publication(paper, person), etc. The MLN includes 94 formulas stating regularities like: each student has at most one advisor; if a student is an author of a paper, so is her advisor; etc. The task is to predict who is whose advisor, i.e., the AdvisedBy(x, y) predicate, from information about paper authorships, classes taught, etc. The database is divided into five areas (AI, graphics, etc.); we trained weights on the smallest using Alchemy's default discriminative learning algorithm, ran inference on all five, and averaged the results.

Social Networks

We also experimented with the example "Friends & Smokers" MLN in Table 1. The goal here was to examine how the relative performance of lifted BP and ground BP varies with the number of objects in the domain and the fraction of objects we have evidence about. We varied the number of people from 250 to 2500 in increments of 250, and the fraction of known people KF from 0 to 1. A KF of r means that we know for a randomly chosen r fraction of all people (a) whether they smoke or not and (b) who 10 of their friends are (other friendship relations are still assumed to be unknown). Cancer(x) is unknown for all x. The people with known information were randomly chosen. The whole domain was divided into a set of friendship clusters of size 50 each. For each known person, we randomly chose each friend with equal probability of being inside or outside their friendship cluster. All unknown atoms were queried.
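The evidence-generation procedure described above can be summarized in a few lines of Python. This is our reading of the setup (in particular, the 50/50 smoking status for known people is an assumption), not the authors' actual experiment script.

    import random

    def generate_evidence(n_people=1000, kf=0.1, cluster_size=50, friends_per_person=10, seed=0):
        rng = random.Random(seed)
        people = list(range(n_people))
        cluster = {p: p // cluster_size for p in people}           # friendship clusters of 50
        known = set(rng.sample(people, int(kf * n_people)))        # randomly chosen known people
        evidence = {}
        for p in known:
            evidence[("Smokes", (p,))] = rng.random() < 0.5        # assumed 50/50 observed status
            inside = [q for q in people if cluster[q] == cluster[p] and q != p]
            outside = [q for q in people if cluster[q] != cluster[p]]
            for _ in range(friends_per_person):
                pool = inside if rng.random() < 0.5 else outside   # in/out of cluster, equal odds
                evidence[("Friends", (p, rng.choice(pool)))] = True
        return evidence   # Cancer(x) and all other atoms remain unknown and are queried

    ev = generate_evidence()
    print(len(ev), "evidence atoms")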
Results

Results on all domains are summarized in Table 3. The Friends & Smokers results are for 1000 people and KF = 0.1; the Cora results are for 500 records. All results for Cora and Friends & Smokers are averages over five random splits.⁴ LNC with intensional representation is comparable in time and memory with the extensional version on Cora and UW-CSE, but much more efficient on Friends & Smokers. All the results shown are for the intensional representation. LNC is slower than grounding the full network, but BP is much faster on the lifted network, resulting in better times in all domains (by two orders of magnitude on Friends & Smokers). The number of (super) features created is much smaller for lifted BP than for ground BP (by four orders of magnitude on Friends & Smokers). Memory (not reported here) is comparable on Cora and UW-CSE, and much lower for LNC on Friends & Smokers. Figure 1 shows how network size varies with the number of people in the Friends & Smokers domain.

⁴ For Cora, we made sure that each actual cluster was either completely inside or outside each split.
Table 3: Time and memory cost of ground and lifted BP. Times are in seconds; each cell gives Ground / Lifted.

Domain              Construction        BP                  Total               No. of (Super) Features
Cora                263.1 / 1173.3      12368.4 / 3997.7    12631.6 / 5171.1    2078629 / 295468
UW-CSE              6.9 / 22.1          1015.8 / 602.5      1022.8 / 624.7      217665 / 86459
Friends & Smokers   38.8 / 89.7         10702.2 / 4.4       10741.0 / 94.2      1900905 / 58
[Figure 1: No. of (super) features as a function of the number of people in the Friends & Smokers domain; the plot itself is not reproduced here.]

Acknowledgments

This research was funded by DARPA contracts NBCH-D030010/02-000225, FA8750-07-D-0185, and HR0011-07-C-