Building A Knowledge Graph From Natural Language Definitions For Interpretable Text Entailment Recognition
Abstract
Natural language definitions of terms can serve as a rich source of knowledge, but structuring them into a comprehensible semantic
model is essential to enable them to be used in semantic interpretation tasks. We propose a method and provide a set of tools for
automatically building a graph world knowledge base from natural language definitions. Adopting a conceptual model composed of a
set of semantic roles for dictionary definitions, we trained a classifier for automatically labeling definitions, preparing the data to be
later converted to a graph representation. WordNetGraph, a knowledge graph built out of noun and verb WordNet definitions according
to this methodology, was successfully used in an interpretable text entailment recognition approach which uses paths in this graph to
provide clear justifications for entailment decisions.
(OIE). Although they generate a syntactic-semantic graph representation of the definitions, the resulting graphs are used
only as an intermediary resource for the final goal of extracting semantic relations between the entities present in the
definition.
3. Graph Conceptual Model
To build the definition graph, we adopted the conceptual model proposed by Silva et al. (2016). This model extends the
genus-differentia definition pattern from Aristotle’s classic theory of definition (Berg, 1982; Lloyd, 1962; Granger,
1984) by defining a set of entity-centered semantic roles for lexical definitions. Unlike the commonly used event-centered
semantic roles, which define the semantic relations holding among a predicate (the main verb in a clause) and its
associated participants and properties (Màrquez et al., 2008), the semantic roles of a definition express the part played
by an expression in that definition, showing how it relates to the definiendum, that is, the entity being defined.
In this model, the genus concept was replaced by the more general role supertype, which can be not only the definiendum’s
immediate superclass but also an ancestor higher in the concept hierarchy. The differentia component was split into two
roles: differentia quality and differentia event. These three roles can be seen as the representatives of an entity’s
essential properties, while other roles, such as associated fact, purpose or accessory quality, define non-essential
properties. The conceptual model is depicted in Figure 1, and Table 1 presents a summarized description for each of the
roles defined in this model.
This set of semantic roles captures the semantic “shape” of natural language definitions and allows the extraction of
structured representations from linguistic resources, enabling them to be used as knowledge sources in a wide range of
semantic tasks.

Role                  Description
Supertype             the immediate or ancestral entity’s superclass
Differentia quality   a quality that distinguishes the entity from the others under the same supertype
Differentia event     an event (action, state or process) in which the entity participates and that is mandatory to
                      distinguish it from the others under the same supertype
Event location        the location of a differentia event
Event time            the time in which a differentia event happens
Origin location       the entity’s location of origin
Quality modifier      degree, frequency or manner modifiers that constrain a differentia quality
Purpose               the main goal of the entity’s existence or occurrence
Associated fact       a fact whose occurrence is/was linked to the entity’s existence or occurrence
Accessory determiner  a determiner expression that doesn’t constrain the supertype-differentia scope
Accessory quality     a quality that is not essential to characterize the entity
[Role] particle       a particle, such as a phrasal verb complement, non-contiguous to the other role components
Table 1: Semantic roles for dictionary definitions
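As an illustration of how the roles in Table 1 might be represented programmatically, the following minimal Python sketch
encodes them as an enumeration and stores a labeled definition as a sequence of (role, segment) pairs. The names Role,
Segment, LabeledDefinition and select are hypothetical and are not part of the authors' tools. The example uses the
WordNet gloss for "Scotch" (whiskey distilled in Scotland) discussed in Section 4; its exact segmentation here is an
assumption made for the sake of the example.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    """Semantic roles for dictionary definitions (names follow Table 1)."""
    SUPERTYPE = "supertype"
    DIFFERENTIA_QUALITY = "differentia quality"
    DIFFERENTIA_EVENT = "differentia event"
    EVENT_LOCATION = "event location"
    EVENT_TIME = "event time"
    ORIGIN_LOCATION = "origin location"
    QUALITY_MODIFIER = "quality modifier"
    PURPOSE = "purpose"
    ASSOCIATED_FACT = "associated fact"
    ACCESSORY_DETERMINER = "accessory determiner"
    ACCESSORY_QUALITY = "accessory quality"
    ROLE_PARTICLE = "[role] particle"


# The three roles the text describes as representing essential properties.
ESSENTIAL_ROLES = frozenset(
    {Role.SUPERTYPE, Role.DIFFERENTIA_QUALITY, Role.DIFFERENTIA_EVENT}
)


@dataclass(frozen=True)
class Segment:
    role: Role
    text: str


@dataclass(frozen=True)
class LabeledDefinition:
    definiendum: str
    segments: tuple  # tuple of Segment, in the order they occur in the gloss

    def select(self, *roles):
        """Return the texts of the segments labeled with any of the given roles."""
        return [s.text for s in self.segments if s.role in roles]


# Assumed segmentation of the gloss "whiskey distilled in Scotland".
scotch = LabeledDefinition(
    definiendum="Scotch",
    segments=(
        Segment(Role.SUPERTYPE, "whiskey"),
        Segment(Role.DIFFERENTIA_EVENT, "distilled"),
        Segment(Role.EVENT_LOCATION, "in Scotland"),
    ),
)

essential = scotch.select(*ESSENTIAL_ROLES)   # segments carrying essential roles
location = scotch.select(Role.EVENT_LOCATION)  # only the event location
```

Keeping each segment as a unit, rather than splitting it into single words, mirrors the idea that every role should
carry a self-contained piece of the definition.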
Figure 1: Conceptual model for the semantic roles for lexical definitions. Relationships between [role] particle and every
other role in the model are expressed as dashed lines for readability.
4. Construction Methodology
Structuring natural language definitions as a graph allows us to select the portions of information regarding an entity’s
description that are relevant for a certain reasoning task. For example, consider the definition (from WordNet) for the
concept “lake poets”, which was classified according to the model described in Section 3 and is illustrated in Figure 2.
When retrieving data related to this concept, we could be interested only in origin- (lake poets are English poets),
time- (lake poets are poets at the beginning of the 19th century) or space- (lake poets are poets who lived in the Lake
District) related information. When each of those roles is represented as a node in a graph, we can focus only on the
path containing the nodes of interest. Moreover, since the definition is split into segments rather than single words,
each node contains a comprehensible amount of information, avoiding the need to visit several nodes to gather
intelligible phrases.
Figure 2: Example of role labeling for the definition of the “lake poets” synset.
To generate the WordNetGraph¹ – a knowledge graph following the RDF data model – from WordNet’s noun and verb glosses, we
adopted the following methodology for classifying and structuring the definitions:
¹ https://github.com/Lambda-3/WordnetGraph
Synsets sample selection: in order to use a supervised machine learning model to classify the data, we needed an initial
set of annotated definitions. To build this set, we randomly selected 2,000 WordNet synsets, comprising 1,732 noun synsets
and 268 verb synsets (the verb database size is around 17% of the noun database size).
Automatic pre-annotation: the set of 2,000 definitions was automatically pre-annotated according to a rule-based heuristic
that takes into account the syntactic patterns identified by statistical analysis as described by Silva et al. (2016).
Using the Stanford parser (Manning et al., 2014), we generated the syntactic parse tree for each definition, identified
the relevant phrasal nodes and then assigned the semantic roles most often associated with them. For example: the
supertype for a noun definition is usually the innermost and leftmost noun phrase (NP) that contains at least one noun
(NN); a differentia event is usually either a subordinate clause (SBAR) or a verb phrase (VP); an event location is
normally a prepositional phrase (PP) inside an SBAR or VP and possibly containing a location named entity, and so on.
Figure 3 shows the parse tree generated for the definition of the term “Scotch” – whiskey distilled in
Scotland – and the semantic roles automatically assigned to each phrasal node.
Figure 3: Syntactic parse tree for the definition of the concept “Scotch” and assigned semantic role labels. After being
classified as a differentia event, the VP is further analyzed and a PP containing an event location is identified and
assigned its own role label.
Data curation: after the automatic pre-annotation, the definitions were manually curated with the aid of the Brat²
annotation tool. Misclassifications were fixed and segments missing a role were assigned the appropriate one.
Misclassifications and missing roles are due to parser errors or insufficient information (for instance, a PP inside a VP
may not contain any named entity, making it hard to correctly distinguish between an event time and an event location).
The manual data curation ensured that every segment in each definition, apart from leading determiners and conjunctions
between roles (as opposed to conjunctions inside roles), was associated with a semantic role label.
² http://brat.nlplab.org/
Classifier training: the curated data was then used to train a Recurrent Neural Network (RNN) machine learning model
designed for sequence labeling. We used the RNN implementation provided by Mesnil et al. (2015), who report
state-of-the-art results for the slot filling task. The dataset was split into training (68%), validation (17%) and test
(15%) sets. The best accuracy reached during training was 80.35%.
Database classification: the trained classifier was then used to label all of WordNet’s noun and verb definitions. For
simplicity, example sentences and parentheses were excluded from the original glosses. The classification was performed
over WordNet 3.0; 82,112 noun definitions and 13,761 verb definitions were labeled.
Data post-processing: since some of the classified definitions lacked the supertype role, the labeled data had to pass
through a post-processing phase. The supertype is a mandatory component in a well-formed definition and, as will be
detailed later, the RDF model is structured around it. Following the same syntactic rules adopted for pre-annotation,
missing supertypes were identified and the roles around them had their limits adjusted, while the remaining
classification was kept unchanged. Figure 4 shows an example of a definition (for the term “spur”) fixed in the
post-processing phase.
RDF conversion: finally, the labeled definitions were serialized in RDF format. In the final graph, a synset is a node
and each role in its definition is another node. The synset node is linked to the supertype role, which is, in turn, linked
to all the other roles. More specifically, a supertype linked to a role is a reified node, and this reified node is
linked to the synset node. Reification is also used when a role has components, such as event time and/or location for a
differentia event and quality modifier for a differentia quality. In this case, the component is linked to its main role,
composing a reified node which is linked to the supertype, creating another reified node which is eventually linked to
the synset node. This structure allows the relationships to be fully contextualized. As an example, consider the
definition depicted in Figure 2. The node defined by the concept “poet” may be linked to several other nodes in the graph,
but it is linked to the differentia quality node “English” only in the context of this definition. Supertype nodes are
always represented as resources. The differentia quality and differentia event nodes can be represented as either
resources, when they have components (event times and/or locations, or quality modifiers) to be linked to, or literals
otherwise. All the other roles are represented as literals, and properties are named after role names³. Figure 5 shows
the simplified (without reification) RDF representation for the definition in Figure 2.
Figure 4: Classified definition missing a supertype fixed in the post-processing phase.
5. Application
WordNetGraph is one of the main components in a text entailment recognition approach aimed at justifying entailment
decisions where reasoning over world knowledge is required. Text entailment is defined as a directional relationship
between an entailing text T and an entailed hypothesis H, holding true whenever a human reading T would infer that H is
most likely true (Dagan et al., 2006). Using WordNetGraph as the world knowledge base, we implemented a navigation
algorithm based on distributional semantics (Freitas et al., 2014) to find a path in this graph linking T to H, and used
the contents of the nodes in this path to build a human-readable justification for the entailment decision. The
entailment is rejected if no path is found.
Consider, as an example, the entailment pair 39.3 from the BPI dataset⁵:
39.3 T: Many cellphones have built-in digital cameras.
39.3 H: Many cellphones can take pictures.
First, we look for pairs of terms that have a strong semantic relationship and that can prove this entailment to be true,
and then send these pairs as input for the graph navigation algorithm. In this example, the best pair is composed of the
terms “digital camera”, which is our source, and “pictures”, our target. Starting from the source, we retrieve all the
nodes in WordNetGraph linked to it, compute the semantic similarity between each node and the target and choose the one
having the highest value as the next node to be visited, and do this recursively until we reach the target.
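The greedy navigation step described above can be sketched as follows. This is a minimal illustration and not the
authors' implementation: the function name navigate, the adjacency-list graph format, the visited set and the max_steps
bound are all assumptions added so that the sketch is self-contained and always terminates. The similarity function is a
hand-set lookup that stands in for a distributional semantic model, and the toy graph is invented; it does not contain
WordNetGraph data.

```python
def navigate(graph, source, target, similarity, max_steps=10):
    """Greedily walk from source towards target.

    graph maps a node to a list of (property, neighbor) pairs.
    At each step the unvisited neighbor most similar to the target is chosen.
    Returns the (node, property, neighbor) triples along the path, or None
    when the target cannot be reached (the entailment would be rejected).
    """
    path = []
    visited = {source}
    current = source
    for _ in range(max_steps):
        if current == target:
            return path
        candidates = [
            (prop, node)
            for prop, node in graph.get(current, [])
            if node not in visited
        ]
        if not candidates:
            return None  # dead end: no path found
        prop, best = max(candidates, key=lambda edge: similarity(edge[1], target))
        path.append((current, prop, best))
        visited.add(best)
        current = best
    return path if current == target else None


# Invented toy graph; edge labels reuse role names, as properties do in the RDF model.
toy_graph = {
    "a": [("supertype", "b"), ("differentia_quality", "c")],
    "b": [("differentia_event", "d")],
    "c": [],
    "d": [],
}
# Hand-set scores standing in for a distributional similarity measure.
toy_scores = {"b": 0.7, "c": 0.2}


def toy_similarity(node, target):
    return 1.0 if node == target else toy_scores.get(node, 0.0)


found = navigate(toy_graph, "a", "d", toy_similarity)
rejected = navigate(toy_graph, "c", "d", toy_similarity)
```

Because the walk is greedy, it follows a single best-scoring neighbor at each step and never backtracks; a dead end
therefore yields None even if another branch would have reached the target, which is a simplification of this sketch.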
The following segments (triples) are found by the navigation algorithm:
6. Conclusion
We presented a method for automatically building a graph world knowledge base from natural language dictionary
definitions. Adopting a conceptual model based on entity-centered semantic roles, we trained a supervised machine learning
classifier for automatic role labeling and then converted the labeled data into an RDF graph representation. Following
this methodology, we created the WordNetGraph, a graph built from the definitions of nouns and verbs in WordNet. A set of
tools implementing the methodology is also freely available.
WordNetGraph was successfully used in a text entailment recognition approach based on distributional navigation over
definition graphs. Besides using paths in this graph to recognize the entailment, this approach also provides a
human-readable justification for the entailment decision. Since each graph node encloses a self-contained amount of
information rather than always representing single words, an intelligible justification can be built from a path made up
of only a few nodes. As future work, we intend to apply this methodology to other language resources, such as Wiktionary.
7. Acknowledgments
Vivian S. Silva is a CNPq Fellow – Brazil.
8. Bibliographical References
Berg, J. (1982). Aristotle’s theory of definition. ATTI del Convegno Internazionale di Storia della Logica, pages 19–30.
Bovi, C. D., Telesca, L., and Navigli, R. (2015). Large-scale information extraction from textual definitions through deep syntactic and semantic analysis. Transactions of the Association for Computational Linguistics, 3:529–543.
Calzolari, N. (1991). Acquiring and representing semantic information in a lexical knowledge base. In Workshop of SIGLEX (Special Interest Group within ACL on the Lexicon), pages 235–243. Springer.
Copestake, A. (1991). The LKB: a system for representing lexical information extracted from machine-readable dictionaries. In Proceedings of the ACQUILEX Workshop on Default Inheritance in the Lexicon, Cambridge.
Dagan, I., Glickman, O., and Magnini, B. (2006). The PASCAL recognising textual entailment challenge. In Machine learning challenges: evaluating predictive uncertainty, visual object classification, and recognising textual entailment, pages 177–190. Springer.
Dolan, W., Vanderwende, L., and Richardson, S. D. (1993). Automatically deriving structured knowledge bases from on-line dictionaries. In Proceedings of the First Conference of the Pacific Association for Computational Linguistics, pages 5–14. Pacific Association for Computational Linguistics Vancouver.
Fellbaum, C. (1998). WordNet. Wiley Online Library.
Freitas, A., da Silva, J. C. P., Curry, E., and Buitelaar, P. (2014). A distributional semantics approach for selective reasoning on commonsense graph knowledge bases. In International Conference on Applications of Natural Language to Data Bases/Information Systems, pages 21–32. Springer.
Granger, E. H. (1984). Aristotle on genus and differentia. Journal of the History of Philosophy, 22(1):1–23.
Gunning, D. (2017). Explainable artificial intelligence (XAI). Defense Advanced Research Projects Agency (DARPA).
Kouylekov, M. and Magnini, B. (2005). Recognizing textual entailment with tree edit distance algorithms. In Proceedings of the First Challenge Workshop Recognising Textual Entailment, pages 17–20.
Lloyd, A. C. (1962). Genus, species and ordered series in Aristotle. Phronesis, pages 67–90.
Manning, C. D., Surdeanu, M., Bauer, J., Finkel, J. R., Bethard, S., and McClosky, D. (2014). The Stanford CoreNLP natural language processing toolkit. In ACL (System Demonstrations), pages 55–60.
Màrquez, L., Carreras, X., Litkowski, K. C., and Stevenson, S. (2008). Semantic role labeling: an introduction to the special issue. Computational Linguistics, 34(2):145–159.
Mesnil, G., Dauphin, Y., Yao, K., Bengio, Y., Deng, L., Hakkani-Tur, D., He, X., Heck, L., Tur, G., Yu, D., et al. (2015). Using recurrent neural networks for slot filling in spoken language understanding. IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), 23(3):530–539.
Recski, G. (2016). Building concept graphs from monolingual dictionary entries. In Nicoletta Calzolari (Conference Chair), et al., editors, Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016). European Language Resources Association (ELRA).
Silva, V. S., Handschuh, S., and Freitas, A. (2016). Categorization of semantic roles for dictionary definitions. In Cognitive Aspects of the Lexicon (CogALex-V), Workshop at COLING 2016, pages 176–184.
Silva, V. S., Freitas, A., and Handschuh, S. (2018). Recognizing and justifying text entailment through distributional navigation on definition graphs. In AAAI.
Vossen, P. and Copestake, A. (1994). Untangling definition structure into knowledge representation. In Inheritance, defaults and the lexicon, pages 246–274. Cambridge University Press.
Vossen, P. (1991). Converting data from a lexical database to a knowledge base. Esprit BRA-3030 ACQUILEX Working Paper No 27.
Vossen, P. (1992). The automatic construction of a knowledge base from dictionaries: a combination of techniques. In EURALEX, volume 92, pages 311–326.
Wang, R. and Neumann, G. (2008). An divide-and-conquer strategy for recognizing textual entailment. In Proceedings of the Text Analysis Conference, Gaithersburg, MD.