Building a Knowledge Graph from Natural Language Definitions for Interpretable Text Entailment Recognition
Vivian S. Silva¹, André Freitas², Siegfried Handschuh¹

¹Department of Computer Science and Mathematics, University of Passau, Innstraße 43, 94032, Passau, Germany
²School of Computer Science, University of Manchester, Kilburn Building, Oxford Road, M13 9PL, UK
vivian.santossilva@uni-passau.de, andre.freitas@manchester.ac.uk, siegfried.handschuh@uni-passau.de

Abstract
Natural language definitions of terms can serve as a rich source of knowledge, but they must be structured into a comprehensible semantic model before they can be used in semantic interpretation tasks. We propose a method, and provide a set of tools, for automatically building a graph world knowledge base from natural language definitions. Adopting a conceptual model composed of a set of semantic roles for dictionary definitions, we trained a classifier to label definitions automatically, preparing the data to be converted later to a graph representation. WordNetGraph is a knowledge graph built from WordNet noun and verb definitions according to this methodology. It was successfully used in an interpretable text entailment recognition approach that uses paths in this graph to provide clear justifications for entailment decisions.

Keywords: lexical definitions, knowledge graph, text entailment

1. Introduction

Natural language lexical definitions of terms can be used as a source of knowledge in a number of semantic tasks, such as Question Answering, Information Extraction and Text Entailment. Formal, structured resources such as ontologies are still scarce and usually target a very specific domain. In contrast, a large number of linguistic resources gathering dictionary definitions are available, not only for particular domains but also for wide-coverage commonsense knowledge.

However, to make the most of those resources, it is necessary to capture the semantic shape of natural language definitions. They must be structured in a way that favors both the information extraction process and the subsequent information retrieval. This allows semantic models to be built effectively from these data sources while keeping the resulting models easily searchable and interpretable. Furthermore, by using these models, systems can increase their own interpretability, benefiting from the structured data to perform traceable reasoning and generate explanations. These features are becoming even more valuable given the growing importance of Explainable AI (Gunning, 2017).

In this work, we propose a method for automatically building commonsense knowledge bases out of natural language dictionary definitions, which is easily extensible to any domain where natural language glossaries are available. Building upon a conceptual model based on a set of semantic roles for definitions, we classify each segment in a definition according to its relation to the entity being defined. We then convert the classified data into a knowledge graph where each node is a meaningful phrase containing a piece of self-contained information about the definiendum. Following this methodology, we processed the whole noun and verb databases of WordNet (Fellbaum, 1998) to build WordNetGraph. We then used this knowledge graph to recognize text entailments in an interpretable way, providing concise justifications for the entailment decisions.

2. Related Work

The construction of structured databases from dictionary definitions has been extensively explored. Most approaches rely on syntactic parsers to identify patterns that point to relationships between words (Calzolari, 1991; Vossen, 1991; Vossen, 1992; Vossen and Copestake, 1994). A notable early effort is LKB, a Lexical Knowledge Base (Copestake, 1991) based on typed-feature structures. These structures can be seen as a set of attributes for a given concept, such as “origin”, “color”, “smell”, “taste” and “temperature” for the concept drink. The definitions from a machine-readable dictionary are parsed to extract the definiendum’s genus and differentiae, and the values represented by the differentiae fill in the feature structures for that genus. Since the features (the relevant attributes for a given entity) must be defined in advance, only a restricted domain was considered in that approach.

Dolan et al. (1993) also describe an automated strategy to build a structured lexical knowledge base. Instead of the entity-attribute structure, they use syntactic parsing to identify semantic relations such as is-a and part-of in order to build a directed graph. Recski (2016) also derives a graph representation from dictionary definitions, but the adopted conceptual model has only three types of edges, numbered from 0 to 2. The 0-edge represents unary predicates, and the 1- and 2-edges connect binary predicates to their arguments. Most of these approaches work at the word level, converting each single word in the definition into a separate attribute or node. In a graph knowledge base, this can increase the complexity of information retrieval, since the contents of several nodes may have to be concatenated to obtain sufficiently meaningful information about an entity.

The work of Bovi et al. (2015) goes beyond the word-level representation and is able to identify multi-word expressions. They perform a syntactic-semantic analysis of textual definitions for Open Information Extraction
(OIE). Although they generate a syntactic-semantic graph representation of the definitions, the resulting graphs are used only as an intermediary resource. Their final goal is to extract semantic relations between the entities present in the definition.

3. Graph Conceptual Model

To build the definition graph, we adopted the conceptual model proposed by Silva et al. (2016). This model extends the genus-differentia definition pattern from Aristotle’s classic theory of definition (Berg, 1982; Lloyd, 1962; Granger, 1984) by defining a set of entity-centered semantic roles for lexical definitions. The commonly used event-centered semantic roles define the semantic relations holding between a predicate (the main verb in a clause) and its associated participants and properties (Màrquez et al., 2008). By contrast, the semantic roles of a definition express the part played by an expression in that definition, showing how it relates to the definiendum, that is, the entity being defined.

In this model, the genus concept was replaced by the more general role supertype. A supertype can be not only the definiendum’s immediate superclass but also an ancestor higher in the concept hierarchy. The differentia component was split into two roles: differentia quality and differentia event. These three roles can be seen as the representatives of an entity’s essential properties. Other roles, such as associated fact, purpose or accessory quality, define non-essential properties. The conceptual model is depicted in Figure 1, and Table 1 presents a summarized description of each of the roles defined in this model.

Table 1: Semantic roles for dictionary definitions
Role | Description
Supertype | the entity’s immediate or ancestral superclass
Differentia quality | a quality that distinguishes the entity from the others under the same supertype
Differentia event | an event (action, state or process) in which the entity participates and that is mandatory to distinguish it from the others under the same supertype
Event location | the location of a differentia event
Event time | the time in which a differentia event happens
Origin location | the entity’s location of origin
Quality modifier | degree, frequency or manner modifiers that constrain a differentia quality
Purpose | the main goal of the entity’s existence or occurrence
Associated fact | a fact whose occurrence is/was linked to the entity’s existence or occurrence
Accessory determiner | a determiner expression that doesn’t constrain the supertype-differentia scope
Accessory quality | a quality that is not essential to characterize the entity
[Role] particle | a particle, such as a phrasal verb complement, non-contiguous to the other role components

This set of semantic roles captures the semantic “shape” of natural language definitions and allows structured representations to be extracted from linguistic resources. This enables those resources to be used as knowledge sources in a wide range of semantic tasks.
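The role inventory can be made concrete with a small data structure. The following Python sketch is illustrative only and is not taken from the published tools. All identifiers (`Role`, `CORE_ROLES`, `Segment`, `split_core`, `has_supertype`, `SCOTCH`) are hypothetical. `Role` mirrors Table 1, and `CORE_ROLES` holds the three roles described above as essential. The segmentation of the “Scotch” gloss (whiskey distilled in Scotland) is an assumption loosely based on the Figure 3 caption in Section 4.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    """The semantic roles of Table 1 (string values are illustrative labels)."""
    SUPERTYPE = "supertype"
    DIFFERENTIA_QUALITY = "differentia quality"
    DIFFERENTIA_EVENT = "differentia event"
    EVENT_LOCATION = "event location"
    EVENT_TIME = "event time"
    ORIGIN_LOCATION = "origin location"
    QUALITY_MODIFIER = "quality modifier"
    PURPOSE = "purpose"
    ASSOCIATED_FACT = "associated fact"
    ACCESSORY_DETERMINER = "accessory determiner"
    ACCESSORY_QUALITY = "accessory quality"
    PARTICLE = "[role] particle"


# The three roles that represent an entity's essential properties.
CORE_ROLES = frozenset({Role.SUPERTYPE, Role.DIFFERENTIA_QUALITY, Role.DIFFERENTIA_EVENT})


@dataclass(frozen=True)
class Segment:
    """A contiguous span of a definition together with its role label."""
    text: str
    role: Role


def split_core(segments):
    """Separate segments with a core role from all other labeled segments."""
    core = [s for s in segments if s.role in CORE_ROLES]
    others = [s for s in segments if s.role not in CORE_ROLES]
    return core, others


def has_supertype(segments):
    """A well-formed definition must contain a supertype segment."""
    return any(s.role is Role.SUPERTYPE for s in segments)


# Hypothetical segmentation of the "Scotch" gloss (whiskey distilled in Scotland).
SCOTCH = [
    Segment("whiskey", Role.SUPERTYPE),
    Segment("distilled", Role.DIFFERENTIA_EVENT),
    Segment("in Scotland", Role.EVENT_LOCATION),
]
```

Here `split_core(SCOTCH)` puts “whiskey” and “distilled” in the core list and “in Scotland” among the others. The “others” list therefore mixes non-essential roles with components, such as event location, that qualify a core role. This sketch does not model how such components attach to their main role; the RDF conversion step in Section 4 handles that through reification.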

4. Construction Methodology

Structuring natural language definitions as a graph allows us to select the portions of an entity’s description that are relevant for a certain reasoning task. For example, consider the WordNet definition of the concept “lake poets”, classified according to the model described in Section 3 and illustrated in Figure 2. When retrieving data related to this concept, we might be interested only in one kind of information:
origin-related (lake poets are English poets);
time-related (lake poets are poets at the beginning of the 19th century);
space-related (lake poets are poets who lived in the Lake District).
When each of those roles is represented as a node in a graph, we can focus only on the path containing the nodes of interest. Moreover, since the definition is split into segments rather than single words, each node contains a comprehensible amount of information. This avoids the need to visit several nodes to gather intelligible phrases.

We generated WordNetGraph (https://github.com/Lambda-3/WordnetGraph), a knowledge graph following the RDF data model, from WordNet’s noun and verb glosses. To do so, we adopted the following methodology for classifying and structuring the definitions:

Synset sample selection: to use a supervised machine learning model to classify the data, we needed an initial set of annotated definitions. To build this set, we randomly selected 2,000 WordNet synsets: 1,732 noun synsets and 268 verb synsets (the verb database is around 17% of the size of the noun database).

Automatic pre-annotation: the set of 2,000 definitions was automatically pre-annotated according to a rule-based heuristic. This heuristic takes into account the syntactic patterns identified by statistical analysis, as described by Silva et al. (2016). Using the Stanford parser (Manning et al., 2014), we generated the syntactic parse tree for each definition and identified the relevant phrasal nodes. We then assigned them the semantic roles most often associated with them. For example:
the supertype of a noun definition is usually the innermost and leftmost noun phrase (NP) that contains at least one noun (NN);
a differentia event is usually either a subordinate clause (SBAR) or a verb phrase (VP);
an event location is normally a prepositional phrase (PP) inside an SBAR or VP, possibly containing a location named entity;
and so on.
Figure 3 shows the parse tree generated for the definition of the term “Scotch” – whiskey distilled in
Scotland – and the semantic roles automatically assigned to each phrasal node.

Figure 1: Conceptual model for the semantic roles for lexical definitions. Relationships between [role] particle and every other role in the model are expressed as dashed lines for readability.
Figure 2: Example of role labeling for the definition of the “lake poets” synset.
Figure 3: Syntactic parse tree for the definition of the concept “Scotch” and assigned semantic role labels. After being classified as a differentia event, the VP is further analyzed, and a PP containing an event location is identified and assigned its own role label.

Data curation: after the automatic pre-annotation, the definitions were manually curated with the aid of the Brat annotation tool (http://brat.nlplab.org/). Misclassifications were fixed, and segments missing a role were assigned the appropriate one. Misclassifications and missing roles are due to parser errors or insufficient information. For instance, a PP inside a VP may not contain any named entity, making it hard to distinguish correctly between an event time and an event location. The manual data curation ensured that every segment in each definition was associated with a semantic role label. The only exceptions were leading determiners and conjunctions between roles (as opposed to conjunctions inside roles).

Classifier training: the curated data was then used to train a Recurrent Neural Network (RNN) machine learning model designed for sequence labeling. We used the RNN implementation provided by Mesnil et al. (2015), which reports state-of-the-art results for the slot filling task. The dataset was split into training (68%), validation (17%) and test (15%) sets. The best accuracy reached during training was 80.35%.

Database classification: the trained classifier was then used to label all of WordNet’s noun and verb definitions. For simplicity, example sentences and parenthesized text were excluded from the original glosses. The classification was performed over WordNet 3.0, and 82,112 noun definitions and 13,761 verb definitions were labeled.

Data post-processing: since some of the classified definitions lacked the supertype role, the labeled data had to go through a post-processing phase. The supertype is a mandatory component in a well-formed definition and, as detailed below, the RDF model is structured around it. Missing supertypes were identified following the same syntactic rules adopted for pre-annotation. The boundaries of the roles around each supertype were then adjusted, while the remaining classification was kept unchanged. Figure 4 shows an example of a definition (for the term “spur”) fixed in the post-processing phase.

RDF conversion: finally, the labeled definitions were serialized in RDF format. In the final graph, a synset is a node and each role in its definition is another node. The synset node is linked to the supertype role, which is, in turn, linked
to all the other roles. More specifically, a supertype linked to a role is a reified node, and this reified node is linked to the synset node. Reification is also used when a role has components, such as event time and/or location for a differentia event, or a quality modifier for a differentia quality. In this case, the component is linked to its main role, composing a reified node. That node is linked to the supertype, creating another reified node that is eventually linked to the synset node. This structure allows the relationships to be fully contextualized. As an example, consider the definition depicted in Figure 2. The node defined by the concept “poet” may be linked to several other nodes in the graph, but it is linked to the differentia quality node “English” only in the context of this definition.

Supertype nodes are always represented as resources. Differentia quality and differentia event nodes are represented as resources when they have components (event times and/or locations, or quality modifiers) to be linked to, and as literals otherwise. All the other roles are represented as literals, and properties are named after role names (a complete list of the model’s properties and namespaces is available at https://github.com/Lambda-3/WordnetGraph). Figure 5 shows the simplified (without reification) RDF representation for the definition in Figure 2.

Figure 4: Classified definition missing a supertype, fixed in the post-processing phase.
Figure 5: RDF representation for the definition of the “lake poets” synset.

Besides WordNetGraph, which is available in both XML and N-Triples formats, we provide a set of tools (https://github.com/ssvivian/DefRelExtractor) that implement the methodology described above. The following routines are freely available:
pre-processing definitions to generate sample data for manual curation;
post-processing data returned by a machine learning classifier;
generating the RDF model from the classified data.
Some auxiliary routines prepare the data for external tools. They convert data to the standoff file format required by the Brat annotation tool and generate a Python script that creates the dataset for the RNN classifier.

5. Application

WordNetGraph is one of the main components in a text entailment recognition approach aimed at justifying entailment decisions where reasoning over world knowledge is required. Text entailment is defined as a directional relationship between an entailing text T and an entailed hypothesis H. It holds whenever a human reading T would infer that H is most likely true (Dagan et al., 2006). Using WordNetGraph as the world knowledge base, we implemented a navigation algorithm based on distributional semantics (Freitas et al., 2014) to find a path in this graph linking T to H. We then used the contents of the nodes in this path to build a human-readable justification for the entailment decision. The entailment is rejected if no path is found.

Consider, as an example, the entailment pair 39.3 from the BPI dataset (http://www.cs.utexas.edu/users/~pclark/bpi-test-suite/):

39.3 T: Many cellphones have built-in digital cameras.
39.3 H: Many cellphones can take pictures.

First, we look for pairs of terms that have a strong semantic relationship and that can prove this entailment to be true. We then send these pairs as input to the graph navigation algorithm. In this example, the best pair is composed of the terms “digital camera”, which is our source, and “pictures”, our target. Starting from the source, we proceed as follows:
retrieve all the nodes in WordNetGraph linked to the current node;
compute the semantic similarity between each of these nodes and the target;
choose the node with the highest value as the next node to be visited;
repeat recursively until we reach the target.
The navigation algorithm finds the following segments (triples):

<digital camera has supertype camera>
<camera has supertype equipment>
<equipment has diff qual for taking photographs>

Since “photograph” and “picture” are in the same synset node, the search stops at this point. This confirms the entailment and produces the following justification, built from the path segments:

A digital camera is a kind of camera
A camera is an equipment for taking photographs
Photograph is synonym of picture

We ran experiments with the BPI dataset and a sample of the Guardian Headlines dataset (https://goo.gl/4iHdbX). The results are comparable to those of well-established text entailment algorithms, such as tree edit distance-based (Kouylekov and Magnini, 2005) and classification-based (Wang and Neumann, 2008) approaches. Our approach additionally provides clear, human-like explanations, an important feature still missing in most text entailment recognition approaches. A detailed description of the entailment recognition application, including experimental results and further justification examples, can be found in Silva et al. (2018).
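The greedy navigation and justification steps described in Section 5 can be sketched in Python as follows. This is a toy illustration under stated assumptions, not the implementation from Silva et al. (2018).

The assumptions are:
the graph holds only the three triples reported for pair 39.3, plus one hypothetical distractor edge;
the vectors are made-up numbers, and cosine similarity over them stands in for the distributional model of Freitas et al. (2014);
synset membership is approximated by a hard-coded set with naive plural stripping.

Function names such as `navigate` and `justify` are hypothetical.

```python
import math

# Toy graph: node -> list of (relation, neighbour) edges. The path edges follow
# the triples reported for pair 39.3; "with a lens" is a hypothetical distractor
# so that the greedy step has a real choice to make.
GRAPH = {
    "digital camera": [("supertype", "camera")],
    "camera": [("supertype", "equipment"), ("diff qual", "with a lens")],
    "equipment": [("diff qual", "for taking photographs")],
}

# Made-up 3-d vectors standing in for a distributional semantic model.
VECTORS = {
    "pictures": (1.0, 0.0, 0.0),
    "camera": (0.9, 0.3, 0.1),
    "equipment": (0.6, 0.5, 0.2),
    "with a lens": (0.2, 0.1, 0.9),
    "for taking photographs": (0.95, 0.1, 0.0),
}

# Lemmas assumed to share the target's synset.
TARGET_SYNSET = {"photograph", "picture"}


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def lemma(token):
    # Naive normalisation: lower-case and drop a single plural "s".
    token = token.lower()
    return token[:-1] if token.endswith("s") else token


def matched_lemma(node):
    """Return the first token of `node` that falls in the target synset."""
    for token in node.split():
        if lemma(token) in TARGET_SYNSET:
            return lemma(token)
    return None


def navigate(source, target, max_steps=10):
    """Greedy walk towards `target`; returns traversed triples or None."""
    target_vec = VECTORS[target]
    path, node, visited = [], source, {source}
    for _ in range(max_steps):
        if matched_lemma(node):
            return path
        candidates = [(rel, nb) for rel, nb in GRAPH.get(node, []) if nb not in visited]
        if not candidates:
            return None  # no path: the entailment is rejected
        rel, nxt = max(
            candidates,
            key=lambda edge: cosine(VECTORS.get(edge[1], (0.0, 0.0, 0.0)), target_vec),
        )
        path.append((node, rel, nxt))
        visited.add(nxt)
        node = nxt
    return None


def article(noun):
    return "an" if noun[0].lower() in "aeiou" else "a"


def justify(path):
    """Render triples as sentences, merging a supertype triple with the
    differentia triple attached to that supertype (as in the reified graph)."""
    lines, i = [], 0
    while i < len(path):
        subj, rel, obj = path[i]
        nxt = path[i + 1] if i + 1 < len(path) else None
        head = f"{article(subj).capitalize()} {subj} is"
        if rel == "supertype" and nxt and nxt[0] == obj and nxt[1] == "diff qual":
            lines.append(f"{head} {article(obj)} {obj} {nxt[2]}")
            i += 2
        elif rel == "supertype":
            lines.append(f"{head} a kind of {obj}")
            i += 1
        else:
            lines.append(f"{head} {obj}")
            i += 1
    return lines


if __name__ == "__main__":
    triples = navigate("digital camera", "pictures")
    sentences = justify(triples)
    final_lemma = matched_lemma(triples[-1][2])
    sentences.append(f"{final_lemma.capitalize()} is synonym of {lemma('pictures')}")
    print("\n".join(sentences))
```

With these toy numbers, the walk picks “equipment” over the distractor at the second step and stops at “for taking photographs”. It reproduces the three triples listed above. The renderer merges the second supertype triple with the differentia triple that follows it, mirroring the second line of the justification. A real system would replace `VECTORS`, `TARGET_SYNSET` and `GRAPH` with a distributional model, WordNet synsets and WordNetGraph itself.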

6. Conclusion

We presented a method for automatically building a graph world knowledge base from natural language dictionary definitions. Adopting a conceptual model based on entity-centered semantic roles, we trained a supervised machine learning classifier for automatic role labeling. We then converted the labeled data into an RDF graph representation. Following this methodology, we created WordNetGraph, a graph built from the definitions of nouns and verbs in WordNet. A set of tools implementing the methodology is also freely available.

WordNetGraph was successfully used in a text entailment recognition approach based on distributional navigation over definition graphs. Besides using paths in this graph to recognize the entailment, this approach also provides a human-readable justification for the entailment decision. Each graph node encloses a self-contained amount of information rather than always representing single words. As a result, an intelligible justification can be built from a path made up of only a few nodes. As future work, we intend to apply this methodology to other language resources, such as Wiktionary.

7. Acknowledgments

Vivian S. Silva is a CNPq Fellow – Brazil.

8. Bibliographical References

Berg, J. (1982). Aristotle’s theory of definition. ATTI del Convegno Internazionale di Storia della Logica, pages 19–30.
Bovi, C. D., Telesca, L., and Navigli, R. (2015). Large-scale information extraction from textual definitions through deep syntactic and semantic analysis. Transactions of the Association for Computational Linguistics, 3:529–543.
Calzolari, N. (1991). Acquiring and representing semantic information in a lexical knowledge base. In Workshop of SIGLEX (Special Interest Group within ACL on the Lexicon), pages 235–243. Springer.
Copestake, A. (1991). The LKB: a system for representing lexical information extracted from machine-readable dictionaries. In Proceedings of the ACQUILEX Workshop on Default Inheritance in the Lexicon, Cambridge.
Dagan, I., Glickman, O., and Magnini, B. (2006). The PASCAL recognising textual entailment challenge. In Machine learning challenges: evaluating predictive uncertainty, visual object classification, and recognising textual entailment, pages 177–190. Springer.
Dolan, W., Vanderwende, L., and Richardson, S. D. (1993). Automatically deriving structured knowledge bases from on-line dictionaries. In Proceedings of the First Conference of the Pacific Association for Computational Linguistics, pages 5–14. Pacific Association for Computational Linguistics, Vancouver.
Fellbaum, C. (1998). WordNet. Wiley Online Library.
Freitas, A., da Silva, J. C. P., Curry, E., and Buitelaar, P. (2014). A distributional semantics approach for selective reasoning on commonsense graph knowledge bases. In International Conference on Applications of Natural Language to Data Bases/Information Systems, pages 21–32. Springer.
Granger, E. H. (1984). Aristotle on genus and differentia. Journal of the History of Philosophy, 22(1):1–23.
Gunning, D. (2017). Explainable artificial intelligence (XAI). Defense Advanced Research Projects Agency (DARPA).
Kouylekov, M. and Magnini, B. (2005). Recognizing textual entailment with tree edit distance algorithms. In Proceedings of the First Challenge Workshop Recognising Textual Entailment, pages 17–20.
Lloyd, A. C. (1962). Genus, species and ordered series in Aristotle. Phronesis, pages 67–90.
Manning, C. D., Surdeanu, M., Bauer, J., Finkel, J. R., Bethard, S., and McClosky, D. (2014). The Stanford CoreNLP natural language processing toolkit. In ACL (System Demonstrations), pages 55–60.
Màrquez, L., Carreras, X., Litkowski, K. C., and Stevenson, S. (2008). Semantic role labeling: an introduction to the special issue. Computational Linguistics, 34(2):145–159.
Mesnil, G., Dauphin, Y., Yao, K., Bengio, Y., Deng, L., Hakkani-Tur, D., He, X., Heck, L., Tur, G., Yu, D., et al. (2015). Using recurrent neural networks for slot filling in spoken language understanding. IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), 23(3):530–539.
Recski, G. (2016). Building concept graphs from monolingual dictionary entries. In Nicoletta Calzolari (Conference Chair), et al., editors, Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016). European Language Resources Association (ELRA).
Silva, V. S., Handschuh, S., and Freitas, A. (2016). Categorization of semantic roles for dictionary definitions. In Cognitive Aspects of the Lexicon (CogALex-V), Workshop at COLING 2016, pages 176–184.
Silva, V. S., Freitas, A., and Handschuh, S. (2018). Recognizing and justifying text entailment through distributional navigation on definition graphs. In AAAI.
Vossen, P. and Copestake, A. (1994). Untangling definition structure into knowledge representation. In Inheritance, defaults and the lexicon, pages 246–274. Cambridge University Press.
Vossen, P. (1991). Converting data from a lexical database to a knowledge base. Esprit BRA-3030 ACQUILEX Working Paper No 27.
Vossen, P. (1992). The automatic construction of a knowledge base from dictionaries: a combination of techniques. In EURALEX, volume 92, pages 311–326.
Wang, R. and Neumann, G. (2008). An divide-and-conquer strategy for recognizing textual entailment. In Proceedings of the Text Analysis Conference, Gaithersburg, MD.