Natural Language Processing

Module -5

5.1. Steps in the Process
• Morphological Analysis: Individual words are analyzed into their components and
nonword tokens such as punctuation are separated from the words.
• Syntactic Analysis: Linear sequences of words are transformed into structures that show
how the words relate to each other.
• Semantic Analysis: The structures created by the syntactic analyzer are assigned
meanings.
• Discourse Integration: The meaning of an individual sentence may depend on the
sentences that precede it and may influence the meanings of the sentences that follow it.
• Pragmatic Analysis: The structure representing what was said is reinterpreted to
determine what was actually meant.
Morphological Analysis
• Suppose we have an English interface to an operating system and the following sentence
is typed:
– I want to print Bill’s .init file.
• Morphological analysis must do the following things:
– Pull apart the word “Bill’s” into proper noun “Bill” and the possessive suffix “’s”
– Recognize the sequence “.init” as a file extension that is functioning as an
adjective in the sentence.
• This process will usually assign syntactic categories to all the words in the sentence.
• Consider the word “prints”. This word is either a plural noun or a third person singular
verb ( he prints ).
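The two tasks above can be sketched in a few lines of Python. This is only a toy illustration: the token pattern, the tag names, and the helper `s_suffix_readings` are assumptions made for this example, not part of any real morphological analyzer.

```python
import re

# Toy token pattern (an assumption, not a full tokenizer): a dotted name such
# as ".init", a possessive suffix, a plain word, or one punctuation mark.
TOKEN = re.compile(r"\.\w+|['’]s\b|\w+|[^\w\s]")

def analyze(sentence):
    """Return (token, tag) pairs for a sentence."""
    pairs = []
    for tok in TOKEN.findall(sentence):
        if tok in ("'s", "’s"):
            tag = "POSSESSIVE"
        elif tok.startswith(".") and len(tok) > 1:
            tag = "FILE-EXTENSION"
        elif tok[0].isalnum():
            tag = "WORD"
        else:
            tag = "PUNCTUATION"
        pairs.append((tok, tag))
    return pairs

def s_suffix_readings(word):
    """Both readings of a word ending in -s; later stages must pick one."""
    if len(word) > 2 and word.endswith("s"):
        stem = word[:-1]
        return [(stem, "plural noun"), (stem, "third person singular verb")]
    return []

print(analyze("I want to print Bill's .init file."))
print(s_suffix_readings("prints"))
```

The second function deliberately returns both readings of “prints”, because morphology alone cannot decide between them.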
Syntactic Analysis
• Syntactic analysis must exploit the results of morphological analysis to build a structural
description of the sentence.
• The goal of this process, called parsing, is to convert the flat list of words that forms the
sentence into a structure that defines the units that are represented by that flat list.
• The important thing here is that a flat sentence has been converted into a hierarchical
structure and that the structure corresponds to meaning units when semantic analysis is
performed.
• Reference markers are shown in parentheses in the parse tree.
• Each one corresponds to some entity that has been mentioned in the sentence.
Semantic Analysis
• Semantic analysis must do two important things:
– It must map individual words into appropriate objects in the knowledge base or
database
– It must create the correct structures to correspond to the way the meanings of the
individual words combine with each other.
Discourse Integration
• After semantic analysis of the example sentence, we still do not know to whom the
pronoun “I” or the proper noun “Bill” refers.
• To pin down these references requires an appeal to a model of the current discourse
context, from which we can learn that the current user is USER068 and that the only
person named “Bill” about whom we could be talking is USER073.
• Once the correct referent for Bill is known, we can also determine exactly which file is
being referred to.
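A minimal sketch of this lookup, assuming the discourse context is stored as a plain dictionary. The dictionary layout and the function name `resolve` are hypothetical; only the user IDs come from the example above.

```python
# Hypothetical discourse context; the user IDs are those of the example.
context = {
    "current_user": "USER068",
    "known_people": {"Bill": ["USER073"]},
}

def resolve(reference, context):
    """Return the entity a reference points to, or None if it is unclear."""
    if reference == "I":
        return context["current_user"]
    candidates = context["known_people"].get(reference, [])
    if len(candidates) == 1:      # exactly one person with that name
        return candidates[0]
    return None                   # unknown or ambiguous reference

print(resolve("I", context))      # USER068
print(resolve("Bill", context))   # USER073
```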
Pragmatic Analysis
• The final step toward effective understanding is to decide what to do as a result.
• One possible thing to do is to record what was said as a fact and be done with it.
• For some sentences, whose intended effect is clearly declarative, that is precisely the
correct thing to do.
• But for other sentences, including this one, the intended effect is different.
• We can discover this intended effect by applying a set of rules that characterize
cooperative dialogues.
• The final step in pragmatic processing is to translate from the knowledge-based
representation to a command to be executed by the system.
• The result of the understanding process is:
lpr /wsmith/stuff.init
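The last step can be sketched as follows. The meaning representation (a nested dictionary) and the single cooperative-dialogue rule are simplifying assumptions for illustration; a real system would use a much richer knowledge base.

```python
def interpret(meaning):
    """Return a command string for a request, or None for a plain statement."""
    # Simplified cooperative-dialogue rule (an assumption): if the user says
    # they want an event that the system can perform, treat it as a request.
    if meaning["type"] == "want" and meaning["object"]["type"] == "printing":
        return "lpr " + meaning["object"]["file"]
    return None   # nothing to execute; just record the fact

# Hypothetical output of the earlier stages for the example sentence.
meaning = {
    "type": "want",
    "agent": "USER068",
    "object": {"type": "printing", "file": "/wsmith/stuff.init"},
}

print(interpret(meaning))   # lpr /wsmith/stuff.init
```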
5.2 Syntactic Processing:
• Syntactic Processing is the step in which a flat input sentence is converted into a
hierarchical structure that corresponds to the units of meaning in the sentence.
• This process is called parsing.
• It plays an important role in natural language understanding systems for two reasons:
– Semantic processing must operate on sentence constituents. If there is no syntactic
parsing step, then the semantics system must decide on its own constituents. If
parsing is done, on the other hand, it constrains the number of constituents that
semantics can consider. Syntactic parsing is computationally less expensive than
semantic processing. Thus it can play a significant role in reducing overall
system complexity.
– Although it is often possible to extract the meaning of a sentence without using
grammatical facts, it is not always possible to do so. Consider the examples:
• The satellite orbited Mars
• Mars orbited the satellite
In the second sentence, syntactic facts demand an interpretation in which a planet revolves
around a satellite, despite the apparent improbability of such a scenario.
• Almost all the systems that are actually used have two main components:
– A declarative representation, called a grammar, of the syntactic facts about the
language.
– A procedure, called a parser, that compares the grammar against input sentences to
produce parsed structures.
5.2.1 Grammars and Parsers:
• The most common way to represent grammars is as a set of production rules.
• A simple context-free phrase structure grammar for English:
– S → NP VP
– NP → the NP1
– NP → PRO
– NP → PN
– NP → NP1
– NP1 → ADJS N
– ADJS → ε | ADJ ADJS
– VP → V
– VP → V NP
– N → file | printer
– PN → Bill
– PRO → I
– ADJ → short | long | fast
– V → printed | created | want
• The first rule can be read as “A sentence is composed of a noun phrase followed by a
verb phrase.” The vertical bar means OR, and ε represents the empty string.
• Symbols that are further expanded by rules are called nonterminal symbols.
• Symbols that correspond directly to strings that must be found in an input sentence are
called terminal symbols.
• Grammar formalisms such as this one underlie many linguistic theories, which in turn
provide the basis for many natural language understanding systems.
• Pure context-free grammars are not effective for describing natural languages.
• As a result, natural language processing systems have less in common with computer
language processing systems, such as compilers, than one might expect.
• The parsing process takes the rules of the grammar and compares them against the input
sentence.
• The simplest structure to build is a Parse Tree, which simply records the rules and how
they are matched.
• Every node of the parse tree corresponds either to an input word or to a nonterminal in
our grammar.
• Each level in the parse tree corresponds to the application of one grammar rule.
Fig: A parse tree for a sentence
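The grammar and the parsing process described above can be sketched in Python. The dictionary encoding and the function names (`parse`, `match_sequence`, `parse_sentence`) are choices made for this example; the parser is a simple top-down one that tries every rule, not an efficient production parser.

```python
# The grammar from the text; each nonterminal maps to its alternative
# right-hand sides. An empty list stands for the empty string ε.
GRAMMAR = {
    "S":    [["NP", "VP"]],
    "NP":   [["the", "NP1"], ["PRO"], ["PN"], ["NP1"]],
    "NP1":  [["ADJS", "N"]],
    "ADJS": [[], ["ADJ", "ADJS"]],
    "VP":   [["V"], ["V", "NP"]],
    "N":    [["file"], ["printer"]],
    "PN":   [["Bill"]],
    "PRO":  [["I"]],
    "ADJ":  [["short"], ["long"], ["fast"]],
    "V":    [["printed"], ["created"], ["want"]],
}

# Terminals are the symbols that no rule expands further.
TERMINALS = {s for rules in GRAMMAR.values() for rhs in rules for s in rhs
             if s not in GRAMMAR}

def parse(symbol, words, i):
    """Yield (tree, next_index) for every way `symbol` matches words[i:]."""
    if symbol not in GRAMMAR:                 # terminal: must equal next word
        if i < len(words) and words[i] == symbol:
            yield symbol, i + 1
        return
    for rhs in GRAMMAR[symbol]:               # nonterminal: try each rule
        for children, j in match_sequence(rhs, words, i):
            yield (symbol, *children), j

def match_sequence(symbols, words, i):
    """Yield (list_of_trees, next_index) for a sequence of symbols."""
    if not symbols:
        yield [], i
        return
    for tree, j in parse(symbols[0], words, i):
        for rest, k in match_sequence(symbols[1:], words, j):
            yield [tree] + rest, k

def parse_sentence(sentence):
    """Return all parse trees that cover the whole sentence."""
    words = sentence.split()
    return [tree for tree, end in parse("S", words, 0) if end == len(words)]

for tree in parse_sentence("Bill printed the file"):
    print(tree)
```

Each tuple in the result is one node of the parse tree: the first element is the nonterminal, and the remaining elements are its children.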
What does a grammar specify about a language?
• Its weak generative capacity, by which we mean the set of sentences that are contained
within the language. This set is made up of precisely those sentences that can be
completely matched by a series of rules in the grammar.
• Its strong generative capacity, by which we mean the structure to be assigned to each
grammatical sentence of the language.
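• When more than one rule can apply at some point in a parse, the parser can use one of
the following strategies:
• All Paths – Follow all possible paths and build all the possible intermediate components.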
• Best Path with Backtracking – Follow only one path at a time, but record, at every choice
point, the information that is necessary to make another choice if the chosen path fails to
lead to a complete interpretation of the sentence.
• Best Path with Patchup – Follow only one path at a time, but when an error is detected,
explicitly shuffle around the components that have already been formed.
• Wait and See – Follow only one path, but rather than making decisions about the function
of each component as it is encountered, postpone the decision until enough
information is available to make the decision correctly.
5.2.2. Augmented Transition Network
• An Augmented Transition Network (ATN) is a top-down parsing procedure that allows
various kinds of knowledge to be incorporated into the parsing system so it can operate
efficiently.
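• An ATN is usually drawn in graphical notation (see the figure at the end of this section).
• Consider parsing the following sentence with that network:
“The long file has printed”
• The execution proceeds as follows: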
1. Begin in state S.
2. Push to NP.
3. Do a category test to see if “the” is a determiner.
4. This test succeeds, so set the DETERMINER register to DEFINITE and go to state Q6.
5. Do a category test to see if “long” is an adjective.
6. This test succeeds, so append “long” to the list contained in the ADJS register. (This list
was previously empty). Stay in state Q6.
7. Do a category test to see if “file” is an adjective. This test fails.
8. Do a category test to see if “file” is a noun. This test succeeds, so set the NOUN register
to “file” and go to state Q7.
9. Push to PP.
10. Do a category test to see if “has” is a preposition. This test fails, so pop and signal failure.
11. There is nothing else that can be done from state Q7, so pop and return the structure
( NP ( FILE ( LONG ) DEFINITE ))
12. The return causes the machine to be in state Q1, with the SUBJ register set to the
structure just returned and the TYPE register set to DCL.
13. Do a category test to see if “has” is a verb. This test succeeds, so set the AUX register to
NIL and set the V register to “has”. Go to state Q4.
14. Push to state NP. Since the next word, “printed”, is not a determiner or a proper noun,
NP will pop and return failure.
15. The only other thing to do in state Q4 is to halt. But more input remains, so a complete
parse has not been found. Backtracking is now required.
16. The last choice point was at state Q1, so return there. The registers AUX and V must be
unset.
17. Do a category test to see if “has” is an auxiliary. This test succeeds, so set the AUX
register to “has” and go to state Q3.
18. Do a category test to see if “printed” is a verb. This test succeeds, so set the V register to
“printed”. Go to state Q4.
19. Now, since the input is exhausted, Q4 is an acceptable final state. Pop and return the
structure ( S DCL ( NP ( FILE ( LONG ) DEFINITE )) HAS ( VP PRINTED ))
This structure is the output of the parse.
Fig: An ATN Network for a Fragment of English
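The trace above can be imitated with a small Python sketch. It is a simplification under stated assumptions: the lexicon is a toy, the PP push and the proper-noun arc of the NP network are left out, and Python generators stand in for the backtracking machinery. The function names are hypothetical; the state names in the comments follow the trace.

```python
# Toy lexicon: "has" can be a main verb or an auxiliary, which creates
# the choice point at state Q1.
LEXICON = {
    "the": {"DET"}, "long": {"ADJ"}, "file": {"N"},
    "has": {"V", "AUX"}, "printed": {"V"},
}

def is_cat(words, i, cat):
    """Category test: is there a word at position i with category `cat`?"""
    return i < len(words) and cat in LEXICON.get(words[i], set())

def np_network(words, i):
    """NP subnetwork: yield (structure, next_index) on success."""
    if not is_cat(words, i, "DET"):
        return                                   # pop and signal failure
    regs = {"DETERMINER": "DEFINITE", "ADJS": []}
    i += 1
    while is_cat(words, i, "ADJ"):               # loop in state Q6
        regs["ADJS"].append(words[i].upper())
        i += 1
    if is_cat(words, i, "N"):                    # go to state Q7, then pop
        regs["NOUN"] = words[i].upper()
        yield ("NP", (regs["NOUN"], tuple(regs["ADJS"]),
                      regs["DETERMINER"])), i + 1

def q4(words, i, regs):
    """State Q4: accept at end of input, or push to NP for an object."""
    if i == len(words):
        yield ("S", regs["TYPE"], regs["SUBJ"], regs["AUX"], ("VP", regs["V"]))
    for obj, j in np_network(words, i):
        if j == len(words):
            yield ("S", regs["TYPE"], regs["SUBJ"], regs["AUX"],
                   ("VP", regs["V"], obj))

def s_network(words):
    """S network: yield every complete parse of the word list."""
    for subj, i in np_network(words, 0):         # push to NP from state S
        regs = {"TYPE": "DCL", "SUBJ": subj}     # now in state Q1
        # Choice 1: the next word is the main verb (AUX stays NIL).
        # Each choice works on its own copy of the registers, so a failed
        # choice leaves nothing to unset.
        if is_cat(words, i, "V"):
            yield from q4(words, i + 1,
                          dict(regs, AUX=None, V=words[i].upper()))
        # Choice 2 (tried on backtracking): auxiliary followed by a verb.
        if is_cat(words, i, "AUX") and is_cat(words, i + 1, "V"):
            yield from q4(words, i + 2,
                          dict(regs, AUX=words[i].upper(),
                               V=words[i + 1].upper()))

for parse in s_network("the long file has printed".split()):
    print(parse)
```

The first choice (reading “has” as the main verb) produces no parse, so the generator falls through to the second choice, which mirrors the backtracking to state Q1 in the trace.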
5.2.3. Unification Grammars
• Purely declarative representations
• Unification simultaneously performs two operations:
– Matching
– Structure building, by combining constituents
• Think of graphs as sets not lists, i.e., order doesn’t matter.
• Lexical items as graphs:
Algorithm: Graph-Unify(G1, G2)
1. If either G1 or G2 is an attribute that is not itself an attribute-value pair then :
a. If the attributes conflict (as defined above), then fail.
b. If either is a variable, then bind it to the value of the other and return that value.
c. Otherwise, return the most general value that is consistent with both the original
values. Specifically, if disjunction is allowed, then return the intersection of the
values.
2. Otherwise, do :
a. Set variable NEW to empty.
b. For each attribute A that is present (at the top level) in either G1 or G2 do :
(i) If A is not present at the top level in the other input, then add A with its value to NEW.
(ii) If it is, then call Graph-Unify with the two values for A. If that fails, then fail.
Otherwise, take the new value of A to be the result of that unification and add A
with its value to NEW.
c. If there are any labels attached to G1 or G2, then bind them to NEW and return NEW.
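The algorithm can be sketched in Python under some simplifying assumptions: graphs are nested dictionaries, a disjunction of atomic values is a `frozenset`, an unbound variable is the sentinel `ANY`, and failure is the sentinel `FAIL`. Labels (step 2c) and shared variable bindings are not modelled. All of these names are choices made for this example.

```python
FAIL = object()   # returned when unification fails
ANY = object()    # stands for an unbound variable

def as_set(value):
    """View an atomic value or a disjunction as a set of alternatives."""
    return value if isinstance(value, frozenset) else frozenset([value])

def graph_unify(g1, g2):
    """Unify two feature graphs; return the combined graph or FAIL."""
    # Step 1: at least one input is not a set of attribute-value pairs.
    if not isinstance(g1, dict) or not isinstance(g2, dict):
        if g1 is ANY:                       # a variable takes the other value
            return g2
        if g2 is ANY:
            return g1
        if isinstance(g1, dict) or isinstance(g2, dict):
            return FAIL                     # an atom cannot match a graph
        common = as_set(g1) & as_set(g2)    # intersection of disjunctions
        if not common:
            return FAIL                     # the values conflict
        return next(iter(common)) if len(common) == 1 else common
    # Step 2: both inputs are graphs; combine them attribute by attribute.
    new = {}
    for attr in g1.keys() | g2.keys():
        if attr not in g2:
            new[attr] = g1[attr]
        elif attr not in g1:
            new[attr] = g2[attr]
        else:
            value = graph_unify(g1[attr], g2[attr])
            if value is FAIL:
                return FAIL
            new[attr] = value
    return new

# A determiner that allows either number, unified with a singular noun.
det = {"number": frozenset({"SING", "PLU"})}
noun = {"number": "SING"}
print(graph_unify(det, noun))   # {'number': 'SING'}
```

Because dictionaries are unordered for comparison purposes, this matches the remark that graphs should be thought of as sets rather than lists.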
5.3 Semantic Analysis:
• Producing a syntactic parse of a sentence is only the first step toward understanding it.
• We must still produce a representation of the meaning of the sentence.
• Because understanding is a mapping process, we must first define the language into
which we are trying to map.
• There is no single definitive language in which all sentence meaning can be described.
• The choice of a target language for any particular natural language understanding
program must depend on what is to be done with the meanings once they are constructed.
Choice of target language in semantic Analysis
• There are two broad families of target languages that are used in NL systems, depending
on the role that the natural language system is playing in a larger system:
– When natural language is being considered as a phenomenon on its own, as for
example when one builds a program whose goal is to read text and then answer
questions about it, a target language can be designed specifically to support
language processing.
– When natural language is being used as an interface language to another program
(such as a database query system or an expert system), then the target language must
be legal input to that other program. Thus the design of the target language is driven
by the backend program.
Lexical processing
• The first step in any semantic processing system is to look up the individual words in a
dictionary ( or lexicon) and extract their meanings.
• Many words have several meanings, and it may not be possible to choose the correct one
just by looking at the word itself.
• The process of determining the correct meaning of an individual word is called word
sense disambiguation or lexical disambiguation.
• It is done by associating, with each word in lexicon, information about the contexts in
which each of the word’s senses may appear.
• Sometimes only very straightforward information about each word sense is necessary. For
example, the baseball-field interpretation of “diamond” could be marked as a LOCATION.
• Some useful semantic markers are :
– PHYSICAL-OBJECT
– ANIMATE-OBJECT
– ABSTRACT-OBJECT
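Marker-based disambiguation can be sketched as a lexicon lookup. The lexicon layout, the second sense of “diamond” (a gemstone marked as PHYSICAL-OBJECT), and the function name are assumptions added for illustration.

```python
# Hypothetical lexicon: each sense of a word carries one semantic marker.
LEXICON = {
    "diamond": [
        {"sense": "baseball-field", "marker": "LOCATION"},
        {"sense": "gemstone", "marker": "PHYSICAL-OBJECT"},
    ],
}

def disambiguate(word, required_marker):
    """Return the senses of `word` whose marker fits what the context needs."""
    return [entry["sense"] for entry in LEXICON.get(word, [])
            if entry["marker"] == required_marker]

# A context that expects a place selects the baseball-field sense.
print(disambiguate("diamond", "LOCATION"))   # ['baseball-field']
```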
Sentence-Level Processing:
• Several approaches to the problem of creating a semantic representation of a sentence
have been developed, including the following:
– Semantic grammars, which combine syntactic, semantic and pragmatic
knowledge into a single set of rules in the form of a grammar.
– Case grammars, in which the structure that is built by the parser contains some
semantic information, although further interpretation may also be necessary.
– Conceptual parsing, in which syntactic and semantic knowledge are combined into
a single interpretation system that is driven by the semantic knowledge.
– Approximately compositional semantic interpretation, in which semantic
processing is applied to the result of performing a syntactic parse.
5.3.1 Semantic Grammar:
• A semantic grammar is a context-free grammar in which the choice of nonterminals and
production rules is governed by semantic as well as syntactic function.
• There is usually a semantic action associated with each grammar rule.
• The result of parsing and applying all the associated semantic actions is the meaning of
the sentence.
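A minimal sketch of a semantic grammar with semantic actions follows. The rules, the nonterminal names FILE and FILE-PROPERTY, and the tuple-shaped meanings are all hypothetical; they only illustrate how each rule pairs a right-hand side with an action that builds the meaning.

```python
# Each rule is (right-hand side, semantic action). The action receives the
# meanings of the nonterminals on the right-hand side, in order.
GRAMMAR = {
    "S": [
        (["what", "is", "the", "FILE-PROPERTY", "of", "FILE"],
         lambda prop, name: ("query", name, prop)),
        (["print", "FILE"], lambda name: ("print", name)),
    ],
    "FILE-PROPERTY": [
        (["size"], lambda: "size"),
        (["owner"], lambda: "owner"),
    ],
}

def parse(symbol, words, i):
    """Yield (meaning, next_index) for each way `symbol` matches words[i:]."""
    if symbol == "FILE":              # assumption: any one token names a file
        if i < len(words):
            yield words[i], i + 1
        return
    for rhs, action in GRAMMAR[symbol]:
        for values, j in match(rhs, words, i):
            yield action(*values), j

def match(rhs, words, i):
    """Match a right-hand side; collect the meanings of its nonterminals."""
    if not rhs:
        yield [], i
        return
    head, rest = rhs[0], rhs[1:]
    if head in GRAMMAR or head == "FILE":
        for value, j in parse(head, words, i):
            for values, k in match(rest, words, j):
                yield [value] + values, k
    elif i < len(words) and words[i] == head:   # literal word, no meaning
        yield from match(rest, words, i + 1)

def interpret(sentence):
    """Return the meaning of the first complete parse, or None."""
    words = sentence.split()
    for meaning, end in parse("S", words, 0):
        if end == len(words):
            return meaning
    return None

print(interpret("what is the size of stuff.init"))
print(interpret("print stuff.init"))
```

When parsing finishes, the value returned by the top-level action is already the meaning of the sentence, so no separate interpretation stage is needed.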
Advantages of Semantic grammars:
• When the parse is complete, the result can be used immediately without the additional
stage of processing that would be required if a semantic interpretation had not already
been performed during the parse.
• Many ambiguities that would arise during a strictly syntactic parse can be avoided since
some of the interpretations do not make sense semantically and thus cannot be generated
by a semantic grammar.
• Syntactic issues that do not affect the semantics can be ignored.
• The drawbacks of semantic grammars are:
– The number of rules required can become very large since many syntactic
generalizations are missed.
– Because the number of grammar rules may be very large, the parsing process
may be expensive.
5.3.2 Case grammars
• Case grammars provide a different approach to the problem of how syntactic and semantic
interpretation can be combined.
• Grammar rules are written to describe syntactic rather than semantic regularities.
• But the structures the rules produce correspond to semantic relations rather than to
strictly syntactic ones
• Consider two sentences:
– Susan printed the file.
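– The file was printed by Susan.
• The case grammar interpretation of the two sentences would both be:
( printed ( agent Susan )
( object File ))
• Now consider two sentences whose syntactic structures are almost identical:
– Mother baked for three hours.
– The pie baked for three hours.
• Using a case grammar, the first can be interpreted as
( baked ( agent mother )
( timeperiod 3-hours ))
The second can be interpreted as
( baked ( object pie )
( timeperiod 3-hours ))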
• Commonly used cases include:
– Agent: instigator of the action
– Instrument: cause of the event or object used in the event (typically inanimate)
– Dative: entity affected by the action (typically animate)
– Factitive: object or being resulting from the event
– Locative: place of the event
– Source: place from which something moves
– Goal: place to which something moves
– Beneficiary: being on whose behalf the event occurred (typically animate)
– Time: time the event occurred
– Object: entity acted upon or that is changed
• Each verb has a case frame that specifies the cases it takes, for example:
– To kill: [agent instrument (object) (dative) {locative time}]
– To run: [agent (locative) (time) (source) (goal)]
– To want: [agent object (beneficiary)]
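The Susan example from earlier in this section can be sketched in Python. The input format (a dictionary of syntactic roles that a parser is assumed to have produced) and the function name `case_frame` are hypothetical; the point is only that the active and passive clauses map to the same case structure.

```python
def case_frame(clause):
    """Map a parsed clause (hypothetical dict format) to a case structure."""
    if clause["voice"] == "active":
        agent, obj = clause["subject"], clause["object"]
    else:
        # Passive: the surface subject is the object acted upon, and the
        # "by" phrase names the agent.
        agent, obj = clause["by"], clause["subject"]
    return (clause["verb"], ("agent", agent), ("object", obj))

active = {"voice": "active", "verb": "printed",
          "subject": "Susan", "object": "File"}
passive = {"voice": "passive", "verb": "printed",
           "subject": "File", "by": "Susan"}

print(case_frame(active))
print(case_frame(passive))
```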
Conceptual Parsing:
• Conceptual parsing is a strategy for finding both the structure and meaning of a sentence
in one step.
• Conceptual parsing is driven by a dictionary that describes the meanings of words in
conceptual dependency (CD) structures.
• The parsing is similar to case grammar.
• CD usually provides a greater degree of predictive power.