NLP Ambiguity
NLP Tasks
• NLP applications require several NLP analyses:
– Word tokenization
– Sentence boundary detection
– Part-of-speech (POS) tagging
• to identify the part-of-speech (e.g. noun, verb) of each word
– Named Entity (NE) recognition
• to identify proper nouns (e.g. names of persons, locations,
organizations; domain terminology)
– Parsing
• to identify the syntactic structure of a sentence
– Semantic analysis
• to derive the meaning of a sentence
1. Part-Of-Speech (POS) Tagging
• POS tagging is a process of assigning a POS or lexical
class marker to each word in a sentence (and all
sentences in a corpus).
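As a minimal sketch of what "assigning a POS marker to each word" means, the snippet below tags a sentence by dictionary lookup. The lexicon, the tag names (PROPN, VERB, DET, NOUN), and the default tag are assumptions made for illustration; a lookup like this cannot resolve words that have more than one possible tag.

```python
# Toy lexicon: each word maps to exactly one POS tag (hypothetical, for illustration).
LEXICON = {
    "mary": "PROPN",
    "ate": "VERB",
    "the": "DET",
    "apple": "NOUN",
}

def pos_tag(sentence, lexicon=LEXICON, default="NOUN"):
    """Assign one tag to each whitespace-separated word via dictionary lookup."""
    return [(word, lexicon.get(word.lower(), default)) for word in sentence.split()]

tagged = pos_tag("Mary ate the apple")
print(tagged)
```

Unknown words fall back to the default tag, which is a common but crude baseline choice.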
Syntactic Analysis - Grammar
• sentence -> noun_phrase, verb_phrase
• noun_phrase -> proper_noun
• noun_phrase -> determiner, noun
• verb_phrase -> verb, noun_phrase
• proper_noun -> [mary]
• noun -> [apple]
• verb -> [ate]
• determiner -> [the]
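The grammar above is small enough to run directly. The following is a sketch of a top-down (recursive-descent) parser for exactly these rules; the function names, the tuple-based tree format, and the lowercase whitespace tokenization are assumptions of this sketch, not part of the slides.

```python
# Terminal categories and the words they cover (from the grammar above).
LEXICON = {
    "proper_noun": {"mary"},
    "noun": {"apple"},
    "verb": {"ate"},
    "determiner": {"the"},
}

# Non-terminal rules: each symbol maps to a list of alternative right-hand sides.
RULES = {
    "sentence": [["noun_phrase", "verb_phrase"]],
    "noun_phrase": [["proper_noun"], ["determiner", "noun"]],
    "verb_phrase": [["verb", "noun_phrase"]],
}

def parse(symbol, tokens, pos):
    """Yield (tree, next_pos) for every way `symbol` derives tokens[pos:next_pos]."""
    if symbol in LEXICON:
        if pos < len(tokens) and tokens[pos] in LEXICON[symbol]:
            yield (symbol, tokens[pos]), pos + 1
        return
    for rhs in RULES[symbol]:
        # Expand the right-hand side left to right, keeping all partial matches.
        partial = [([], pos)]
        for child in rhs:
            partial = [
                (kids + [tree], nxt)
                for kids, p in partial
                for tree, nxt in parse(child, tokens, p)
            ]
        for kids, end in partial:
            yield (symbol, *kids), end

def parse_sentence(text):
    """Return all parse trees that cover the whole input."""
    tokens = text.lower().split()
    return [tree for tree, end in parse("sentence", tokens, 0) if end == len(tokens)]

trees = parse_sentence("Mary ate the apple")
print(trees)
```

Because the parser returns every complete tree, an ambiguous grammar would yield more than one result for the same sentence; this grammar yields exactly one for the example.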
2. Named Entity Recognition (NER)
• NER is to process a text and identify named entities in a
sentence
– e.g. “U.N. official Ekeus heads for Baghdad.”
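A rough sketch of the idea, using a gazetteer (a fixed list of known names) on the example sentence. The gazetteer entries and the entity type labels are assumptions chosen for illustration; practical NER systems rely on context and learned models rather than lookup alone.

```python
# Hypothetical gazetteer: known names and an assumed entity type for each.
GAZETTEER = {
    "U.N.": "ORGANIZATION",
    "Ekeus": "PERSON",
    "Baghdad": "LOCATION",
}

def find_entities(sentence, gazetteer=GAZETTEER):
    """Return (name, type) pairs for tokens found in the gazetteer."""
    entities = []
    for token in sentence.split():
        # Strip trailing punctuation only when the raw token is not itself an entry,
        # so that "U.N." keeps its periods while "Baghdad." loses the full stop.
        if token not in gazetteer:
            token = token.rstrip(".,;:!?")
        if token in gazetteer:
            entities.append((token, gazetteer[token]))
    return entities

print(find_entities("U.N. official Ekeus heads for Baghdad."))
```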
3. Shallow Parsing
• Shallow (or partial) parsing identifies the (base) syntactic phrases in
a sentence.
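One crude way to picture shallow parsing is a noun-phrase chunker that works on POS-tagged input and never builds a full tree. The tag names and the chunking rule below (a maximal run of determiner, adjective, and noun tags that ends in a noun) are assumptions of this sketch.

```python
NP_TAGS = {"DET", "ADJ", "NOUN", "PROPN"}
NOUN_TAGS = {"NOUN", "PROPN"}

def chunk_noun_phrases(tagged):
    """Group maximal runs of DET/ADJ/NOUN/PROPN tags that end in a noun."""
    chunks, current = [], []
    # The sentinel at the end flushes a chunk that reaches the last word.
    for word, tag in list(tagged) + [("", "END")]:
        if tag in NP_TAGS:
            current.append((word, tag))
            continue
        # Trim trailing non-noun words so that each chunk ends in a noun.
        while current and current[-1][1] not in NOUN_TAGS:
            current.pop()
        if current:
            chunks.append(" ".join(w for w, _ in current))
        current = []
    return chunks

example = [
    ("the", "DET"), ("joint", "ADJ"), ("venture", "NOUN"),
    ("will", "AUX"), ("start", "VERB"), ("production", "NOUN"),
]
print(chunk_noun_phrases(example))
```

The chunker finds flat, non-overlapping phrases only; it says nothing about how the phrases attach to each other, which is what distinguishes it from full parsing.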
Source: J. Choi, CSE842, MSU
Bridgestone Sports Co. said Friday it had set up a joint venture in Taiwan with a
local concern and a Japanese trading house to produce golf clubs to be supplied
to Japan.
The joint venture, Bridgestone Sports Taiwan Co., capitalized at 20 million new
Taiwan dollars, will start production in January 1990 with production of 20,000
iron and “metal wood” clubs a month.
Template filling:
TIE-UP-1
– Relationship: TIE-UP
– Entities: “Bridgestone Sports Co.”, “a local concern”, “a Japanese trading house”
– Joint Venture Company: “Bridgestone Sports Taiwan Co.”
– Activity: ACTIVITY-1
– Amount: NT$20000000
ACTIVITY-1
– Activity: PRODUCTION
– Company: “Bridgestone Sports Taiwan Co.”
– Product: “iron and ‘metal wood’ clubs”
– Start Date: DURING: January 1990
But NLP is very hard
• Understanding natural languages is hard …
because of inherent ambiguity
• Engineering NLP systems is also hard …
because of:
– Huge amount of data resources needed (e.g.
grammar, dictionary, documents to extract
statistics from)
– Computational complexity (intractable) of
analyzing a sentence
Why is NL Understanding hard?
• Natural language is extremely rich in form and structure,
and very ambiguous.
– How to represent meaning,
– Which structures map to which meaning structures.
• One input can mean many different things. Ambiguity
can be at different levels.
– Lexical (word level) ambiguity -- different meanings of words
– Syntactic ambiguity -- different ways to parse the sentence
– Interpreting partial information -- how to interpret pronouns
– Contextual information -- context of the sentence may affect the
meaning of that sentence.
• Many inputs can mean the same thing.
• Interaction among components of the input is not clear.
Ambiguity (1)
Knowledge of Language
• Phonology – concerns how words are related to the sounds that
realize them.
Phonology
• Red and Read
• Flower and Flour
• I and Eye
• Write and Right
• Knows and Nose
• Hear and Here
• Weight and Wait
• A part and Apart
• Piece and Peace
• Ate and Eight
What is Morphology?
[Figure: morphological analysis of an example word; surviving labels: “-ing”; “Noun, Direct Case, Plural”]
• Information Retrieval
– goose and geese are two words referring to the same root goose
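To illustrate why this matters for retrieval, the sketch below maps word forms to a root before indexing, so that a query for "goose" also finds documents containing "geese". The exception table, the plural-stripping rule, and the sample documents are hypothetical and far cruder than a real morphological analyzer.

```python
# Hypothetical exception list for irregular plurals.
IRREGULAR = {"geese": "goose"}

def lemma(word):
    """Map a word form to an approximate root."""
    w = word.lower()
    if w in IRREGULAR:
        return IRREGULAR[w]
    # Crude suffix rule for a final "s"; real analyzers need far more care.
    if w.endswith("s") and not w.endswith("ss") and len(w) > 3:
        return w[:-1]
    return w

def build_index(docs):
    """Build an inverted index from root form to the set of document ids."""
    index = {}
    for doc_id, text in docs.items():
        for token in text.split():
            index.setdefault(lemma(token), set()).add(doc_id)
    return index

docs = {1: "a goose swims", 2: "wild geese fly"}
index = build_index(docs)
print(index["goose"])
```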
Morphemes
• Smallest meaning-bearing units constituting a word
Classes of Morphology
• Inflection??
• Derivation??
Knowledge of Language (cont.)
• Pragmatics – concerns how sentences are used in different
situations and how use affects the interpretation of the sentence.
Ambiguity (2)
“I made her duck”
1. I cooked waterfowl for her benefit (to eat)
2. I cooked waterfowl belonging to her
3. I created the duck she owns
4. I caused her to quickly lower her head or body
5. I used magic and turned her into a duck.
• duck – morphologically and syntactically ambiguous:
noun or verb.
• her – syntactically ambiguous: dative or possessive.
• make – semantically ambiguous: cook or create.
• make – syntactically ambiguous:
– Transitive – takes a direct object.
– Ditransitive – takes two objects.
– Takes a direct object and a verb.
Ambiguity is Pervasive
• Phonetics: each of the following sounds like “I made her duck”
– I mate or duck
– I’m eight or duck
– Eye maid; her duck
– Aye mate, her duck
– I maid her duck
– I’m aid her duck
– I mate her duck
– I’m ate her duck
– I’m ate or duck
• Lexical category (part-of-speech)
– “duck” as a noun or a verb
• Lexical Semantics (word meaning)
– “duck” as an animal or a plaster duck statue
• Compound nouns
– e.g. “dog food”, “Intelligent design scores …”
• Syntactic ambiguity
Topics: Linguistics
• Word-level processing
• Syntactic processing
• Lexical and compositional semantics
• Discourse structure
Natural Language Understanding
Words
↓ Morphological Analysis
Morphologically analyzed words (another step: POS tagging)
↓ Syntactic Analysis
Syntactic structure
↓ Semantic Analysis
Context-independent meaning representation
↓ Discourse Processing
Final meaning representation
Natural Language Generation
Meaning representation
↓ Utterance Planning
Meaning representations for sentences
↓ Sentence Planning and Lexical Choice
Syntactic structures of sentences with lexical choices
↓ Sentence Generation
Morphologically analyzed words
↓ Morphological Generation
Words
Different Levels of Linguistic Analysis
• Phonology
– Speech audio signal to phonemes
• Morphology
– Inflection (e.g. “I”, “my”, “me”; “eat”, “eats”, “ate”, “eaten”)
– Derivation (e.g. “teach”, “teacher”, “nominate”, “nominee”)
• Syntax
– Part-of-speech (noun, verb, adjective, preposition, etc.)
– Phrase structure (e.g. noun phrase, verb phrase)
• Semantics
– Meaning of a word (e.g. “book” as a bound volume or an
accounting ledger) or a sentence
• Discourse
– Meaning and inter-relation between sentences
Topics: Techniques
• Finite-state methods
• Context-free methods
• Probabilistic models
• Neural network models
• Supervised machine learning methods
Process Pipeline
• Phonology
• Morphology
• Syntax
• Semantics
• Pragmatics
• Discourse
Each kind of knowledge has associated with it an encapsulated set of
processes that make use of it. Interfaces are defined that allow the
various levels to communicate. This often leads to a pipeline
architecture.
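A pipeline architecture can be sketched as a list of stages in which each stage consumes the previous stage's output. The three stages, their names, and the tiny lexicon below are hypothetical stand-ins for real tokenization, tagging, and analysis components.

```python
def tokenize(text):
    """Stage 1: split raw text into lowercase tokens."""
    return text.lower().rstrip(".").split()

def tag(tokens):
    """Stage 2: attach a POS tag to each token (toy lexicon, default NOUN)."""
    lexicon = {"mary": "PROPN", "ate": "VERB", "the": "DET", "apple": "NOUN"}
    return [(t, lexicon.get(t, "NOUN")) for t in tokens]

def find_verbs(tagged):
    """Stage 3: a stand-in for later analysis that consumes the tags."""
    return [word for word, pos in tagged if pos == "VERB"]

def run_pipeline(data, stages):
    """Feed the output of each stage into the next one."""
    for stage in stages:
        data = stage(data)
    return data

result = run_pipeline("Mary ate the apple.", [tokenize, tag, find_verbs])
print(result)
```

The interface between stages is just the data passed along, which keeps each level encapsulated but also means an early mistake is inherited by every later stage.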
Dealing with Ambiguity
Four possible approaches:
1. Formal approaches -- Tightly coupled
interaction among processing levels;
knowledge from other levels can help decide
among choices at ambiguous levels.
2. Pipeline processing that ignores ambiguity as
it occurs and hopes that other levels can
eliminate incorrect structures.
3. Probabilistic approaches based on making
the most likely choices
4. Don’t do anything, maybe it won’t matter
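Approach 3 can be sketched as picking the tag seen most often for a word in a given context. All counts below are invented for illustration and do not come from any corpus; the back-off from previous-word counts to word-only counts is one simple design choice among many.

```python
from collections import Counter

# Hypothetical tag counts for "duck" given the previous word.
BIGRAM_COUNTS = {
    ("the", "duck"): Counter({"NOUN": 40, "VERB": 0}),
    ("to", "duck"): Counter({"NOUN": 1, "VERB": 12}),
}
# Hypothetical tag counts for "duck" regardless of context.
UNIGRAM_COUNTS = {"duck": Counter({"NOUN": 60, "VERB": 25})}

def most_likely_tag(prev_word, word):
    """Return the most frequent tag and its relative frequency."""
    counts = BIGRAM_COUNTS.get((prev_word, word))
    if counts is None:
        # Back off to context-free counts when the context was never seen.
        counts = UNIGRAM_COUNTS[word]
    tag, count = counts.most_common(1)[0]
    return tag, count / sum(counts.values())

for prev in ("the", "to", "her"):
    print(prev, "duck ->", most_likely_tag(prev, "duck"))
```

With these made-up numbers, "her duck" falls back to the context-free counts, which shows how a most-likely-choice strategy commits to one reading even when the sentence remains ambiguous.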
Models and Algorithms
• By models we mean the formalisms that are used to
capture the various kinds of linguistic knowledge we
need.
• Algorithms are then used to manipulate the knowledge
representations needed to tackle the task at hand.
Various Algorithms
• In particular:
– State-space search
• To manage the problem of making choices during processing
when we lack the information needed to make the right choice
– Dynamic programming
• To avoid having to redo work during the course of a state-
space search
• Examples: CKY, Earley, Minimum Edit Distance, Viterbi, Baum-Welch
– Classifiers
• Machine learning based classifiers that are trained to make
decisions based on features extracted from the local context