An Introduction To Language Processing With Perl and Prolog
Pierre Nugues
Lund University Pierre.Nugues@cs.lth.se http://www.cs.lth.se/home/Pierre_Nugues/
Spelling and grammatical checkers: MS Word
Text indexing and information retrieval on the Internet: Google, Microsoft Bing, Yahoo
Telephone information that understands some spoken questions: SJ (trains in Sweden) or Tellme.com in the United States
Speech dictation of letters or reports: IBM ViaVoice, Windows Vista
Translation: Google Translate, SYSTRAN
Direct translation from spoken English to spoken Swedish in a restricted domain: SRI and SICS
Voice control of domestic devices such as tape recorders (Philips) or disc changers (MS Persona)
Conversational agents able to dialogue and to plan: TRAINS
Spoken navigation in virtual worlds: Ulysse, Higgins
Generation of 3D scenes from text: Carsim
Linguistics Layers
Sounds
Phonemes
Words and morphology
Syntax and functions
Semantics
Dialogue
Serious
The big cat ate the gray mouse
The/article big/adjective cat/noun ate/verb the/article gray/adjective mouse/noun
Le/article gros/adjectif chat/nom mange/verbe la/article souris/nom grise/adjectif
Die/Artikel große/Adjektiv Katze/Substantiv ißt/Verb die/Artikel graue/Adjektiv Maus/Substantiv
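The English annotation above can be reproduced with a plain dictionary lookup. The following Python sketch is only an illustration: the `LEXICON` table and the `tag` function are hypothetical names, and the approach assumes that each word has exactly one part of speech, which real text does not guarantee.

```python
# Hypothetical toy lexicon: each word maps to a single part of speech.
LEXICON = {
    "the": "article",
    "big": "adjective",
    "gray": "adjective",
    "cat": "noun",
    "mouse": "noun",
    "ate": "verb",
}

def tag(sentence):
    """Pair each whitespace-separated word with its part of speech."""
    return [(word, LEXICON.get(word.lower(), "unknown"))
            for word in sentence.split()]

tagged = tag("The big cat ate the gray mouse")
print(" ".join(f"{word}/{pos}" for word, pos in tagged))
```

A lookup of this kind cannot choose between competing tags for the same word, so a practical tagger needs context as well as a lexicon.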
Morphology
Root form
to work + verb + preterit
travailler + verb + past participle
arbeiten + verb + past participle
Syntactic Tree
sentence
  noun phrase
    article: The
    noun: boy
  verb phrase
    verb: hit
    noun phrase
      article: the
      noun: ball
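A tree like this one for "The boy hit the ball" maps naturally onto a nested data structure. The Python sketch below is one possible encoding, not a prescribed one: each node is a tuple whose first element is the category label, and the names `TREE` and `leaves` are chosen here for illustration.

```python
# Each node is (label, child, ...); a leaf child is a plain string.
TREE = (
    "sentence",
    ("noun phrase", ("article", "The"), ("noun", "boy")),
    ("verb phrase",
     ("verb", "hit"),
     ("noun phrase", ("article", "the"), ("noun", "ball"))),
)

def leaves(node):
    """Collect the words at the leaves, from left to right."""
    if isinstance(node, str):
        return [node]
    _label, *children = node
    words = []
    for child in children:
        words.extend(leaves(child))
    return words

print(" ".join(leaves(TREE)))
```

Reading the leaves from left to right recovers the original sentence, which is a quick sanity check on any tree encoding.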
Semantics
As opposed to syntax:
1. Colorless green ideas sleep furiously.
2. *Furiously sleep ideas green colorless.
Determining the logical form:
Sentence                     Logical representation
Frank is writing notes       writing(Frank, notes).
François écrit des notes     écrit(François, notes).
Franz schreibt Notizen       schreibt(Franz, Notizen).
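The logical representations above all follow the pattern verb(subject, object). As a minimal sketch, assuming the subject, verb, and object have already been identified by an earlier analysis step, the Python function below assembles such a form as a string. The name `logical_form` and the hand-written `ANALYSES` list are illustrative assumptions.

```python
def logical_form(subject, verb, obj):
    """Build a predicate-argument string of the form verb(subject, object)."""
    return f"{verb}({subject}, {obj})."

# Hand-written analyses of the three example sentences (assumed input).
ANALYSES = [
    ("Frank", "writing", "notes"),
    ("François", "écrit", "notes"),
    ("Franz", "schreibt", "Notizen"),
]

for subject, verb, obj in ANALYSES:
    print(logical_form(subject, verb, obj))
```

The hard part of semantic analysis lies in producing the subject, verb, and object in the first place; the formatting step shown here is the easy end of the pipeline.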
Lexical Semantics
Word senses:
1. note (noun) short piece of writing;
2. note (noun) a single sound at a particular level;
3. note (noun) a piece of paper money;
4. note (verb) to take notice of;
5. note (noun) of note: of importance.
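A sense inventory such as the one for "note" can be stored as a small lookup table. The sketch below is a hypothetical layout, not the format of any particular dictionary: `SENSES` maps a word to a list of (part of speech, gloss) pairs, and `senses` optionally filters by part of speech.

```python
# Hypothetical sense inventory: word -> list of (part of speech, gloss).
SENSES = {
    "note": [
        ("noun", "short piece of writing"),
        ("noun", "a single sound at a particular level"),
        ("noun", "a piece of paper money"),
        ("verb", "to take notice of"),
        ("noun", "of note: of importance"),
    ],
}

def senses(word, pos=None):
    """Return the senses of a word, optionally restricted to one part of speech."""
    entries = SENSES.get(word, [])
    return [(p, gloss) for p, gloss in entries if pos is None or p == pos]

for p, gloss in senses("note", pos="noun"):
    print(f"note ({p}): {gloss}")
```

Knowing the part of speech narrows "note" from five senses to four or one, but choosing among the remaining noun senses still requires context.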
Reference
[Figure: referencing to the real world]
Ambiguity
Many analyses are ambiguous. This makes language processing difficult.
Ambiguity occurs in any layer: speech recognition, part-of-speech tagging, parsing, etc.
Example of an ambiguous phonetic transcription: The boys eat the sandwiches
That may correspond to:
The boy seat the sandwiches;
the boy seat this and which is;
the buoys eat the sand which is
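The effect can be imitated on written text by removing the spaces and listing every way to cut the string back into known words. The Python sketch below does this with a simple recursive search. It works on letters rather than phonemes, which is a simplification, and the small `LEXICON` and the function name `segmentations` are assumptions made for the example.

```python
def segmentations(text, lexicon):
    """Return every way to split `text` into a sequence of lexicon words."""
    if not text:
        return [[]]  # the empty string has exactly one segmentation: no words
    results = []
    for end in range(1, len(text) + 1):
        prefix = text[:end]
        if prefix in lexicon:
            # Keep this word and segment the remainder recursively.
            for rest in segmentations(text[end:], lexicon):
                results.append([prefix] + rest)
    return results

# Hypothetical toy lexicon covering the example sentence.
LEXICON = {"the", "boy", "boys", "eat", "seat", "sandwiches"}

for words in segmentations("theboyseatthesandwiches", LEXICON):
    print(" ".join(words))
```

With this lexicon the unspaced string has two readings, "the boy seat the sandwiches" and "the boys eat the sandwiches". A recognizer needs further knowledge, such as syntax or word statistics, to prefer one of them.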
Linguistics has produced an impressive set of theories and models.
Language processing requires significant resources.
Models and tools have matured. Resources are available.
Tools notably involve finite-state automata, regular expressions, rewriting rules, logic, statistics, and machine learning.
= NLP engine
Research Relevance
Large companies such as Microsoft, Google, Yahoo, IBM, and Xerox conduct research in natural language processing.
The 7th European framework program (2007-2013) names six technology pillars in information technologies. Two of them are related to language processing:
Knowledge, cognitive and learning systems: semantic systems; capturing and exploiting knowledge embedded in web and multimedia content; bio-inspired artificial systems that perceive, understand, learn and evolve, and act autonomously; learning by convivial machines and humans based on a better understanding of human cognition.
Simulation, visualization, interaction and mixed realities: tools for innovative design and creativity in products, services and digital media, and for natural, language-enabled and context-rich interaction and communication.