1 INTRODUCTION
References
• Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition
– Written by Daniel Jurafsky and James Martin.
• Foundations of Statistical Natural Language Processing
– Written by Christopher Manning and Hinrich Schütze.
– This book provides an introduction to statistical methods for natural language processing, covering both the required linguistics and the newer (at the time, circa 1999) statistical methods.
References
• Computational Approaches to Morphology and Syntax (Oxford Surveys in Syntax & Morphology)
– Written by Brian Roark & Richard Sproat.
Other Books
• Statistical Machine Translation
– This book provides an introduction to the topic of statistical machine translation, a subfield of NLP.
– Written by Philipp Koehn.
• Statistical Methods for Speech Recognition
– This book provides an introduction to the topic of statistical speech recognition, another subfield of NLP that saw an overhaul in the 1990s with statistical approaches.
– Written by Frederick Jelinek.
Practical Books on NLP
• Text Mining with R
– Written by Julia Silge and David Robinson.
– This book demonstrates statistical natural language processing methods on a range of modern applications.
– Code examples are in R.
• Text Analytics with Python
– Written by Dipanjan Sarkar.
Introduction
LANGUAGE
Language
• It is the primary means of communication
• A means of expression
• A tool used for expressing our thoughts and ideas
Natural Language Processing
• The term Natural Language Processing is made up of two parts:
– Natural Language
– Processing
Introduction
NATURAL LANGUAGE
Introduction
What is a natural language?
Understanding Natural Language
• A natural language is a language that has developed naturally among humans
Understanding Natural Language - The Human Language
• Human beings are the most advanced species on Earth, and our success is due to our ability to communicate and share information.
• That is where the concept of developing a language comes in.
• Human language is one of the most diverse and complex parts of us, with roughly 6,500 languages in existence.
The 21st Century
• In the 21st century, according to industry estimates:
– Only 21 percent of the available data is present in structured form.
– Data is being generated as we speak, tweet, and send messages on WhatsApp or in various Facebook groups, and the majority of this data exists as text, which is highly unstructured in nature.
What is NLP?
• Natural Language Processing (NLP) is the sub-field of Artificial Intelligence that focuses on the ability of a computer to understand human language (a command), as spoken or written, and to produce an output by processing it.
Text Mining
• Text mining (text analytics) is the process of deriving meaningful information from natural language text.
• Text mining is also known as text data mining.
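As a rough illustration of "deriving meaningful information from natural language text", the sketch below counts the most frequent content words in a short passage. It uses only the Python standard library; the stop-word list and the function names `tokenize` and `top_terms` are assumptions made for this example, not part of any particular text mining toolkit.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; real lists are much longer.
STOP_WORDS = {"the", "a", "an", "is", "of", "and", "to", "in", "it", "from"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def top_terms(text, n=3):
    """Return the n most frequent non-stop-word tokens with their counts."""
    tokens = [t for t in tokenize(text) if t not in STOP_WORDS]
    return Counter(tokens).most_common(n)

sample = (
    "Text mining is the process of deriving meaningful information "
    "from natural language text. Text mining is also known as text data mining."
)
print(top_terms(sample, n=2))  # [('text', 4), ('mining', 3)]
```

Word counts are only the simplest kind of "information"; they ignore word order and meaning, which is where the rest of NLP comes in.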
Understanding Natural Language Processing
• In other words, NLP is used to gain knowledge from the raw textual data at our disposal.
Linguistics and Language
• Linguistics is the science of language
• Its study includes:
– Phonology: sounds
– Morphology: word formation
– Syntax: sentence structure
– Semantics: meaning
– Pragmatics: understanding in context
Natural Language Processing
• Example:
– If you ask Siri what the distance between the Earth and the Sun is, it will reply immediately.
Natural Language Processing
• Before trying to understand Natural Language Processing, let's look at a video.
• Check out another video: GOOGLE DUPLEX
COMPONENTS OF NLP
Natural Language Understanding
Lexical Ambiguity
• Understanding a new language is very hard.
• Taking English as an example, there is a lot of ambiguity, and it occurs at several different levels.
• Lexical ambiguity is the presence of two or more possible meanings within a single word; it is also called semantic ambiguity.
Syntactic Ambiguity
• Syntactic ambiguity is the presence of two or more possible meanings within a single sentence or sequence of words; it is also called structural ambiguity or grammatical ambiguity.
Referential Ambiguity
• Referential ambiguity arises when we refer to something using pronouns.
Natural Language Generation
• Natural Language Generation (NLG) deals with producing written or spoken language from raw data, that is, with how to generate language from knowledge.
• NLG is the process of constructing natural language outputs from non-linguistic inputs
• NLG can be viewed as the reverse process of NL understanding
• An NLG system may have three main parts:
– Discourse Planner
• decides what will be generated and which sentences
– Surface Realizer
• realizes a sentence from its internal representation
– Lexical Selection
• selects the correct words to describe the concepts
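The three NLG parts listed above can be sketched as a toy pipeline. This is a minimal illustration under stated assumptions, not how a production NLG system is built: the weather record, the temperature thresholds, the sentence template, and the choice to run lexical selection before surface realization are all hypothetical choices made for this example.

```python
# Hypothetical non-linguistic input: a small weather record.
record = {"city": "Paris", "temp_c": 31, "sky": "clear"}

def discourse_planner(data):
    """Decide what will be said: which facts, and in which order."""
    return [
        ("temperature", data["city"], data["temp_c"]),
        ("sky", data["city"], data["sky"]),
    ]

def lexical_selection(message):
    """Choose the words that describe the concept in one message."""
    kind, city, value = message
    if kind == "temperature":
        # Assumed thresholds, chosen only for illustration.
        word = "hot" if value >= 30 else "cold" if value <= 5 else "mild"
        return {"subject": city, "adjective": word,
                "detail": f"{value} degrees Celsius"}
    return {"subject": f"the sky over {city}", "adjective": value,
            "detail": None}

def surface_realizer(spec):
    """Turn the internal representation into a finished sentence."""
    sentence = f"{spec['subject']} is {spec['adjective']}"
    if spec["detail"]:
        sentence += f" at {spec['detail']}"
    return sentence[0].upper() + sentence[1:] + "."

def generate(data):
    """Run the planner, then word choice, then realization for each message."""
    return " ".join(
        surface_realizer(lexical_selection(m)) for m in discourse_planner(data)
    )

print(generate(record))
# Paris is hot at 31 degrees Celsius. The sky over Paris is clear.
```

Keeping the three stages as separate functions mirrors the division of labour in the list: changing what is said, which words are used, or how the sentence is formed each touches only one function.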
Natural Language Processing
• To make things even harder, in many cases language can be
– Ambiguous: the meaning of a word depends on the context it is used in.
• Example:
– If I tell you to meet at the bank (without any context)
» I could mean the river bank or the place where I'm grabbing some cash.
– If I say “This fridge is great”
» That has a totally different meaning from “The fridge is great”
• So,
– How did we learn to attach meaning to sounds?
– How do we know great [enthusiastic] means something different from great [sarcastic]?
• When we’re solving a natural language processing problem, whether it’s natural language understanding or natural language generation, we have to think about how our AI is going to learn the meaning of words and understand our potential mistakes.
• NLU is much harder than NLG, but both of them are hard.
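The "bank" example can be made concrete with a toy word-sense chooser. The sketch below is a simplified variant of dictionary-overlap disambiguation (in the spirit of the Lesk algorithm), which the slides themselves do not name; the two-sense inventory, the glosses, and the stop-word list are hypothetical and written only for this example.

```python
# Hypothetical two-sense inventory for "bank"; glosses written for this example.
SENSES = {
    "river bank": "sloping land beside a river or stream of water",
    "financial bank": "institution where people deposit withdraw and borrow money or cash",
}

# Hypothetical stop-word list so that function words do not count as evidence.
STOP_WORDS = {"a", "an", "the", "of", "or", "and", "to", "at", "i", "me",
              "some", "where", "by"}

def content_words(text):
    """Lowercase, strip punctuation, and drop stop words."""
    return {w.strip(".,!?\"'").lower() for w in text.split()} - STOP_WORDS

def disambiguate(sentence, senses=SENSES):
    """Return the sense whose gloss shares the most words with the sentence.

    Returns None when no gloss overlaps with the context at all, which is
    the situation in "meet at the bank" with no further context.
    """
    context = content_words(sentence)
    scores = {name: len(context & content_words(gloss))
              for name, gloss in senses.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

print(disambiguate("Meet me at the bank"))                   # None
print(disambiguate("I need to grab some cash at the bank"))  # financial bank
print(disambiguate("We fished from the bank of the river"))  # river bank
```

The first call shows the point made on the slide: without context there is nothing to choose between the senses. Tone-dependent cases such as great [enthusiastic] versus great [sarcastic] are out of reach for word overlap of this kind.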