NLP For IAF-RS-Final-09-03-2023-new

Natural Language Processing and Its Implementation

Dr. Rekha Sharma


Associate Professor
Department of Computer Engineering
Thakur College of Engineering and Technology
rekha.sharma@thakureducation.org
(09th March 2023)
Outline
1. What is NLP
2. Components of NLP
3. Phases of NLP
4. Applications of NLP
5. Tools for NLP
6. NLP Pipeline(Python)
7. Summary
8. References
What is NLP?
Natural language is human-to-human communication.
It takes the form of speech or text and is used to convey intent.
What is NLP?
Many languages are spoken around the globe, and data is available all over the world in the form of:
• Conversations
• Customer information
• Feedback
• Human sentiments
• Messages

Is this data useful for computers?

A computer understands binary language (programming languages), not natural language.
What is NLP?

Natural Language Processing refers to the branch of Artificial Intelligence that gives machines the ability to read, understand, and derive meaning from human languages.
What is NLP?
NLP is a field of computer science, artificial intelligence,
and computational linguistics concerned with the
interactions between computers and human (natural)
languages.
Components of NLP
• Natural Language Understanding (NLU)
• Natural Language Generation (NLG)
1. Natural Language Understanding (NLU)
Natural Language Understanding (NLU) helps the machine to understand and analyze human language by extracting metadata from content, such as concepts, entities, keywords, emotions, relations, and semantic roles.

NLU is mainly used in business applications to understand the customer's problem in both spoken and written language.

NLU involves the following tasks:

• Mapping the given input into a useful representation.

• Analyzing different aspects of the language.

2. Natural Language Generation (NLG)
Natural Language Generation (NLG) acts as a translator that converts computerized data into a natural language representation. It mainly involves text planning, sentence planning, and text realization.
NLP Process
Difference between NLU and NLG

NLU | NLG
NLU is the process of reading and interpreting language. | NLG is the process of writing or generating language.
It produces non-linguistic outputs from natural language inputs. | It constructs natural language outputs from non-linguistic inputs.
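
The two directions in the table can be sketched with a toy example. This is only an illustration under stated assumptions: the flight-booking pattern, the function names `understand` and `generate`, and the reply template are all hypothetical, and real NLU/NLG systems are far more general.

```python
import re

# Hypothetical pattern for a single intent; not taken from the slides.
BOOKING_PATTERN = re.compile(r"book a flight from (\w+) to (\w+)", re.IGNORECASE)


def understand(utterance):
    """NLU direction: natural language in, structured (non-linguistic) data out."""
    match = BOOKING_PATTERN.search(utterance)
    if match is None:
        return {"intent": "unknown"}
    origin, destination = match.groups()
    return {"intent": "book_flight", "origin": origin, "destination": destination}


def generate(frame):
    """NLG direction: structured data in, natural language out."""
    if frame.get("intent") != "book_flight":
        return "Sorry, I did not understand the request."
    return f"Your flight from {frame['origin']} to {frame['destination']} is booked."


frame = understand("Please book a flight from Mumbai to Delhi")
print(frame)
print(generate(frame))
```

Here `understand` plays the NLU role and `generate` plays the NLG role; the dictionary passed between them is the non-linguistic representation.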
Phases of NLP
The steps in NLP (Cont.)
Lexical Analysis: concerns the way words are built up from smaller meaning-bearing units, e.g. come(s), co(mes). (DFA and regular expressions)

Syntax: concerns how words are put together to form correct sentences and what structural role each word has. (Grammar and syntax tree)

Semantics: concerns what words mean and how these meanings combine in sentences to form sentence meanings. (Search engine)

Pragmatics: concerns how sentences are used in different situations and how use affects the interpretation of the sentence.

Discourse: concerns how the immediately preceding sentences affect the interpretation of the next sentence.
Syntax
Example: "Mumbai goes to the Rekha."

The sentence is rejected as syntactically incorrect.
Parse Trees
“The cat sat on the mat”
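
A minimal sketch of how a grammar can accept one sentence and reject another. The grammar rules, the lexicon, and the helper names (`parse`, `parse_sentence`, `bracketed`) are assumptions made for this illustration only; the slides do not specify a grammar.

```python
# Toy context-free grammar (hypothetical): each symbol maps to its possible expansions.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"], ["PropN"]],
    "VP": [["V", "PP"]],
    "PP": [["P", "NP"]],
}

# Toy lexicon (hypothetical): lowercase word -> set of word categories.
LEXICON = {
    "the": {"Det"}, "cat": {"N"}, "mat": {"N"},
    "sat": {"V"}, "goes": {"V"}, "on": {"P"}, "to": {"P"},
    "mumbai": {"PropN"}, "rekha": {"PropN"},
}


def parse(symbol, tokens, pos):
    """Yield (tree, next_pos) for every way `symbol` matches tokens from `pos`."""
    if symbol in GRAMMAR:
        for rule in GRAMMAR[symbol]:
            partials = [([], pos)]
            for part in rule:
                partials = [
                    (children + [subtree], end)
                    for children, p in partials
                    for subtree, end in parse(part, tokens, p)
                ]
            for children, end in partials:
                yield (symbol, children), end
    elif pos < len(tokens) and symbol in LEXICON.get(tokens[pos].lower(), ()):
        yield (symbol, tokens[pos]), pos + 1


def parse_sentence(sentence):
    """Return a parse tree covering the whole sentence, or None if it is rejected."""
    tokens = sentence.split()
    for tree, end in parse("S", tokens, 0):
        if end == len(tokens):
            return tree
    return None


def bracketed(tree):
    """Render a tree in bracket notation, e.g. (NP (Det the) (N cat))."""
    label, body = tree
    if isinstance(body, str):
        return f"({label} {body})"
    return f"({label} {' '.join(bracketed(child) for child in body)})"


print(bracketed(parse_sentence("The cat sat on the mat")))
print(parse_sentence("Mumbai goes to the Rekha"))  # None: "the Rekha" is not Det + N
```

Under this toy grammar the second sentence fails because a determiner must be followed by a common noun, and "Rekha" is listed only as a proper noun.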
Pragmatics
You and your friend are sitting in your bedroom studying, and she says, "It's hot in here. Can you crack open a window?"

Literal meaning: Break the window.

Intended meaning: Open a window a little.
Discourse
"You always wear such lovely clothes! I'd love to borrow something one day."

Positive politeness: making the individual feel good and positive about themselves.
Negative politeness: making the other person feel like they haven't been taken advantage of.
Applications for NLP
1. Chatbots: hold a conversation using text or speech (agents)

Applications for NLP
2. Speech Recognition: Google Assistant, SIRI,
Alexa

Applications for NLP
3. Auto-correction: Text editor (Smart-Phones,
Tablets)

Applications for NLP
4. Language Correction: Grammarly

Applications for NLP
5. Machine Translation

Applications for NLP

6. Sentiment/Opinion Analysis: Sentiment analysis, also referred to as opinion mining, is an approach to natural language processing (NLP) that identifies the emotional tone behind a body of text. This is a popular way for organizations to determine and categorize opinions about a product, service, or idea.
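
A minimal sketch of the idea, assuming a tiny hand-made word list. The `POSITIVE` and `NEGATIVE` sets and the `sentiment` function are hypothetical; practical systems use much larger lexicons or trained models.

```python
import re

# Hypothetical mini-lexicons for illustration only.
POSITIVE = {"good", "great", "excellent", "love", "helpful"}
NEGATIVE = {"bad", "poor", "terrible", "hate", "slow"}


def sentiment(text):
    """Label text by counting positive words minus negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum((w in POSITIVE) - (w in NEGATIVE) for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(sentiment("The service was great and the staff were helpful"))
print(sentiment("Terrible product, slow delivery"))
```

Word counting like this ignores negation and sarcasm, which is one reason irony appears among the challenges of NLP.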

Applications for NLP
7. Text Classification

Applications for NLP
8. Digital personal assistant

Applications for NLP
9. Question Answering
Other Applications?
Applications w.r.t IAF
Chatbots for Assistance: NLP-based chatbots can be used to
provide assistance to IAF personnel. These chatbots can answer
queries related to the various functions of the IAF, such as flight
schedules, mission objectives, and equipment usage.
Speech Recognition for Pilots: NLP-based speech recognition
technology can be used to assist pilots in communicating with air
traffic control and ground crew. This technology can help in
reducing miscommunications and improving the efficiency of
flight operations.
Sentiment Analysis for Intelligence Gathering: NLP-based
sentiment analysis can be used to analyze data from social media
and other online sources to gauge public sentiment towards the
IAF. This can help the IAF in identifying potential threats and
taking preemptive action.
Applications w.r.t IAF
Machine Translation for Multilingual Communications: The IAF operates in a
multilingual environment where personnel from different regions of India work
together. NLP-based machine translation technology can be used to translate
communications in different languages, improving the speed and accuracy of
communication.
Text Analytics for Data Mining: The IAF generates vast amounts of data, including
reports, logs, and technical documents. NLP-based text analytics can be used to extract
meaningful insights from this data, aiding decision-making processes.
Text Summarization: The IAF deals with a large amount of information on a daily
basis, and text summarization can help reduce the time and effort required to go
through all of it. By using NLP techniques, the IAF can summarize large amounts of
text quickly and efficiently, allowing them to make decisions more rapidly.
Applications w.r.t IAF
Named Entity Recognition: The IAF can use Named
Entity Recognition (NER) to extract important information
from text, such as the names of people, places, and
organizations. This can help the IAF in intelligence
gathering and analysis.
Overall, NLP has the potential to greatly enhance the
efficiency and effectiveness of the IAF's operations, making
it an invaluable tool for the organization.
Tools
NLP tools give us a better understanding of how the
language may work in specific situations:
1. NLTK (open-source NLP tool): The Natural Language Toolkit is useful for simple text analysis.
2. Stanford Core NLP: This tool can extract all sorts of information. It has smooth named-entity recognition and easy markup of terms and phrases.
Tools
3. Apache OpenNLP: OpenNLP handles all sorts of text data analysis and sentiment analysis operations. It is also well suited to preparing text corpora for generators and conversational interfaces.
4. SpaCy: SpaCy is also useful in deep text analytics and sentiment analysis.
5. AllenNLP: AllenNLP is suitable for both simple and complex tasks. It performs specific duties with predictable results and leaves enough room for experiments.
Tools
6. GenSim: The main GenSim use cases are:
• Data analysis
• Semantic search applications
• Text generation applications (chatbot, service customization, text summarization, etc.)
7. TextBlob Library: TextBlob also provides tools for sentiment analysis, event extraction, and intent analysis. It has different flexible models for sentiment analysis, so you can build entire timelines of sentiments and look at things in progress.
Tools
8. Intel NLP Architect: NLP Architect is the most advanced tool, going one step further and getting deeper into sets of text data for more business insights.
NLP Pipeline?
1. Segmentation
2. Tokenization
3. Stemming
4. Lemmatization
5. POS (Parts Of Speech) Tagging
6. Named Entity Recognition (NER)
Segmentation
• Sentence segmentation is the first step for building the NLP pipeline.
• It breaks the paragraph into separate sentences.
Segmentation Example
Independence Day is one of the important festivals for every
Indian citizen. It is celebrated on the 15th of August each year ever
since India got independence from the British rule. The day
celebrates independence in the true sense.
Sentence segmentation produces the following result:
1. "Independence Day is one of the important festivals for every Indian citizen."
2. "It is celebrated on the 15th of August each year ever since India got independence from the British rule."
3. "The day celebrates independence in the true sense."
Segmentation
• Text segmentation is the task of splitting text into meaningful segments.
• When you have one or more paragraphs to process, the best way to proceed is one sentence at a time.
• This reduces complexity, simplifies the process, and tends to give more accurate results.
• NLTK's PunktSentenceTokenizer can be used.
Segmentation:(Python code)
# Import the nltk library for NLP processes
import nltk

# Punkt is an unsupervised trainable model, which means it can be trained on unlabeled data
# (newer NLTK releases may also need nltk.download('punkt_tab'))
nltk.download('punkt')

# Variable that stores the whole paragraph (replace the placeholder with real text)
text = "paragraph"
print("PARAGRAPH:\n")
print(text)
print("\n")

# Tokenize paragraph into sentences.
sentences = nltk.sent_tokenize(text)

# Print out sentences
for sentence in sentences:
    print(sentence)
Segmentation:(Python code)
Input:
Air Chief Marshal VR Chaudhari PVSM AVSM VM ADC was commissioned into the fighter stream of the Indian Air Force on 29 Dec 1982. He is an alumnus of the National Defence Academy, Flying Instructors’ School and Defence Services Staff College, Wellington.

Output:
Air Chief Marshal VR Chaudhari PVSM AVSM VM ADC was
commissioned into the fighter stream of the Indian Air Force on 29 Dec
1982.
He is an alumnus of the National Defence Academy, Flying Instructors’
School and Defence Services Staff College, Wellington.
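
For contrast with Punkt, a naive splitter can be sketched with a single regular expression. This is a swapped-in technique for illustration, not how NLTK works internally; the function name `split_sentences` is hypothetical.

```python
import re


def split_sentences(text):
    """Naive rule: split at whitespace that follows '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


paragraph = (
    "Independence Day is one of the important festivals for every Indian citizen. "
    "It is celebrated on the 15th of August each year ever since India got "
    "independence from the British rule. "
    "The day celebrates independence in the true sense."
)
for sentence in split_sentences(paragraph):
    print(sentence)

# The same rule wrongly breaks after an abbreviation such as "Dr."
print(split_sentences("Dr. Rekha Sharma teaches NLP. She works in Mumbai."))
```

The abbreviation case shows why a trained model such as Punkt, which is designed to learn abbreviations from text, is usually preferred over a fixed rule.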
Tokenization
Tokenization is the process of breaking a phrase, sentence, paragraph, or entire document into its smallest units, such as individual words or terms. Each of these small units is known as a token.
Tokenization Example
Word Tokenizer is used to break the sentence
into separate words or tokens.
Example:
JavaTpoint offers Corporate Training,
Summer Training, Online Training, and Winter
Training.
Word Tokenizer generates the following result:
"JavaTpoint", "offers", "Corporate", "Training",
"Summer", "Training", "Online", "Training",
"and", "Winter", "Training", "."
Tokenization:(Python code)
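import nltk
nltk.download('punkt')

sentence_data = "Tokenization is breaking the raw text into small chunks. Tokenization breaks the raw text into words, sentences called tokens."
# sent_tokenize splits at the sentence level;
# nltk.word_tokenize would split the text into individual words instead
nltk_tokens = nltk.sent_tokenize(sentence_data)
print("TOKENS:")
print(nltk_tokens)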
Tokenization:(Python code)
Input:
Tokenization is breaking the raw text into small chunks. Tokenization breaks the raw text into words, sentences called tokens.

Output:
['Tokenization is breaking the raw text into small chunks.', 'Tokenization
breaks the raw text into words, sentences called tokens.']
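
The output above is split at the sentence level. Word-level tokens, as in the Word Tokenizer example, can be approximated with a regular expression. This is a simplified sketch and not NLTK's tokenizer; the function name `word_tokens` is hypothetical.

```python
import re


def word_tokens(text):
    """Return runs of word characters and single punctuation marks as tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


tokens = word_tokens("Tokenization breaks the raw text into words, sentences called tokens.")
print(tokens)
```

Note that this sketch keeps every punctuation mark, including commas, as its own token.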
Stemming
Stemming is used to normalize words into their base or root form.
For example, celebrates, celebrated, and celebrating all originate from the single root word "celebrate."
The big problem with stemming is that it sometimes produces a root word that has no meaning.
For example, intelligence, intelligent, and intelligently all reduce to the single root "intelligen."
In English, the word "intelligen" does not have any meaning.
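
How suffix stripping can produce a meaningless root is easy to see in a toy stemmer. The suffix list and the function `naive_stem` are hypothetical and far cruder than the Porter algorithm.

```python
# Hypothetical suffix list, checked in this order.
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


for w in ["celebrates", "celebrated", "celebrating"]:
    print(w, ":", naive_stem(w))
```

All three forms collapse to the same string, "celebrat", which groups them correctly but is not itself an English word.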
Stemming:(Python code)
import nltk
nltk.download('punkt')

# importing modules
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

ps = PorterStemmer()

sentence = "Programmers program with programming languages"

# split the sentence into words
words = word_tokenize(sentence)

# print each word next to its stem
for w in words:
    print(w, " : ", ps.stem(w))
Stemming :(Python code)
Input:
"Programmers program with programming languages"
Output:
Programmers : programm
program : program
with : with
programming : program
languages : languag
Lemmatization
Lemmatization is quite similar to stemming. It is used to group the different inflected forms of a word under a common base form, called the lemma.
The main difference between stemming and lemmatization is that lemmatization produces a root word that has a meaning.
For example: in lemmatization, the words intelligence, intelligent, and intelligently have the root word intelligent, which has a meaning.
Lemmatization :(Python code)
# import these modules
import nltk
from nltk.stem import WordNetLemmatizer

nltk.download('wordnet')
nltk.download('omw-1.4')

lemmatizer = WordNetLemmatizer()

print("rocks :", lemmatizer.lemmatize("rocks"))
print("corpora :", lemmatizer.lemmatize("corpora"))

# "a" denotes adjective in "pos"
print("better :", lemmatizer.lemmatize("better", pos="a"))
Lemmatization:(Python code)
Input:
rocks
corpora
better
Output:
rocks : rock
corpora : corpus
better : good
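
Why the part of speech matters can be shown with a dictionary-lookup sketch. The table `LEMMAS` and the function `toy_lemmatize` are hypothetical stand-ins; WordNet-based lemmatizers consult a far larger lexical database.

```python
# Hypothetical lookup table: (word, part of speech) -> lemma.
LEMMAS = {
    ("rocks", "n"): "rock",
    ("corpora", "n"): "corpus",
    ("better", "a"): "good",
}


def toy_lemmatize(word, pos="n"):
    """Return the dictionary form for (word, pos); fall back to the word itself."""
    return LEMMAS.get((word, pos), word)


print(toy_lemmatize("rocks"))
print(toy_lemmatize("better"))           # treated as a noun: unchanged
print(toy_lemmatize("better", pos="a"))  # treated as an adjective
```

Unlike a stemmer, a lookup of this kind always returns a real word, but only when the word and its part of speech are known to the dictionary.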
POS Tagging
POS Tagging is the process of marking up a
word in a text (corpus) as corresponding to a
particular part of speech, based on both its
definition and its context.
POS Tagging :(Python code)
import nltk
nltk.download('punkt')
nltk.download('stopwords')
nltk.download('averaged_perceptron_tagger')
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize, sent_tokenize

stop_words = nltk.corpus.stopwords.words('english')

# Dummy text
txt = "Everything is all about money."

# sent_tokenize is one of the instances of
# PunktSentenceTokenizer from the nltk.tokenize.punkt module
tokenized = sent_tokenize(txt)

for i in tokenized:
    # Word tokenizers is used to find the words
    # and punctuation in a string
    wordsList = nltk.word_tokenize(i)

    # removing stop words from wordList
    wordsList = [w for w in wordsList if not w in stop_words]

    # Using a Tagger. Which is part-of-speech
    # tagger or POS-tagger.
    tagged = nltk.pos_tag(wordsList)
    print(tagged)
POS Tagging:(Python code)
Input:
Everything is all about money.

Output:
[('Everything', 'VBG'), ('money', 'NN'), ('.', '.')]
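
A toy tagger gives a feel for how tags can be assigned. This is a sketch under stated assumptions, not the method used by `nltk.pos_tag`: the word list `KNOWN_TAGS`, the suffix rules, and the function `toy_tag` are all hypothetical.

```python
# Hypothetical closed-class lexicon and suffix rules (Penn Treebank style tags).
KNOWN_TAGS = {"is": "VBZ", "all": "DT", "about": "IN", ".": "."}
SUFFIX_RULES = [("ing", "VBG"), ("ed", "VBD"), ("ly", "RB")]


def toy_tag(tokens):
    """Tag by lexicon lookup, then by suffix, defaulting to noun (NN)."""
    tagged = []
    for token in tokens:
        lowered = token.lower()
        label = KNOWN_TAGS.get(lowered)
        if label is None:
            label = next(
                (tag for suffix, tag in SUFFIX_RULES if lowered.endswith(suffix)),
                "NN",
            )
        tagged.append((token, label))
    return tagged


print(toy_tag(["Everything", "is", "all", "about", "money", "."]))
```

Because the rules look only at the word itself, "Everything" is labeled VBG purely on its "-ing" ending, which illustrates why taggers also need context.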
Example of POS Tagging:
ओ रुट्टी खादा ।
रुट्टी ( food) -noun, खा – verb
Named Entity Recognition (NER)

Named Entity Recognition (NER) is an information extraction task which seeks to identify and classify elements in text which refer to predefined categories.
Recognizable Entities
1. Person names (names of people)
2. Organization names (companies,
government organizations, committees,
etc.)
3. Location names (cities, countries etc.)
4. Miscellaneous names (Date, time,
number, percentage, monetary
expressions, number expressions and
measurement expressions)
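
The categories above can be illustrated with a gazetteer (name list) plus one date pattern. This is a minimal sketch: the `GAZETTEER` entries, `DATE_PATTERN`, and `find_entities` are hypothetical, and practical NER systems rely on trained models rather than fixed lists.

```python
import re

# Hypothetical name lists for illustration only.
GAZETTEER = {
    "PERSON": ["VR Chaudhari"],
    "ORGANIZATION": ["Indian Air Force", "Central Bureau of Investigation"],
    "LOCATION": ["Wellington"],
}

# Dates written like "29 Dec 1982".
DATE_PATTERN = re.compile(
    r"\b\d{1,2} (?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) \d{4}\b"
)


def find_entities(text):
    """Return (surface text, label) pairs in order of appearance."""
    found = []
    for label, names in GAZETTEER.items():
        for name in names:
            for match in re.finditer(r"\b" + re.escape(name) + r"\b", text):
                found.append((match.start(), match.group(), label))
    for match in DATE_PATTERN.finditer(text):
        found.append((match.start(), match.group(), "DATE"))
    return [(surface, label) for _, surface, label in sorted(found)]


sentence = (
    "Air Chief Marshal VR Chaudhari was commissioned into the "
    "Indian Air Force on 29 Dec 1982."
)
print(find_entities(sentence))
```

A fixed list cannot recognize names it has never seen, which is the main limitation that statistical NER models address.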
Recognizable Entities
Organization:
In "What is the abbreviated expression for the <ORG>Central Bureau of Investigation</ORG>", there is an indication ("bureau") that the entity is an ORGANIZATION.
Machine translations of the entity into Hindi:
Bing Translator: केन्द्रीय जांच ब्यूरो के
Babylon: केन्द्रीय जांच ब्यूरो के
Google: केंद्रीय अन्वेषण ब्यूरो
Access from one language to another

Language L1 → Language L2
Translation: Boy → लड़का
Transliteration: Gandhiji → गांधीजी
Translation
Challenges of NLP
1. Contextual words and phrases and homonyms
2. Synonyms
3. Irony and sarcasm
4. Ambiguity
5. Errors in text or speech
6. Colloquialisms and slang
7. Domain-specific language
8. Low-resource languages
9. Lack of research and development
Summary
1. NLP stands for ____ _____ _____.
2. It includes the concepts of _______, ______, and _______.
3. Components of NLP are __________ and ___________.
4. _____ is an open-source tool used for NLP.
5. _____ is a process that stems or removes last few characters from
a word, often leading to incorrect meanings and spelling.
6. _____ relies on accurately determining the intended part-of-
speech and the meaning of a word based on its context.
References
• NLP basics
(https://towardsdatascience.com/basic-concepts-of-natural-language-processing-nlp-models-and-python-implementation-88a589ce1fc0)
(https://www.guru99.com/nlp-tutorial.html)
• Python Sample Programs
(https://www.kdnuggets.com/2021/03/natural-language-processing-pipelines-explained.html)
• To learn Python
(https://www.w3schools.com/python/python_regex.asp)

Questions?
