NLP For IAF-RS-Final-09-03-2023-new
[Diagram: Implementation. Speech and text are used to convey the intent.]
What is NLP?
Many languages are spoken around the globe.
Data is available all over the world
in the form of:
• Conversations
• Customer information
• Feedback
• Human sentiments
• Messages
NLU (Natural Language Understanding) and NLG (Natural Language Generation)
Sentence is rejected.
Parse Trees
“The cat sat on the mat”
Pragmatics
You and your friend are sitting in your
bedroom studying, and she says, 'It's hot in
here. Can you crack open a window?'
Applications for NLP
2. Speech Recognition: Google Assistant, Siri, Alexa
3. Auto-correction: text editors (smartphones, tablets)
4. Language Correction: Grammarly
5. Machine Translation
6. Text Classification
7. Digital personal assistant
8. Question Answering
Other Applications?
Applications w.r.t IAF
Chatbots for Assistance: NLP-based chatbots can be used to
provide assistance to IAF personnel. These chatbots can answer
queries related to the various functions of the IAF, such as flight
schedules, mission objectives, and equipment usage.
Speech Recognition for Pilots: NLP-based speech recognition
technology can be used to assist pilots in communicating with air
traffic control and ground crew. This technology can help in
reducing miscommunications and improving the efficiency of
flight operations.
Sentiment Analysis for Intelligence Gathering: NLP-based
sentiment analysis can be used to analyze data from social media
and other online sources to gauge public sentiment towards the
IAF. This can help the IAF in identifying potential threats and
taking preemptive action.
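As a rough illustration of the sentiment analysis idea above, here is a minimal lexicon-based sketch. The word list, the scores, the sample posts, and the function names (`sentiment_score`, `label`) are all assumptions made for this example; real systems typically use trained models or much larger lexicons.

```python
import re

# Hypothetical mini-lexicon: word -> polarity score (illustrative values only).
LEXICON = {
    "good": 1, "great": 1, "proud": 1, "impressive": 1,
    "bad": -1, "poor": -1, "unsafe": -1, "delay": -1,
}
NEGATIONS = {"not", "no", "never"}

def sentiment_score(text):
    """Sum word polarities; a negation flips the next scored word."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    negate = False
    for token in tokens:
        if token in NEGATIONS:
            negate = True
            continue
        polarity = LEXICON.get(token, 0)
        if polarity:
            score += -polarity if negate else polarity
            negate = False
    return score

def label(text):
    """Map the numeric score to a coarse sentiment label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

# Made-up sample posts for demonstration.
posts = [
    "The air show was great and impressive",
    "The display was not good",
    "The schedule was published today",
]
for post in posts:
    print(label(post), "|", post)
```

A word-counting approach like this misses irony and sarcasm, which is one reason production systems rely on trained classifiers.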
Machine Translation for Multilingual Communications: The IAF operates in a
multilingual environment where personnel from different regions of India work
together. NLP-based machine translation technology can be used to translate
communications in different languages, improving the speed and accuracy of
communication.
Text Analytics for Data Mining: The IAF generates vast amounts of data, including
reports, logs, and technical documents. NLP-based text analytics can be used to extract
meaningful insights from this data, aiding decision-making processes.
Text Summarization: The IAF deals with a large amount of information on a daily
basis, and text summarization can help reduce the time and effort required to go
through all of it. By using NLP techniques, the IAF can summarize large amounts of
text quickly and efficiently, allowing them to make decisions more rapidly.
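One simple way to approach the text summarization described above is extractive: score each sentence by how frequent its words are in the whole text and keep the top-scoring sentences. The sketch below assumes this frequency heuristic; the stopword list, the sample report, and the `summarize` function are hypothetical and only meant to show the idea.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "on",
             "is", "was", "for", "it", "at", "by", "were"}

def content_words(text):
    """Lowercased alphabetic words with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

def summarize(text, max_sentences=1):
    """Return the highest-scoring sentences, in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(content_words(text))

    def score(sentence):
        tokens = content_words(sentence)
        # Average word frequency, so long sentences are not favoured automatically.
        return sum(freq[w] for w in tokens) / (len(tokens) or 1)

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)

# Made-up maintenance report for demonstration.
report = ("The engine inspection found a fuel leak. "
          "The fuel leak was repaired and the engine was tested again. "
          "Lunch was served at noon.")
print(summarize(report, max_sentences=1))
```

Sentences that share the document's most repeated terms rank highest, so the unrelated sentence about lunch is dropped first.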
Named Entity Recognition: The IAF can use Named
Entity Recognition (NER) to extract important information
from text, such as the names of people, places, and
organizations. This can help the IAF in intelligence
gathering and analysis.
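To make the NER idea concrete, here is a minimal rule-based sketch that combines a lookup list (a gazetteer) with a regular expression for dates. The gazetteer entries, the labels, the sample sentence, and the `find_entities` function are assumptions for illustration; practical NER systems use trained statistical or neural models.

```python
import re

# Hypothetical gazetteer: known names mapped to entity types (illustrative only).
GAZETTEER = {
    "Indian Air Force": "ORGANIZATION",
    "National Defence Academy": "ORGANIZATION",
    "Wellington": "LOCATION",
}

# Dates written like "29 Dec 1982" or "15 August 1947".
DATE_PATTERN = re.compile(
    r"\b\d{1,2} (?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]* \d{4}\b"
)

def find_entities(text):
    """Return (start, surface text, label) tuples sorted by position."""
    found = []
    for name, label in GAZETTEER.items():
        for match in re.finditer(re.escape(name), text):
            found.append((match.start(), match.group(), label))
    for match in DATE_PATTERN.finditer(text):
        found.append((match.start(), match.group(), "DATE"))
    return sorted(found)

sentence = ("He was commissioned into the Indian Air Force on 29 Dec 1982 "
            "and later studied at Wellington.")
for _, surface, label in find_entities(sentence):
    print(f"{surface} -> {label}")
```

A lookup list only finds names it already contains, which is why trained models are preferred for open-ended intelligence text.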
Overall, NLP has the potential to greatly enhance the
efficiency and effectiveness of the IAF's operations, making
it an invaluable tool for the organization.
Tools
NLP tools give us a better understanding of how
language works in specific situations:
1. NLTK (Natural Language Toolkit): an open-source
Python library, useful for simple text analysis.
2. Stanford CoreNLP: a Java toolkit that can extract
many kinds of information from text. It provides
named-entity recognition and makes it easy to mark
up terms and phrases.
3. Apache OpenNLP: a Java toolkit for many kinds of
text data analysis, including sentiment analysis. It
is also well suited to preparing text corpora for
generators and conversational interfaces.
4. spaCy: a Python library that is also useful for
deep text analytics and sentiment analysis.
5. AllenNLP: a research library built on PyTorch,
suitable for both simple and complex tasks. It gives
predictable results on specific tasks and leaves
room for experiments.
6. Gensim: the main Gensim use cases are:
• Data analysis
• Semantic search applications
• Text generation applications (chatbot, service
customization, text summarization, etc.)
7. TextBlob: a Python library that provides sentiment
analysis and can support event extraction and intent analysis.
It offers more than one sentiment model, so you can build
timelines of sentiment and watch how it changes over time.
8. Intel NLP Architect: an open-source library from
Intel for exploring deep-learning approaches to NLP.
It goes a step further into sets of text data to
yield more business insights.
NLP Pipeline?
1. Segmentation
2. Tokenization
3. Stemming
4. Lemmatization
5. POS (Parts Of Speech) Tagging
6. Named Entity Recognition (NER)
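The pipeline stages listed above are applied one after another, each consuming the output of the previous one. The sketch below shows that chaining for the first two stages plus a simple normalization step standing in for the later ones. It uses only regular expressions; the function names (`segment`, `tokenize`, `normalize`, `run_pipeline`) are hypothetical and the rules are deliberately simplistic.

```python
import re

def segment(text):
    """Stage 1: split a paragraph into sentences after '.', '!' or '?'."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize(sentence):
    """Stage 2: split a sentence into word and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", sentence)

def normalize(tokens):
    """Placeholder for later stages: lowercase words and drop punctuation."""
    return [t.lower() for t in tokens if t.isalnum()]

def run_pipeline(text):
    """Apply the stages in order and return one token list per sentence."""
    return [normalize(tokenize(sentence)) for sentence in segment(text)]

print(run_pipeline("NLP is useful. It has many steps!"))
```

Stemming, lemmatization, POS tagging, and NER would slot in after tokenization in the same way, each as a function over the token list.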
Segmentation
• Sentence segmentation is the first step in
building the NLP pipeline.
• It breaks a paragraph into separate
sentences.
Segmentation Example
Independence Day is one of the important festivals for every
Indian citizen. It is celebrated on the 15th of August each year ever
since India got independence from the British rule. The day
celebrates independence in the true sense.
Sentence segmentation produces the following result:
1. "Independence Day is one of the important festivals for every
Indian citizen."
2. "It is celebrated on the 15th of August each year ever since India
got independence from the British rule."
3. "The day celebrates independence in the true sense."
Segmentation
• Text segmentation is the task of splitting text into
meaningful segments.
• When working with one or more paragraphs,
it is best to process one
sentence at a time.
• This reduces complexity, simplifies the
process, and tends to give more accurate
results.
• NLTK's PunktSentenceTokenizer can be used for this.
Segmentation:(Python code)
# Import the nltk library for NLP processes
import nltk

# Punkt is an unsupervised, trainable sentence tokenizer: it can be trained on unlabeled data
nltk.download('punkt')  # recent NLTK releases may need 'punkt_tab' instead

# Variable that stores the whole paragraph (replace the placeholder with the text to segment)
text = "paragraph"
print("PARAGRAPH:\n")
print(text)
print("\n")

# Tokenize paragraph into sentences.
sentences = nltk.sent_tokenize(text)

# Print out sentences
for sentence in sentences:
    print(sentence)
Segmentation:(Python code)
Input:
Air Chief Marshal VR Chaudhari PVSM AVSM VM ADC was commissioned
into the fighter stream of the Indian Air Force on 29 Dec 1982. He is an
alumnus of the National Defence Academy, Flying Instructors’ School and
Defence Services Staff College, Wellington.
Output:
Air Chief Marshal VR Chaudhari PVSM AVSM VM ADC was
commissioned into the fighter stream of the Indian Air Force on 29 Dec
1982.
He is an alumnus of the National Defence Academy, Flying Instructors’
School and Defence Services Staff College, Wellington.
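To see why a trained sentence tokenizer is useful, compare it with a naive rule that splits after every full stop. The sketch below shows the naive rule breaking on abbreviations and a small fix based on an abbreviation list. The list, the sample text, and both function names are assumptions for illustration; a trained model such as Punkt learns abbreviations from data instead.

```python
import re

def naive_split(text):
    """Split after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

# Hypothetical abbreviation list (illustrative only).
ABBREVIATIONS = {"Dr.", "Mr.", "Mrs.", "Wg.", "Cdr.", "Sqn.", "Ldr."}

def split_with_abbreviations(text):
    """Like naive_split, but re-joins pieces that end in a known abbreviation."""
    sentences = []
    buffer = ""
    for piece in naive_split(text):
        buffer = f"{buffer} {piece}".strip()
        # Only close the sentence if its last word is not an abbreviation.
        if buffer.split()[-1] not in ABBREVIATIONS:
            sentences.append(buffer)
            buffer = ""
    if buffer:
        sentences.append(buffer)
    return sentences

# Made-up sample text for demonstration.
text = "Wg. Cdr. Sharma briefed the crew. The sortie began at dawn."
print(naive_split(text))
print(split_with_abbreviations(text))
```

The naive rule returns four fragments for this text, while the abbreviation-aware version returns the two intended sentences.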
Tokenization
Tokenization is the process of breaking a
phrase, sentence, paragraph, or entire
document into its smallest units, such as
individual words or terms. Each of these
small units is known as a token.
Tokenization Example
A word tokenizer breaks a sentence
into separate words or tokens.
Example:
JavaTpoint offers Corporate Training,
Summer Training, Online Training, and Winter
Training.
The word tokenizer generates the following result:
"JavaTpoint", "offers", "Corporate", "Training", ",",
"Summer", "Training", ",", "Online", "Training", ",",
"and", "Winter", "Training", "."
Tokenization:(Python code)
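import nltk
nltk.download('punkt')  # recent NLTK releases may need 'punkt_tab' instead

sentence_data = "Tokenization is breaking the raw text into small chunks. Tokenization breaks the raw text into words, sentences called tokens."
# sent_tokenize splits the text into sentence-level tokens
nltk_tokens = nltk.sent_tokenize(sentence_data)
print("TOKENS:")
print(nltk_tokens)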
Tokenization:(Python code)
Input:
Tokenization is breaking the raw text into small chunks. Tokenization
breaks the raw text into words, sentences called tokens.
Output:
['Tokenization is breaking the raw text into small chunks.', 'Tokenization
breaks the raw text into words, sentences called tokens.']
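The output above shows sentence-level tokens. Word-level tokenization can be sketched with a single regular expression that treats runs of word characters and individual punctuation marks as tokens. This is a simplified stand-in, not NLTK's tokenizer; `word_tokenize_simple` is a hypothetical name.

```python
import re

def word_tokenize_simple(text):
    """Return runs of word characters and single punctuation marks as tokens."""
    return re.findall(r"\w+|[^\w\s]", text)

sentence = ("JavaTpoint offers Corporate Training, Summer Training, "
            "Online Training, and Winter Training.")
tokens = word_tokenize_simple(sentence)
print(tokens)

# Keep only the word tokens when punctuation is not needed.
words_only = [t for t in tokens if t.isalnum()]
print(words_only)
```

Punctuation marks come out as separate tokens, so later stages can choose to keep or discard them.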
Stemming
Stemming normalizes words into their
base or root form.
For example, celebrates, celebrated, and celebrating all
derive from the single root word
"celebrate."
The big problem with stemming is that it sometimes produces
a root word that has no meaning.
For example, intelligence, intelligent, and intelligently
all reduce to the single root "intelligen."
In English, the word "intelligen" does not have any meaning.
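The following toy suffix stripper illustrates how stemming can produce a root that is not a real word. The suffix list and the `toy_stem` function are assumptions for this sketch; established stemmers such as the Porter stemmer apply more elaborate rules and may return different stems.

```python
# Hypothetical suffix list, tried in order (longest first).
SUFFIXES = ("ingly", "edly", "ing", "ed", "es", "ly", "s")

def toy_stem(word, min_stem=3):
    """Strip the first matching suffix, keeping at least min_stem characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word

for w in ["celebrates", "celebrated", "celebrating", "with"]:
    print(w, ":", toy_stem(w))
```

All three forms of "celebrate" collapse to the same stem, which is useful for matching, but the stem itself ("celebrat") is not an English word.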
Stemming:(Python code)
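import nltk
nltk.download('punkt')  # used by word_tokenize; recent NLTK releases may need 'punkt_tab' instead

# importing modules
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

ps = PorterStemmer()
sentence = "Programmers program with programming languages"
# split the sentence into word tokens before stemming
words = word_tokenize(sentence)
for w in words:
    print(w, " : ", ps.stem(w))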
Stemming :(Python code)
Input:
"Programmers program with programming languages"
Output:
Programmers : programm
program : program
with : with
programming : program
languages : languag
Lemmatization
Lemmatization is quite similar to
stemming. It groups the different
inflected forms of a word under one base form, called the lemma.
The main difference between stemming and
lemmatization is that lemmatization produces a root
word that has a meaning.
For example: in lemmatization, the words
intelligence, intelligent, and intelligently have the
root word intelligent, which has a meaning.
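The contrast with stemming can be sketched with a toy lemmatizer that only returns a candidate if it is found in a vocabulary. The lookup table, the vocabulary, the suffix rules, and the `toy_lemmatize` function are assumptions for illustration; dictionary-backed lemmatizers such as NLTK's WordNetLemmatizer consult WordNet rather than a hand-written word list.

```python
# Hypothetical lookup table of irregular forms (illustrative, far from complete).
IRREGULAR = {"was": "be", "were": "be", "ran": "run",
             "mice": "mouse", "better": "good"}

# Small vocabulary used to check that a candidate lemma is a real word.
VOCABULARY = {"celebrate", "study", "fly", "aircraft", "mission", "run", "be"}

# (suffix, replacement) pairs tried in order.
RULES = (("ies", "y"), ("ing", "e"), ("ing", ""), ("ed", "e"),
         ("ed", ""), ("es", "e"), ("s", ""))

def toy_lemmatize(word):
    """Return a dictionary form if one is found, otherwise the word unchanged."""
    word = word.lower()
    if word in IRREGULAR:
        return IRREGULAR[word]
    for suffix, replacement in RULES:
        if word.endswith(suffix):
            candidate = word[: -len(suffix)] + replacement
            if candidate in VOCABULARY:
                return candidate
    return word

for w in ["celebrates", "celebrated", "celebrating", "studies", "mice"]:
    print(w, ":", toy_lemmatize(w))
```

Because every result is checked against the vocabulary, the output is always a real word, unlike the output of a suffix-stripping stemmer.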
Lemmatization :(Python code)
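# import these modules
import nltk
nltk.download('wordnet')
nltk.download('omw-1.4')
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()
# Example words below are illustrative additions, not from the original slide.
# lemmatize() treats the word as a noun unless a part of speech is given
print("rocks :", lemmatizer.lemmatize("rocks"))
print("corpora :", lemmatizer.lemmatize("corpora"))
# pos="a" marks the word as an adjective
print("better :", lemmatizer.lemmatize("better", pos="a"))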
POS Tagging:(Python code)
Output:
[('Everything', 'VBG'), ('money', 'NN'), ('.', '.')]
Example of POS Tagging:
ओ रुट्टी खादा ।
रुट्टी (food): noun; खा: verb
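As a rough sketch of how a tagger assigns a part of speech to each token, the example below combines a small word list with suffix-based fallback rules and uses Penn Treebank style tags. The lexicon, the rules, the sample sentence, and the `toy_pos_tag` function are assumptions for illustration; real taggers are trained on annotated corpora and use context.

```python
import re

# Hypothetical mini-lexicon mapping words to Penn Treebank style tags.
LEXICON = {"the": "DT", "a": "DT", "pilot": "NN",
           "aircraft": "NN", "flies": "VBZ", "is": "VBZ"}

# Fallback suffix rules, tried in order when a word is not in the lexicon.
SUFFIX_RULES = [("ing", "VBG"), ("ed", "VBD"), ("ly", "RB"), ("s", "NNS")]

def toy_pos_tag(tokens):
    """Tag each token by lexicon lookup, then suffix rule, then default noun."""
    tagged = []
    for token in tokens:
        lower = token.lower()
        if lower in LEXICON:
            tag = LEXICON[lower]
        elif not token.isalnum():
            # For simplicity, every punctuation token gets the '.' tag.
            tag = "."
        else:
            tag = next((t for suffix, t in SUFFIX_RULES
                        if lower.endswith(suffix)), "NN")
        tagged.append((token, tag))
    return tagged

tokens = re.findall(r"\w+|[^\w\s]", "The pilot landed the aircraft safely.")
print(toy_pos_tag(tokens))
```

Suffix rules ignore context, so a word's ending alone can lead to a wrong tag; trained taggers reduce such errors by looking at neighbouring words.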
Named Entity Recognition (NER)
Translation and Transliteration (Language L1 → Language L2)
• Translation: Boy → लड़का
• Transliteration: Gandhiji → गांधीजी
Challenges of NLP
1. Contextual words and phrases and homonyms
2. Synonyms
3. Irony and sarcasm
4. Ambiguity
5. Errors in text or speech
6. Colloquialisms and slang
7. Domain-specific language
8. Low-resource languages
9. Lack of research and development
Summary
1. NLP stands for ____ _____ _____.
2. It includes the concepts of _______, ______, and _______.
3. Components of NLP are __________ and ___________.
4. _____ is an open-source tool used for NLP.
5. _____ is a process that stems or removes the last few characters from
a word, often leading to incorrect meanings and spelling.
6. _____ relies on accurately determining the intended part of
speech and the meaning of a word based on its context.
Questions?