NLP Introduction

The document discusses natural language processing (NLP) and provides examples of common NLP tasks such as sentiment analysis, topic modeling, and text generation. It also describes the typical steps involved in text processing for NLP: removing punctuation and stop words, vectorization, and weighting text with TF-IDF.


NATURAL LANGUAGE PROCESSING

Jifry Issadeen
NATURAL LANGUAGE PROCESSING Introduction

• Natural language processing (NLP) is concerned with the interactions between computers and human language. It focuses mainly on how to program computers to process and analyze large amounts of natural language data.

• Natural Language Processing (NLP) is a field of Artificial Intelligence and Machine Learning that gives machines the ability to read, understand, and derive meaning from human languages.
NATURAL LANGUAGE PROCESSING Introduction

• Common use cases of natural language processing frequently involve speech recognition, natural language understanding, and natural language generation.

• In this lesson, we are going to deal with the classification of text.
NATURAL LANGUAGE PROCESSING NLP Examples

• Sentiment Analysis

• Topic Modeling

• Text Generation

NATURAL LANGUAGE PROCESSING NLP Examples

• Sentiment Analysis: example sentences whose sentiment we might want to determine:

  "I'm very happy with the customer service."
  "The product was good, but it took 2 months to deliver."
  "I would definitely place another order soon."
  "Besides the customer service being rude, the delivered product was faulty."
  "It's the best thing that has ever happened to me."
NATURAL LANGUAGE PROCESSING NLP Examples

• Topic Modeling: a method for discovering the abstract "topics" that occur in a collection of documents.

• Topic modeling is a frequently used text-mining tool for the discovery of hidden semantic structures in a text body.
NATURAL LANGUAGE PROCESSING NLP Examples

• Text Generation: automatically generating natural language text that is as good as human-written text.

• E.g. generating blog articles.
NATURAL LANGUAGE PROCESSING NLP – Text Processing Steps

• Remove Punctuations from the text

• Remove Stopwords from the text

• Vectorization of the words / bag-of-words transformation

• Weighting and normalizing the text with TF-IDF

• Training the model
NATURAL LANGUAGE PROCESSING Remove Punctuations

• This involves removing punctuation from the text document we are dealing with.

• This is just a simple process where we keep only the characters that are not punctuation marks.

Example:
Before: "Warning! If you have a heavy fever, you must see a doctor."
Removed: ! , .
After: "Warning If you have a heavy fever you must see a doctor"
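A minimal sketch of this step in Python, using the standard library's string.punctuation (the variable names are illustrative):

import string

text = "Warning! If you have a heavy fever, you must see a doctor."

# Keep only the characters that are not punctuation marks.
no_punct = "".join(ch for ch in text if ch not in string.punctuation)

print(no_punct)  # Warning If you have a heavy fever you must see a doctor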
NATURAL LANGUAGE PROCESSING Remove Stopwords

• We import the English stop words from the NLTK library and extract the words that are not stop words.

• We remove the punctuation first and then the stop words.

Example:
Before: "Warning If you have a heavy fever you must see a doctor"
Removed stop words: If, you, have, a, you, a
After: "Warning heavy fever must see doctor"
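A minimal sketch of this step with NLTK's English stop-word list (this assumes the list has already been fetched once with nltk.download('stopwords')):

from nltk.corpus import stopwords

words = "Warning If you have a heavy fever you must see a doctor".split()

# Keep only the words that are not in NLTK's English stop-word list.
stop_words = set(stopwords.words("english"))
filtered = [w for w in words if w.lower() not in stop_words]

print(" ".join(filtered))  # Warning heavy fever must see doctor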
NATURAL LANGUAGE PROCESSING Text Vectorization

• Text vectorization is the process of converting text into a numerical representation.

• In the simplest case, each distinct word in the text is assigned a number.

Example: "Warning heavy fever must see doctor"

doctor  fever  heavy  must  see  warning
0       1      2      3     4    5
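A minimal bag-of-words sketch using scikit-learn's CountVectorizer, one common tool for this step (the indices come from the vectorizer's alphabetically ordered vocabulary, which happens to match the table above):

from sklearn.feature_extraction.text import CountVectorizer

docs = ["Warning heavy fever must see doctor"]

vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(docs)

# Maps each word to its index: doctor=0, fever=1, heavy=2, must=3, see=4, warning=5
print(vectorizer.vocabulary_)

# One count per vocabulary word for this document.
print(counts.toarray())  # [[1 1 1 1 1 1]]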
NATURAL LANGUAGE PROCESSING Weighting / Normalizing

• TF-IDF weight is a statistical measure used to evaluate how important a word is to a document in a collection or corpus.

• The importance increases proportionally to the number of times a word appears in the document, but is offset by the word's frequency in the corpus.

• TF-IDF can also be used for stop-word filtering in various subject fields, including text summarization and classification.
NATURAL LANGUAGE PROCESSING Weighting / Normalizing

• The purpose of TF-IDF is to highlight words that are frequent in a document but not frequent across documents.

• TF: Term Frequency measures how frequently a term occurs in a document.

• IDF: Inverse Document Frequency measures how important a term is, based on how rarely it appears across the documents in the corpus.
NATURAL LANGUAGE PROCESSING Weighting / Normalizing

• TF: Term Frequency

TF(t) = (Number of times term t appears in a document) / (Total number of terms in the document)

• IDF: Inverse Document Frequency

IDF(t) = log(Total number of documents / Number of documents with term t in it), where the logarithm is taken to base 10 in the examples that follow.
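A minimal sketch of these two formulas in Python (the function names are illustrative, and a base-10 logarithm is used to match the worked examples that follow):

import math

def tf(term, document):
    # Number of times the term appears divided by the total number of terms.
    words = document.lower().split()
    return words.count(term.lower()) / len(words)

def idf(term, documents):
    # Total number of documents divided by the number of documents containing the term.
    # Assumes the term occurs in at least one document.
    containing = sum(1 for doc in documents if term.lower() in doc.lower().split())
    return math.log10(len(documents) / containing)

def tf_idf(term, document, documents):
    return tf(term, document) * idf(term, documents)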
NATURAL LANGUAGE PROCESSING Weighting / Normalizing

• Consider a document containing 100 words wherein the word cat appears 3 times. The term frequency (i.e., TF) for cat is (3 / 100) = 0.03.

• Now, assume we have 10 million documents and the word cat appears in one thousand of these. The inverse document frequency (i.e., IDF) is calculated as log(10,000,000 / 1,000) = 4.

• Thus, the TF-IDF weight is the product of these quantities: 0.03 * 4 = 0.12.
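The same arithmetic can be checked directly (assuming, as the result of 4 implies, a base-10 logarithm):

import math

tf_cat = 3 / 100                          # 0.03
idf_cat = math.log10(10_000_000 / 1_000)  # log10(10,000) = 4.0
print(tf_cat * idf_cat)                   # 0.12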
NATURAL LANGUAGE PROCESSING TF-IDF Example
Sentence 1: "Large room"
Sentence 2: "Small room"
Sentence 3: "Small large room"

TF(t) = (Number of times term t appears in a document) / (Total number of terms in the document)

TF          Sent. 1   Sent. 2   Sent. 3
Large       1/2
Small       0
Room        1/2

The word 'large' appears 1 time and there are 2 words in total in the first sentence, so the term frequency for 'large' in sentence 1 is 1/2.
The word 'small' doesn't appear in sentence 1, so its term frequency is 0.
The word 'room' appears 1 time out of the 2 words in the sentence, so the term frequency for 'room' in sentence 1 is 1/2.
NATURAL LANGUAGE PROCESSING TF-IDF Example
Similarly, we can find the term frequency for each word in sentence 2 and sentence 3:

TF          Sent. 1   Sent. 2   Sent. 3
Large       1/2       0         1/3
Small       0         1/2       1/3
Room        1/2       1/2       1/3
NATURAL LANGUAGE PROCESSING TF-IDF Example
IDF(t) = log(Total number of documents / Number of documents with term t in it)

Next, find the IDF for each word: Large, Small, Room.
NATURAL LANGUAGE PROCESSING TF-IDF Example
Calculate the IDF for the word 'large': the word 'large' appears in 2 sentences, and we have a total of 3 sentences. We divide the number of sentences (3) by the number of sentences in which the word 'large' appears (2). Log of 3/2 is equal to 0.18.

IDF
Large: log(3/2) = 0.18
NATURAL LANGUAGE PROCESSING TF-IDF Example
Similarly, we calculate the IDF for the word 'small':

IDF
Large: log(3/2) = 0.18
Small: log(3/2) = 0.18
NATURAL LANGUAGE PROCESSING TF-IDF Example
And we calculate the IDF for the word 'room':

IDF
Large: log(3/2) = 0.18
Small: log(3/2) = 0.18
Room:  log(3/3) = 0
NATURAL LANGUAGE PROCESSING TF-IDF Example
The word 'room' occurs 3 times, which means it is present in every sentence, so its IDF is log(3/3) = 0.
NATURAL LANGUAGE PROCESSING TF-IDF Example
If a word appears in many documents, its IDF is low, whereas words that appear in few documents get a higher IDF.
NATURAL LANGUAGE PROCESSING TF-IDF Example
TF-IDF = TF * IDF

The TF-IDF table has one column per feature (F1 'Large', F2 'Small', F3 'Room') and one row per sentence. We fill it in cell by cell, starting with the word 'large' in sentence 1.
NATURAL LANGUAGE PROCESSING TF-IDF Example
For 'large' in sentence 1: TF-IDF = 1/2 * 0.18 = 0.09.

TF-IDF      Large     Small     Room
Sent. 1     0.09
Sent. 2
Sent. 3

Next, the word 'large' in sentence 2.
NATURAL LANGUAGE PROCESSING TF-IDF Example
For 'large' in sentence 2: TF-IDF = 0 * 0.18 = 0.

TF-IDF      Large     Small     Room
Sent. 1     0.09
Sent. 2     0
Sent. 3

Next, the word 'large' in sentence 3.
NATURAL LANGUAGE PROCESSING TF-IDF Example
For 'large' in sentence 3: TF-IDF = 1/3 * 0.18 = 0.06.

TF-IDF      Large     Small     Room
Sent. 1     0.09
Sent. 2     0
Sent. 3     0.06
NATURAL LANGUAGE PROCESSING TF-IDF Example
In the same way, we fill in the column for 'small':

TF-IDF      Large     Small     Room
Sent. 1     0.09      0
Sent. 2     0         0.09
Sent. 3     0.06      0.06
NATURAL LANGUAGE PROCESSING TF-IDF Example
Since the IDF of 'room' is 0, its TF-IDF is 0 in every sentence, which completes the table:

TF-IDF      Large     Small     Room
Sent. 1     0.09      0         0
Sent. 2     0         0.09      0
Sent. 3     0.06      0.06      0
NATURAL LANGUAGE PROCESSING TF-IDF
Sentence 1: "Large room"
Sentence 2: "Small room"
Sentence 3: "Small large room"

TF          Sent. 1   Sent. 2   Sent. 3
Large       1/2       0         1/3
Small       0         1/2       1/3
Room        1/2       1/2       1/3

IDF
Large: log(3/2) = 0.18
Small: log(3/2) = 0.18
Room:  log(3/3) = 0

TF-IDF      Large     Small     Room
Sent. 1     0.09      0         0
Sent. 2     0         0.09      0
Sent. 3     0.06      0.06      0
NATURAL LANGUAGE PROCESSING TF-IDF
With TF (term frequency), the word 'room' has a high value: it appears in every sentence, 3 times in total. TF-IDF, however, gives less importance to the word 'room', because it appears frequently across multiple documents.
NATURAL LANGUAGE PROCESSING TF-IDF
• Similarly, TF-IDF also decreases for any word that appears frequently in multiple documents.
NATURAL LANGUAGE PROCESSING TF-IDF
• With TF-IDF, the importance increases proportionally to the number of times a word appears in the document, but is offset by the frequency of the word in the corpus.
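A minimal sketch that reproduces the table above (base-10 logarithm, no smoothing or normalization; note that scikit-learn's TfidfVectorizer uses a smoothed natural-log IDF and L2 normalization by default, so its numbers would differ):

import math

sentences = ["Large room", "Small room", "Small large room"]
docs = [s.lower().split() for s in sentences]
vocab = ["large", "small", "room"]

def idf(term):
    # Number of sentences divided by the number of sentences containing the term.
    containing = sum(1 for doc in docs if term in doc)
    return math.log10(len(docs) / containing)

for i, doc in enumerate(docs, start=1):
    weights = {t: round((doc.count(t) / len(doc)) * idf(t), 2) for t in vocab}
    print(f"Sentence {i}: {weights}")

# Sentence 1: {'large': 0.09, 'small': 0.0, 'room': 0.0}
# Sentence 2: {'large': 0.0, 'small': 0.09, 'room': 0.0}
# Sentence 3: {'large': 0.06, 'small': 0.06, 'room': 0.0}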
NATURAL LANGUAGE PROCESSING Training the Model

• We use the Multinomial Naïve Bayes classifier to classify the documents.

• We will discuss Multinomial Naïve Bayes in great detail in the next lesson.
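A minimal sketch of this step with scikit-learn's MultinomialNB (the toy reviews and labels below are made up purely for illustration):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy training set: reviews labelled positive (1) or negative (0).
train_docs = [
    "I'm very happy with the customer service",
    "I would definitely place another order soon",
    "The delivered product was faulty",
    "The customer service was rude",
]
train_labels = [1, 1, 0, 0]

vectorizer = TfidfVectorizer(stop_words="english")
X_train = vectorizer.fit_transform(train_docs)

model = MultinomialNB()
model.fit(X_train, train_labels)

X_new = vectorizer.transform(["very happy with the order"])
print(model.predict(X_new))  # predicted class label for the new review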
NATURAL LANGUAGE PROCESSING Summary

• Remove Punctuations

• Remove Stopwords

• Perform Vectorization of the words / bag-of-words transformation

• Weighting and normalizing the text with TF-IDF

• Training the model
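Putting the steps together, a minimal end-to-end sketch with a scikit-learn Pipeline, one common way to chain these stages (CountVectorizer's tokenizer effectively drops punctuation, stop_words="english" removes stop words, TfidfTransformer applies the TF-IDF weighting, and MultinomialNB is the classifier; the toy data is illustrative):

from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Illustrative toy data; a real task would use a labelled corpus of documents.
docs = [
    "I'm very happy with the customer service!",
    "It's the best thing that has ever happened to me.",
    "The delivered product was faulty.",
    "The customer service was rude.",
]
labels = ["positive", "positive", "negative", "negative"]

pipeline = Pipeline([
    ("bow", CountVectorizer(stop_words="english")),  # bag-of-words counts
    ("tfidf", TfidfTransformer()),                   # TF-IDF weighting / normalization
    ("clf", MultinomialNB()),                        # Multinomial Naive Bayes classifier
])

pipeline.fit(docs, labels)
print(pipeline.predict(["Very happy, I would order again soon."]))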
