Module 3 - NLP
Examples of such data include news websites, blogs, online bookshelves, product reviews, tweets, etc.
● Customer Support
Customers often use social media to express their opinions about, and experiences with, products or services. Text classification is often used to identify the tweets that brands must respond to and those that don't require a response.
● E-commerce
Customers leave reviews for a range of products on e-commerce websites like Amazon, eBay, etc. Understanding and analyzing customers' perception of a product or service based on their comments is commonly known as "sentiment analysis." It's used extensively by brands across the globe to better understand customers. Over time, sentiment analysis has evolved beyond categorizing customer feedback as simply positive, negative, or neutral into a more sophisticated paradigm: "aspect"-based sentiment analysis.
● Other Applications
1. Collect or create a labeled dataset suitable for the task.
2. Split the dataset into two (training and test) or three parts: training, validation (i.e., development), and test sets, then decide on evaluation metric(s).
3. Transform the raw text into feature vectors.
4. Train a classifier using the feature vectors and the corresponding labels from the training set.
5. Using the evaluation metric(s) from Step 2, benchmark the model performance on the test set.
6. Deploy the model to serve the real-world use case and monitor its performance.
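As a concrete illustration, here is a minimal sketch of steps 1-5 using scikit-learn. The toy dataset, the 50/50 split, and the choice of accuracy as the metric are illustrative assumptions, not prescriptions from these notes:

```python
# A minimal text classification pipeline sketched with scikit-learn.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score

# Step 1: a (tiny, illustrative) labeled dataset.
texts = ["loved this movie", "terrible acting", "great plot", "boring and slow"]
labels = ["pos", "neg", "pos", "neg"]

# Step 2: split into training and test sets; metric = accuracy.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.5, random_state=0
)

# Step 3: transform raw text into feature vectors (bag of words).
vectorizer = CountVectorizer()
train_vecs = vectorizer.fit_transform(X_train)
test_vecs = vectorizer.transform(X_test)

# Step 4: train a classifier on the training vectors and labels.
clf = MultinomialNB()
clf.fit(train_vecs, y_train)

# Step 5: benchmark on the test set with the chosen metric.
print("accuracy:", accuracy_score(y_test, clf.predict(test_vecs)))

# Step 6 (deployment and monitoring) happens outside this script.
```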
Out of all classes c ∈ C, the classifier returns the class ĉ which has the maximum posterior probability given the document d:

ĉ = argmax_{c ∈ C} P(c|d)

The intuition of Bayesian classification is to use Bayes' rule to transform the above equation into:

ĉ = argmax_{c ∈ C} P(d|c) P(c) / P(d)

Why can we drop the denominator P(d)? Because we would be computing P(d) for each possible class, and it does not change across classes; we are always asking about the most likely class for the same document d. So:

ĉ = argmax_{c ∈ C} P(d|c) P(c)
We call naive Bayes a generative model because we can read the above equation as stating a kind of implicit assumption about how a document is generated: first a class is sampled from P(c), and then the words are generated by sampling from P(d|c). (In fact, we could imagine generating artificial documents, or at least their word counts, by following this process.)
● The first is the bag-of-words assumption: we assume position doesn't matter, and that the word "love" has the same effect on classification whether it occurs as the 1st, 20th, or last word in the document. Thus we assume that the features f1, f2, ..., fn only encode word identity and not position.
● The second is commonly called the naive Bayes assumption: this is the conditional independence assumption that the probabilities P(fi|c) are independent given the class c and hence can be 'naively' multiplied as follows:

P(f1, f2, ..., fn | c) = P(f1|c) · P(f2|c) · ... · P(fn|c)
As with language modeling, calculations are done in log space to avoid underflow and to increase speed.
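Concretely, the naive Bayes decision in log space becomes c_NB = argmax_{c ∈ C} [ log P(c) + Σ_i log P(w_i|c) ]. A small illustration of why this matters, with made-up probabilities: the raw product of many small likelihoods underflows to 0.0, while the sum of their logs stays representable:

```python
import math

# 1,000 word likelihoods of 0.001 each: the raw product underflows to 0.0 ...
probs = [0.001] * 1000
product = 1.0
for p in probs:
    product *= p
print(product)  # 0.0 (floating-point underflow)

# ... but the equivalent sum of logs is perfectly representable.
log_score = sum(math.log(p) for p in probs)
print(log_score)  # about -6907.76
```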
Classifiers that use a linear combination of the inputs to make a classification decision, like naive Bayes and logistic regression, are called linear classifiers.
To learn the probability P(fi|c), we'll assume a feature is just the existence of a word in the document's bag of words, and so we'll want P(wi|c), which we compute as the fraction of times the word wi appears among all words in all documents of topic c. We first concatenate all documents with category c into one big "category c" text. Then we use the frequency of wi in this concatenated document to give a maximum likelihood estimate of the probability:

P̂(wi|c) = count(wi, c) / Σ_{w ∈ V} count(w, c)

Here the vocabulary V consists of the union of all the word types in all classes, not just the words in one class c. There is a problem, however, with maximum likelihood training. Imagine we are trying to estimate the likelihood of the word "fantastic" given class positive, but suppose there are no training documents that both contain the word "fantastic" and are classified as positive. Perhaps the word "fantastic" happens to occur (sarcastically?) in the class negative. In such a case the probability for this feature will be zero:

P̂("fantastic"|positive) = count("fantastic", positive) / Σ_{w ∈ V} count(w, positive) = 0

The simplest solution is add-one (Laplace) smoothing:

P̂(wi|c) = (count(wi, c) + 1) / (Σ_{w ∈ V} count(w, c) + |V|)
Please note:
● The vocabulary V consists of the union of all the word types in all classes, not just the words in one class c.
● Remove the unknown words (words that appear in the test data but not in the training vocabulary) from the test documents.
● Sometimes we remove the stop words and sometimes we don't. [Removing the stop words doesn't bring any change in performance, so usually stop words are included.]
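As a sanity check, here is a minimal sketch of the smoothed estimate on a hypothetical two-document corpus (the documents, class names, and counts are illustrative assumptions):

```python
from collections import Counter

# Toy training data: one "positive" and one "negative" document (illustrative).
docs = {
    "positive": "fantastic movie fantastic plot".split(),
    "negative": "boring movie".split(),
}

# Vocabulary V: the union of word types across ALL classes.
vocab = set(w for words in docs.values() for w in words)

def p_word_given_class(word, c):
    """Add-one smoothed estimate: (count(w,c) + 1) / (total words in c + |V|)."""
    counts = Counter(docs[c])
    return (counts[word] + 1) / (len(docs[c]) + len(vocab))

# Without smoothing, P("fantastic"|negative) would be 0; now it is small but nonzero.
print(p_word_given_class("fantastic", "positive"))  # (2+1)/(4+4) = 0.375
print(p_word_given_class("fantastic", "negative"))  # (0+1)/(2+4) ≈ 0.167
```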
A second important addition commonly made when doing text classification for sentiment is to deal with negation. Consider the difference between I really like this movie (positive) and I didn't like this movie (negative). The negation expressed by didn't completely alters the inferences we draw from the predicate like. Similarly, negation can modify a negative word to produce a positive review (don't dismiss this film, doesn't let us get bored). A very simple baseline that is commonly used in sentiment analysis to deal with negation is the following: during text normalization, prepend the prefix NOT_ to every word after a token of logical negation (n't, not, no, never) until the next punctuation mark. Thus the phrase didn't like this movie, but I becomes didn't NOT_like NOT_this NOT_movie, but I. Newly formed 'words' like NOT_like and NOT_recommend will thus occur more often in negative documents and act as cues for negative sentiment, while words like NOT_bored and NOT_dismiss will acquire positive associations (see the sketch below).
In some situations we might have insufficient labeled training data to train accurate naive Bayes classifiers using all words in the training set to estimate positive and negative sentiment. In such cases we can instead derive the positive and negative word features from sentiment lexicons: lists of words that are pre-annotated with positive or negative sentiment.
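A rough sketch of the negation baseline described above, assuming whitespace-tokenized input and a small, illustrative set of negation cues:

```python
import re

NEGATIONS = {"not", "no", "never"}  # plus any token ending in "n't"
PUNCT = re.compile(r"^[.,!?;:]$")

def mark_negation(tokens):
    """Prepend NOT_ to every word after a negation cue, until the next punctuation mark."""
    out, negating = [], False
    for tok in tokens:
        if PUNCT.match(tok):
            negating = False  # punctuation ends the negation scope
            out.append(tok)
        elif negating:
            out.append("NOT_" + tok)
        else:
            out.append(tok)
        if tok.lower() in NEGATIONS or tok.lower().endswith("n't"):
            negating = True  # start negating from the next token onward
    return out

print(mark_negation("didn't like this movie , but I".split()))
# ["didn't", 'NOT_like', 'NOT_this', 'NOT_movie', ',', 'but', 'I']
```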
Unlike naive Bayes, which estimates probabilities based on feature occurrence in classes, logistic regression "learns" the weights for individual features based on how important they are to the classification decision. The goal of logistic regression is to learn a linear separator between classes in the training data, with the aim of maximizing the probability of the data. This "learning" of feature weights and of the probability distribution over all classes is done through a function called the "logistic" function, hence the name logistic regression.
The most important difference between naive Bayes and logistic regression is that logistic
regression is a discriminative classifier while naive Bayes is a generative classifier.
A generative model would have the goal of understanding what dogs look like and what cats
look like. You might literally ask such a model to ‘generate’, i.e., draw, a dog. Given a test
image, the system then asks whether it’s the cat model or the dog model that better fits (is less
surprised by) the image, and chooses that as its label. A discriminative model, by contrast, is
only trying to learn to distinguish the classes (perhaps without learning much about them). So
maybe all the dogs in the training data are wearing collars and the cats aren’t. If that one feature
neatly separates the classes, the model is satisfied. If you ask such a model what it knows
about cats all it can say is that they don’t wear collars.
Consider a single input observation x, which we will represent by a vector of features [x1, x2, ..., xn] (we'll show sample features in the next subsection). The classifier output y can be 1 (meaning the observation is a member of the class) or 0 (the observation is not a member of the class). We want to know the probability P(y = 1|x) that this observation is a member of the class. So perhaps the decision is "positive sentiment" versus "negative sentiment", the features represent counts of words in a document, P(y = 1|x) is the probability that the document has positive sentiment, and P(y = 0|x) is the probability that the document has negative sentiment.
Logistic regression solves this task by learning, from a training set, a vector of weights and a
bias term. Each weight wi is a real number, and is associated with one of the input features xi.
The weight wi represents how important that input feature is to the classification decision, and
can be positive (providing evidence that the instance being classified belongs in the positive
class) or negative (providing evidence that the instance being classified belongs in the negative
class). Thus we might expect in a sentiment task the word awesome to have a high positive
weight, and abysmal to have a very negative weight. The bias term b, also called the intercept, is another real number that's added to the weighted inputs. To make a decision on a test instance, the classifier multiplies each feature xi by its weight wi, sums the weighted features, and adds the bias term:

z = (Σ_i wi xi) + b = w · x + b

To create a probability, we'll pass z through the sigmoid function, σ(z). The sigmoid function (named because it looks like an s) is also called the logistic function, and gives logistic regression its name:

σ(z) = 1 / (1 + e^(−z))
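Putting the pieces together in a minimal sketch; the weights, bias, and word counts below are made up for illustration:

```python
import math

def sigmoid(z):
    """The logistic (sigmoid) function: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative weights: positive weight for "awesome", negative for "abysmal".
weights = {"awesome": 2.0, "abysmal": -3.0}
bias = 0.1

def p_positive(word_counts):
    """P(y=1|x) = sigmoid(w . x + b), where x holds word-count features."""
    z = bias + sum(weights.get(w, 0.0) * n for w, n in word_counts.items())
    return sigmoid(z)

print(p_positive({"awesome": 2}))  # sigmoid(4.1)  ≈ 0.984 -> likely positive
print(p_positive({"abysmal": 1}))  # sigmoid(-2.9) ≈ 0.052 -> likely negative
```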
Information extraction
Information extraction (IE) is a technique or a task of extracting structured information from
unstructured text. It transforms raw text (e.g., articles, emails, social media posts) into
organized data (e.g., databases, tables, or knowledge graphs) that machines can
understand and use for downstream tasks.
Applications of IE
IE tasks
The overarching goal of IE is to extract “knowledge” from text, and each of these tasks provides
different information to do that.
Identifying that the article is about “buyback” or “stock price” relates to the IE task of keyword or
keyphrase extraction (KPE).
Identifying Apple as an organization and Luca Maestri as a person comes under the IE task of
named entity recognition (NER).
Recognizing that Apple is not a fruit, but a company, and that it refers to Apple, Inc. and not
some other company with the word “apple” in its name is the IE task of named entity
disambiguation and linking.
Extracting the information that Luca Maestri is the finance chief of Apple refers to the IE task of
relation extraction.
Advanced IE tasks:
Identifying that this article is about a single event (let's call it "Apple buys back stocks") and being able to link it to other articles talking about the same event over time refers to the IE task of event extraction.
Temporal information extraction aims to extract information about times and dates; it is also useful for developing calendar applications and interactive personal assistants.
IE pipeline
Supervised learning approaches require corpora with texts and their respective keyphrases and
use engineered features or DL techniques. Creating such labeled datasets for KPE is a time-
and cost-intensive endeavor. Hence, unsupervised approaches that do not require a labeled
dataset and are largely domain agnostic are more popular for KPE. These approaches are also
more commonly used in real-world KPE applications.
All the popular unsupervised KPE algorithms are based on the idea of representing the words
and phrases in a text as nodes in a weighted graph where the weight indicates the importance
of that keyphrase. Keyphrases are then identified based on how connected they are with the
rest of the graph. The top-N important nodes from the graph are then returned as keyphrases.
Important nodes are those words and phrases that are frequent enough and also well
connected to different parts of the text. The different graph-based KPE approaches differ in the
way they select potential words/phrases from the text (from a large set of possible words and
phrases in the entire text) and the way these words/phrases are scored in the graph.
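A rough sketch of this graph-based idea using networkx (assumed installed); the whitespace tokenization and fixed co-occurrence window used here are simplifying assumptions, not a full TextRank implementation:

```python
import networkx as nx

def keyphrases(tokens, window=2, top_n=3):
    """Score words by PageRank over a co-occurrence graph; return the top-N."""
    graph = nx.Graph()
    # Connect every pair of words that co-occur within the window.
    for i, w in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if w != other:
                graph.add_edge(w, other)
    scores = nx.pagerank(graph)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

text = "apple buys back stock apple stock price rises after buy back"
print(keyphrases(text.split()))
```

Real implementations additionally filter candidate words by POS tag and merge adjacent high-scoring words into multi-word phrases.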
● The process of extracting potential n-grams and building the graph with them is sensitive to document length, which can be an issue; one workaround is to use only the first M% and the last N% of the text.
● Since each keyphrase is independently ranked, we sometimes end up seeing overlapping keyphrases (e.g., "buy back stock" and "buy back"). One solution for this could be to use some similarity measure (e.g., cosine similarity) between the top-ranked keyphrases and choose the ones that are most dissimilar to one another (see the sketch after this list).
● Remove unwanted word patterns directly, such as phrases starting with a preposition.
● Improper text extraction (e.g., from PDFs or scanned documents) can affect the rest of the KPE process, so add some post-processing to the extracted keyphrase list to create a final, meaningful list without noise.
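A sketch of the overlap-based filtering mentioned in the second bullet; token-level Jaccard overlap stands in here for the similarity measure (cosine similarity over phrase embeddings would be a common alternative):

```python
def dedupe(ranked_phrases, max_overlap=0.5):
    """Keep a ranked phrase only if it doesn't overlap too much with one already kept."""
    kept = []
    for phrase in ranked_phrases:
        words = set(phrase.split())
        too_similar = any(
            len(words & set(k.split())) / len(words | set(k.split())) > max_overlap
            for k in kept
        )
        if not too_similar:
            kept.append(phrase)
    return kept

print(dedupe(["buy back stock", "buy back", "stock price"]))
# ['buy back stock', 'stock price']  ("buy back" overlaps 2/3 with the first phrase)
```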
An approach that goes beyond a lookup table is rule-based NER, which can be based on a compiled list of patterns over word tokens and POS tags. A more practical approach to NER is to train an ML model which can predict the named entities in unseen text. For each word, a decision has to be made whether or not that word is an entity, and if it is, what type of entity it is. The only difference from ordinary text classification is that NER is a "sequence labeling" problem.
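For example, an off-the-shelf pretrained NER model can be used through spaCy (assuming spaCy and its small English model are installed):

```python
import spacy

# Requires: pip install spacy && python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple's finance chief Luca Maestri announced a stock buyback.")

for ent in doc.ents:
    print(ent.text, ent.label_)
# Expected (model-dependent) output: Apple ORG, Luca Maestri PERSON
```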
Unlike part-of-speech tagging, where there is no segmentation problem since each word gets
one tag, the task of named entity recognition is to find and label spans of text, and is difficult
partly because of the ambiguity of segmentation. We need to decide what’s an entity and what
isn’t, and where the boundaries are. Indeed, most words in a text will not be named entities.
Another difficulty is caused by type ambiguity. The mention JFK can refer to a person, the
airport in New York, or any number of schools, bridges, and streets around the United States.
The tables below are for your information only: they are not part of the syllabus, but you should know this basic English terminology for your future work.
It is necessary for you to know these terms about the English language in order to understand the working of an NLP model. Please read them carefully.
Ambiguity in NER
Tagging is a disambiguation task; words are ambiguous (they have more than one possible part of speech), and the goal is to find the correct tag for the situation. For example, book can be a verb (book that flight) or a noun (hand me that book).
That can be a determiner (Does that flight serve dinner?) or a complementizer (I thought that your flight was earlier). The goal of POS tagging is to resolve these ambiguities, choosing the proper tag for the context.
England (Organization) won the 2019 World Cup vs. The 2019 World Cup happened in England (Location).
Washington (Location) is the capital of the US vs. The first president of the US was Washington (Person).
There are many types of ambiguity faced by NER systems; some are:
★ Word Sense Disambiguation: Many words in a text can have multiple meanings or
senses. For example, the word "Apple" can refer to a fruit or the technology company.
Determining the correct sense of a word in context is essential for accurate NER.
★ Proper Noun Variations: Proper nouns can have variations in their spellings or forms,
such as abbreviations, misspellings, or alternative names. This variability adds ambiguity
to the NER process. For example, "New York" can be referred to as "NY," "N.Y.," or "Big
Apple."
★ Contextual Ambiguity: The same word can have different named entity categories based
on the context. For instance, the word "Java" can refer to the programming language or
the Indonesian island, and its context determines the correct entity type.
★ Homographs: Homographs are words that are spelled the same but have different
meanings. For example, "lead" can refer to a metal or the act of guiding. Determining the
correct named entity type requires considering the context.
For example, if the previous word was a person name, there's a higher probability that the current word is also part of a person name if it's a noun (e.g., first and last names). That is, the label of a word depends on its surrounding words and labels. A common use case for sequence labeling is POS tagging, where we need information about the parts of speech of the surrounding words to estimate the part of speech of the current word.
Until now, the predictions we saw were made independently of the surroundings. To perform sequence classification, we need data in a format that allows us to model the context.
The labels in such datasets follow what's known as BIO notation: B indicates the beginning of an entity; I (inside) indicates that an entity comprises more than one word; and O (other) indicates non-entities.
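For instance, the sentence from the earlier Apple example would be labeled like this (a hand-constructed illustration):

```python
# BIO labels for: "Luca Maestri is the finance chief of Apple"
tokens = ["Luca", "Maestri", "is", "the", "finance", "chief", "of", "Apple"]
labels = ["B-PER", "I-PER", "O", "O", "O", "O", "O", "B-ORG"]

for tok, lab in zip(tokens, labels):
    print(f"{tok}\t{lab}")
```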
3. Train the classifier: train using sequence labeling algorithms like CRFs, HMMs, etc.
Evaluation of NER
You are expected to give the whole pipeline, in simple words. One or two pages are more than enough, but it should be written after understanding the concepts of IE and text classification from the notes above.
PREPARED BY,