Grapheme:: Morpheme
A morpheme is the smallest unit of meaning that cannot be further divided. A base word such as “wind” may be a morpheme, but a suffix, prefix or root also represents a morpheme. For example, the word “red” is a single morpheme, while the word “unpredictable” is made up of the morphemes un + pre + dict + able. None of these units stands alone as a word, but each is a smallest unit of meaning.
06/09/2024
Understanding Linguistics:
Linguistics is the study of language, its structure and the rules that govern that structure. Its approach includes morphology, syntax, semantics and pragmatics.
What is morphology?
It is the study of word structure. It describes how words are formed out of more basic elements of language called morphemes. A morpheme is the smallest meaningful unit of a language. Morphemes are considered minimal because if they were subdivided any further, they would become meaningless.
Each morpheme is different from the others because each carries a different meaning. Morphemes are used to form words. Base, root or free morphemes are words that have meaning and cannot be broken down into smaller parts. Examples of free morphemes are
ocean, book, colour, connect etc.
These words cannot be broken down into smaller units.
Bound or grammatical morphemes, which cannot convey meaning by themselves, must be combined with free morphemes.
Examples:
ocean-s
establish-ment
book-ed
color-ful
dis-connect
Bound morphemes often include the following: -ing, -s, -ed, etc.
09/09/2024
Pragmatic Analysis:
Pragmatics in NLP is the study of contextual meaning. It examines cases where a person’s statement has one literal meaning and another, more profound meaning. It tells us how different contexts can change the meaning of a sentence. Pragmatics considers the intention of the speaker or writer.
E.g.:
1. Can you pull the car over?
Actual (literal) meaning: Are you capable of pulling the car over?
Pragmatic meaning: “Can you stop the car?”
2. It’s hot in here! Can you crack a window?
Here the speaker wants the window to be opened a little and does not want the window to be physically damaged (“cracked”).
3. What time do you call this?
It means “Why are you late?” and not that the speaker wants to know the time.
10/09/2024
Processing Text
1. Use of regular expressions: A regular expression is helpful for extracting the words in a sentence or paragraph. Various rules are used in regular expressions. Basically, a regular expression can be used to check whether a string contains a specified search pattern.
Special Sequences: commonly used special sequences in regular expressions include \d (digit), \w (word character), \s (whitespace) and \b (word boundary).
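A minimal sketch of both ideas with Python's re module (the sentence and patterns are just illustrations):

import re

text = "GLA is located in Mathura."

# findall() with \w+ derives the individual words from the sentence.
words = re.findall(r"\w+", text)
print(words)                               # ['GLA', 'is', 'located', 'in', 'Mathura']

# search() checks whether the string contains the specified search pattern.
print(bool(re.search(r"Mathura", text)))   # True
print(bool(re.search(r"\d+", text)))       # False, the text contains no digits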
Tokenization:
Tokenization in NLP is a technique that involves dividing a sentence or phrase into smaller
units known as tokens. These token consist of words, punctuation marks etc.
Types of tokenization:
1. Sentence Tokenization: The text is segmented into sentences during sentence tokenization. This is useful for tasks that require analysis at the level of individual sentences.
For e.g.: “GLA is the best university. It is located in Mathura.”
After sentence tokenization: “GLA is the best university.” and “It is located in Mathura.”
2. Word Tokenization: It divides the text into individual words.
For e.g.: “Tokenization is an important NLP task.”
After word tokenization we will get the result as
[“Tokenization”, “is”, “an”, “important”, “NLP”, “task”]
3. Character Tokenization: This process divides the text into individual characters. This
can be useful for modelling character level language.
For e.g.: “Tokenization”
[“T”, “o”, “k”, “e”, “n”, “i”, “z”, “a”, “t”, “i”, “o”, “n”]
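As a rough sketch, the three kinds of tokenization can be reproduced with the NLTK library (assuming nltk and its 'punkt' tokenizer data are installed; plain Python handles the character case):

import nltk
nltk.download("punkt")                      # tokenizer models, needed once
from nltk.tokenize import sent_tokenize, word_tokenize

text = "GLA is the best university. It is located in Mathura."
print(sent_tokenize(text))                  # sentence tokenization
# ['GLA is the best university.', 'It is located in Mathura.']

print(word_tokenize("Tokenization is an important NLP task."))   # word tokenization
# ['Tokenization', 'is', 'an', 'important', 'NLP', 'task', '.']

print(list("Tokenization"))                 # character tokenization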
Need of Tokenization:
1. Effective text processing: Tokenization breaks raw text into manageable units so that it can be handled easily.
2. Language Modelling: It facilitates the creation of organized representation of
language which is useful for tasks like text generation.
3. Information retrieval: It is essential for indexing and searching in systems that store
and retrieve information.
4. Text Analysis: Tokenization enables tasks such as sentiment analysis and named entity recognition.
Stemming:
Stemming is a process through which the base or root word is extracted. In this process the last few characters of a given (derived) word are removed. Sometimes stemming produces a base word that does not have a proper meaning.
For e.g.: “history” is converted into “histori”.
Stemming is used mostly for purposes such as sentiment analysis.
Positive sentiment: Pizza was very delicious.
Negative sentiment: Burger was very bad.
Neutral sentiment: I ordered Pizza today.
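A small sketch with NLTK's PorterStemmer (one common stemmer; the word list is only illustrative):

from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
for word in ["history", "connected", "flying", "books"]:
    print(word, "->", stemmer.stem(word))
# "history" becomes "histori", which is not a proper English word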
Lemmatization:
Lemmatization is the process of getting meaningful words from the given text. Lemmatization uses a stop-word corpus and the WordNet corpus to produce the lemma. Moreover, the part of speech also has to be specified to obtain the correct lemma. A lemma is an actual word of the language.
Lemmatization finds the base word from a derived word, but the result will always have a meaning.
Lemmatization is basically used in chatbots.
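A minimal sketch with NLTK's WordNetLemmatizer (assumes the WordNet corpus has been downloaded):

import nltk
nltk.download("wordnet")                        # WordNet corpus used to look up lemmas
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()
print(lemmatizer.lemmatize("histories"))        # 'history', an actual word, unlike the stem 'histori'
print(lemmatizer.lemmatize("better", pos="a"))  # 'good', the part of speech must be supplied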
13/09/2024
N-Grams:
N-Grams are contiguous sequences of ‘n’ items, typically words in the context of NLP. These items can be characters, words or even syllables depending on the granularity desired. The value of ‘n’ determines the order of the N-gram.
Example:
1. Unigram (1-gram): Single words e.g. “cat”, “dog”
2. Bigram (2-gram): Pairs of consecutive words e.g. “natural language”, “deep learning”
3. Trigrams (3-gram): Triplets of consecutive words e.g.: “machine learning models”,
“data science approach”
Similarly, 4-gram, 5-gram are sequences of four, five words etc.
N = 1: This is a sentence → groups are [This, is, a, sentence]
N = 2: This is a sentence → groups are [This is, is a, a sentence]
N = 3: This is a sentence → groups are [This is a, is a sentence]
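A short sketch of how the groups above can be produced in plain Python (the helper name ngrams is just for illustration):

def ngrams(sentence, n):
    # slide a window of size n over the list of words
    words = sentence.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

sentence = "This is a sentence"
print(ngrams(sentence, 1))   # ['This', 'is', 'a', 'sentence']
print(ngrams(sentence, 2))   # ['This is', 'is a', 'a sentence']
print(ngrams(sentence, 3))   # ['This is a', 'is a sentence']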
Applications of N-gram:
1. Speech Recognition
2. Machine Translation
3. Predictive Text Input
4. Named Entity Recognition
17/09/2024
Bag of Words
The bag of words model is a simple way to convert words to a numerical representation in natural language processing. It is a simple document-embedding technique based on word frequency. Conceptually, we think of the whole document as a bag of words rather than as a sequence, and we represent the document simply by the frequency of each word. Basically, this method converts text into a vector based on the frequency of words in the text, without considering the order or context of the words.
E.g.:
cute dog
cute cute cat
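A minimal sketch of the same two documents with scikit-learn's CountVectorizer (assuming scikit-learn is installed; the vocabulary is sorted alphabetically):

from sklearn.feature_extraction.text import CountVectorizer

docs = ["cute dog", "cute cute cat"]
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)           # matrix of word frequencies

print(vectorizer.get_feature_names_out())    # ['cat' 'cute' 'dog']
print(X.toarray())
# [[0 1 1]     "cute dog"
#  [1 2 0]]    "cute cute cat"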
TF-IDF
The TF-IDF method identifies very common words and rare words in a given document. That means it tells us the importance of each word in the given text, which is not provided by the bag-of-words method. TF stands for Term Frequency and IDF stands for Inverse Document Frequency.
Term frequency refers to the frequency of a word in a document. For a specified word, it is defined as the ratio of the number of times the word appears in a document to the total number of words in the document. Inverse document frequency measures how rare a word is across the collection; it is usually computed as the logarithm of the total number of documents divided by the number of documents that contain the word.
The TF-IDF score for a term in a document is calculated by multiplying its TF and IDF values.
This score reflects how important the term is within the context of the document and across
the entire corpus. Terms with higher TF-IDF scores are considered more significant.
Now for the query we can use the average of TF-IDF word scores for each document to get
the ranking of D1, D2, D3 with respect to the query Q.
Average TF-IDF of D1 = (0 + 0.0293) / 2 = 0.01465
Average TF-IDF of D2 = (0 + 0.0251) / 2 = 0.01255
Average TF-IDF of D3 = (0 + 0) / 2 = 0
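A rough sketch of this ranking idea in Python, following the definitions above (TF = count / document length, IDF = log of total documents / documents containing the word). The documents and query here are assumptions for illustration, so the numbers will not reproduce the averages above:

import math

docs = {
    "D1": "breakthrough drug for schizophrenia".split(),
    "D2": "new schizophrenia drug".split(),
    "D3": "new approach for treatment of schizophrenia".split(),
}

def tf(term, doc):
    # term frequency: occurrences of the term / total words in the document
    return doc.count(term) / len(doc)

def idf(term, docs):
    # inverse document frequency: log(total documents / documents containing the term)
    df = sum(1 for d in docs.values() if term in d)
    return math.log10(len(docs) / df)

query = ["schizophrenia", "drug"]
for name, doc in docs.items():
    scores = [tf(t, doc) * idf(t, docs) for t in query]
    print(name, round(sum(scores) / len(scores), 4))   # average TF-IDF over the query terms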
In a term-document incidence matrix, we can answer any query that is in the form of a Boolean expression of terms, i.e., one in which terms are combined with the operators AND, OR and NOT.
Q. Write the result for the following queries.
a) schizophrenia AND drug: 1111 && 1100 = 1100 → Doc1 and Doc2
b) for AND NOT (drug OR approach): 1011 && ~(1100 || 0010) = 1011 && ~1110 = 1011 && 0001 = 0001 → Doc4
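A small sketch of the same Boolean retrieval in Python. The four documents are reconstructed from the inverted index listed in the next section, so their exact wording is an assumption; only their terms are known:

docs = [
    "breakthrough drug for schizophrenia",          # Doc1
    "new schizophrenia drug",                       # Doc2
    "new approach for treatment of schizophrenia",  # Doc3
    "new hopes for treatment of schizophrenia",     # Doc4
]

def incidence(term):
    # one bit per document: 1 if the term occurs in it, 0 otherwise
    return [1 if term in doc.split() else 0 for doc in docs]

def AND(a, b): return [x & y for x, y in zip(a, b)]
def OR(a, b):  return [x | y for x, y in zip(a, b)]
def NOT(a):    return [1 - x for x in a]

# a) schizophrenia AND drug -> [1, 1, 0, 0], i.e. Doc1 and Doc2
print(AND(incidence("schizophrenia"), incidence("drug")))
# b) for AND NOT (drug OR approach) -> [0, 0, 0, 1], i.e. Doc4
print(AND(incidence("for"), NOT(OR(incidence("drug"), incidence("approach")))))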
Inverted Index
In this method, a list is formed in which each document is given a document ID and each term acts as a pointer to its document. The list is then sorted in alphabetical order of terms, and the pointers to the corresponding document IDs are maintained.
For example, if we have the following documents
Max lives in Texas
Jen worked in Seattle
Max met Jen in Texas
Formation of the term list (tokenized term, Doc ID) in order of appearance:

Term      Doc ID
Max       1
lives     1
in        1
Texas     1
Jen       2
worked    2
in        2
Seattle   2
Max       3
met       3
Jen       3
in        3
Texas     3

After sorting the terms alphabetically:

Term      Doc ID
in        1
in        2
in        3
Jen       2
Jen       3
lives     1
Max       1
Max       3
met       3
Seattle   2
Texas     1
Texas     3
worked    2
Inverted Index (for the document collection used in the Boolean-retrieval example above):

Term            Doc ID
breakthrough    1
drug            1
for             1
schizophrenia   1
new             2
schizophrenia   2
drug            2
new             3
approach        3
for             3
treatment       3
of              3
schizophrenia   3
new             4
hopes           4
for             4
treatment       4
of              4
schizophrenia   4
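A minimal sketch that builds such an inverted index for the Max/Jen/Texas documents above, mapping each term to the list of document IDs it appears in:

from collections import defaultdict

docs = {
    1: "Max lives in Texas",
    2: "Jen worked in Seattle",
    3: "Max met Jen in Texas",
}

index = defaultdict(list)
for doc_id, text in docs.items():
    for term in text.split():
        if doc_id not in index[term]:        # record each (term, doc) pair only once
            index[term].append(doc_id)

for term in sorted(index, key=str.lower):    # alphabetical order of terms
    print(term, index[term])
# in [1, 2, 3], Jen [2, 3], lives [1], Max [1, 3], met [3], Seattle [2], Texas [1, 3], worked [2]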
20/09/2024
Text Similarity:
Text similarity is the process of comparing a piece of text with another and finding the similarity between them. It is basically about determining the degree of closeness between the texts.
We will discuss two types of similarity.
1. Cosine similarity
2. Jaccard similarity
Cosine Similarity(A, B) = ( Σ_{i=1}^{n} A_i · B_i ) / ( √(Σ_{i=1}^{n} A_i²) · √(Σ_{i=1}^{n} B_i²) )
Word        Doc1   Doc2
Me          2      2
Rex         0      1
Cathy       1      1
Christine   1      0
Likes       0      1
Loves       2      1
More        1      1
Than        1      1
A = [2, 0, 1, 1, 0, 2, 1, 1]
B = [2, 1, 1, 0, 1, 1, 1, 1]
Cosine similarity(A, B) = (A · B) / (|A| · |B|) = 9 / (√12 × √10) ≈ 0.82
Example 2:
1. I love Data science
2. I love SAP
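A small sketch computing the cosine similarity for the two count vectors A and B above, and for Example 2 (its vectors over the vocabulary [I, love, Data, science, SAP] are an assumption built from the word counts):

import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))        # numerator: sum of A_i * B_i
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

A = [2, 0, 1, 1, 0, 2, 1, 1]   # Doc1 counts from the table above
B = [2, 1, 1, 0, 1, 1, 1, 1]   # Doc2 counts
print(round(cosine(A, B), 2))  # 0.82

# Example 2: "I love Data science" vs "I love SAP"
print(round(cosine([1, 1, 1, 1, 0], [1, 1, 0, 0, 1]), 2))   # 0.58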
Jaccard Similarity: Jaccard similarity, also called the Jaccard index or Jaccard coefficient, is a simple measure of the similarity between data samples. The similarity is computed as the ratio of the size of the intersection of the data samples to the size of their union.
J(A, B) = n(A ∩ B) / n(A ∪ B)
which means: the elements common to A and B divided by all the elements in A and B together.
Let us suppose we have 2 vectors
A = [1, 3, 2]
B = [5, 0, 3]
A ∩ B = [3]
A ∪ B = [1, 3, 2, 5, 0]
J (A, B) = 1 / 5 = 0.2
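A minimal sketch of the same calculation in Python:

def jaccard(a, b):
    # |A ∩ B| / |A ∪ B| over the unique elements of the two samples
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

print(jaccard([1, 3, 2], [5, 0, 3]))                                # 0.2, as computed above
print(round(jaccard(["ni", "ig", "gh", "ht"],
                    ["ni", "ic", "ch", "ht"]), 2))                  # 0.33, the bigram example in the next section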
24/09/2024
Jaccard vs Cosine:
Jaccard similarity treats data as sets, focusing on the overlap of elements. It considers only the presence or absence of terms, ignoring their magnitudes. It is calculated by dividing the size of the intersection of the sets by the size of their union, and it ranges from 0 (nothing in common) to 1 (identical sets).
Cosine similarity treats data as vectors in a multidimensional space. It considers orientation, i.e. the angle between the vectors, regardless of their magnitudes. It is calculated by taking the dot product of the two vectors and dividing it by the product of their magnitudes, and it ranges from -1 (opposite direction) to 1 (identical direction).
Ni Ig Gh Ht Ic Ch
A 1 1 1 1 0 0
B 1 0 0 1 1 1
A = [1, 1, 1, 1, 0, 0]
B = [1, 0, 0, 1, 1, 1]
|A ∩ B| = 2
|A ∪ B| = 6
J (A, B) = 2 / 6 = 0.33
Q. Write a CFG for the sentence “The man read this book”.
The rules implied by the derivation are: S → NP VP, NP → Det NOM, NOM → Noun, VP → Verb NP, Det → the | this, Noun → man | book, Verb → read.
Derivation:
S → NP VP
→ Det NOM VP
→ The NOM VP
→ The Noun VP
→ The man VP
→ The man Verb NP
→ The man read Det NOM
→ The man read this NOM
→ The man read this Noun
→ The man read this book
Parsing:
In NLP, parsing is the process of analyzing a sentence to determine its grammatical structure. There are two main approaches to parsing:
1. Top-Down Parsing
2. Bottom-Up Parsing
The parser generates a parse tree from the input text and the CFG.
1. Top-down parsing: It is a parsing technique that starts with the highest-level grammar production rule and works its way down to the lowest level. It begins with the start symbol of the grammar and applies the production rules recursively to expand it into a parse tree.
Parse tree:
2. Bottom-up parsing: It is a parsing technique that starts with the words of the sentence and works its way up to the highest level of the grammar’s production rules. It begins with the input sentence and applies the production rules in reverse, reducing the sentence to the start symbol of the grammar.
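As a sketch, the grammar from the CFG example above can be checked with NLTK's ChartParser (assumes nltk is installed; the sentence is lowercased to match the terminal symbols):

import nltk

grammar = nltk.CFG.fromstring("""
    S    -> NP VP
    NP   -> Det NOM
    NOM  -> Noun
    VP   -> Verb NP
    Det  -> 'the' | 'this'
    Noun -> 'man' | 'book'
    Verb -> 'read'
""")

parser = nltk.ChartParser(grammar)                    # builds parse trees from the CFG
for tree in parser.parse("the man read this book".split()):
    tree.pretty_print()                               # draws the parse tree as text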
Word2Vec
Word2Vec creates a vector representation of each word in our vocabulary. Words that are used in similar contexts or that have semantic relationships are captured effectively through their closeness in the vector space, which means similar words will have similar word vectors. The Word2Vec model was created, patented and published in 2013 by a team of researchers led by Tomáš Mikolov at Google.
Word2Vec is a shallow, two-layer neural network model. The input contains all the documents/texts in our training set. For the network to process these texts, they are represented as one-hot encodings of the words.
The number of neurons in the hidden layer is equal to the length of the embedding we want. That is, if we want all our words to be vectors of length 300, then the hidden layer will contain 300 neurons.
The output layer contains probabilities for a target word (given a particular input, which word is expected).
At the end of the training process, the hidden-layer weights are treated as the word embeddings. Intuitively, this can be thought of as each word having a set of n weights. There are two approaches by which we can develop these embeddings:
1. CBOW (Continuous Bag of Words)
2. Skip gram
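A minimal sketch with the gensim library (assuming gensim 4 or later is installed; the toy corpus is far too small for useful vectors and is only for illustration):

from gensim.models import Word2Vec

# each training "document" is a list of tokens
sentences = [
    ["gla", "is", "located", "in", "mathura"],
    ["tokenization", "is", "an", "important", "nlp", "task"],
    ["word2vec", "learns", "a", "vector", "for", "every", "word"],
]

# vector_size = length of each embedding (the hidden-layer width),
# window = context size, sg=0 selects CBOW and sg=1 selects skip-gram
model = Word2Vec(sentences, vector_size=100, window=2, min_count=1, sg=0)

print(model.wv["word2vec"].shape)            # (100,)
print(model.wv.most_similar("nlp", topn=3))  # nearest words in the vector space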
CBOW: Continuous Bag of Words