Big Data Assignment Group 7 Monalisa Kakati (2757) Sejal Gandhi (2403) Indrani Das (3890) Nitesh Deshmukh (0505) Farhan Ali (3232)
Assignment
Group 7
N-grams of texts are extensively used in text mining and natural language processing tasks.
They are basically a set of co-occurring words within a given window, and when computing
the n-grams you typically move one word forward (although you can move X words forward
in more advanced scenarios). For example, take the sentence "The cow jumps over the moon".
If N = 2 (known as bigrams), then the n-grams would be:
the cow
cow jumps
jumps over
over the
the moon
If X = the number of words in a given sentence K, the number of n-grams for sentence K would be:
Ngrams(K) = X − (N − 1)
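As a concrete illustration, the short Python sketch below (our own example; the helper name ngrams and the whitespace tokenization are assumptions, not part of the assignment text) generates the bigrams listed above by sliding a window one word at a time and checks the count against X − (N − 1):

def ngrams(tokens, n):
    # Slide a window of size n over the tokens, moving one word forward each step.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "the cow jumps over the moon".split()

bigrams = ngrams(tokens, 2)
print(bigrams)
# [('the', 'cow'), ('cow', 'jumps'), ('jumps', 'over'), ('over', 'the'), ('the', 'moon')]

# Count check: X - (N - 1) = 6 - 1 = 5 bigrams
assert len(bigrams) == len(tokens) - (2 - 1)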
Another use of n-grams is for developing features for supervised Machine Learning models
such as SVMs, MaxEnt models, Naive Bayes, etc. The idea is to use tokens such as bigrams
in the feature space instead of just unigrams.
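One common way to build such a feature space (our choice of tooling, not something the text prescribes) is scikit-learn's CountVectorizer; the tiny corpus below is purely illustrative, and the resulting matrix is what a classifier such as an SVM or Naive Bayes would be trained on:

from sklearn.feature_extraction.text import CountVectorizer

corpus = [
    "the cow jumps over the moon",
    "the dog smelled like a skunk",
]

# ngram_range=(1, 2) keeps unigrams and adds bigrams to the feature space.
vectorizer = CountVectorizer(ngram_range=(1, 2))
X = vectorizer.fit_transform(corpus)

print(vectorizer.get_feature_names_out())
# Features include unigrams ("cow", "dog", ...) and bigrams ("the cow", "dog smelled", ...)
print(X.toarray())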
An n-gram model models sequences, notably natural languages, using the statistical properties
of n-grams.
This idea can be traced to Claude Shannon's work in information theory.
Shannon posed the question: given a sequence of letters (for example, the sequence "for ex"),
what is the likelihood of the next letter? From training data, one can derive a probability
distribution for the next letter given a history of size n: a = 0.4, b = 0.00001, c = 0, ...; where the
probabilities of all possible "next-letters" sum to 1.0.
More concisely, an n-gram model predicts x_i based on x_{i−(n−1)}, ..., x_{i−1}. In probability
terms, this is P(x_i | x_{i−(n−1)}, ..., x_{i−1}). When used
for language modeling, independence assumptions are made so that each word depends only on
the last n − 1 words. This Markov model is used as an approximation of the true underlying
language. This assumption is important because it massively simplifies the problem of estimating
the language model from data. In addition, because of the open nature of language, it is common
to group words unknown to the language model together.
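To make the estimation step concrete, here is a minimal sketch of a bigram (n = 2) model built from counts; the toy corpus, the <s>/</s> boundary markers, and the <UNK> token are our own illustrative assumptions:

from collections import Counter, defaultdict

corpus = [
    "the cow jumps over the moon",
    "the dog smelled like a skunk",
]

vocab = {w for line in corpus for w in line.split()}

def map_unknown(word):
    # Words outside the training vocabulary are grouped under one shared token.
    return word if word in vocab else "<UNK>"

bigram_counts = defaultdict(Counter)
for line in corpus:
    tokens = ["<s>"] + line.split() + ["</s>"]
    for prev, cur in zip(tokens, tokens[1:]):
        bigram_counts[prev][cur] += 1

def prob(word, prev):
    # Maximum-likelihood estimate of P(word | prev): each word depends only
    # on the single previous word (the Markov assumption for n = 2).
    prev, word = map_unknown(prev), map_unknown(word)
    total = sum(bigram_counts[prev].values())
    return bigram_counts[prev][word] / total if total else 0.0

print(prob("cow", "the"))     # count("the cow") / count("the ...") = 1/3
print(prob("rocket", "the"))  # unseen word -> mapped to <UNK> -> 0.0 without smoothing

# For a fixed history, the conditional probabilities form a categorical
# distribution and sum to 1:
print(sum(prob(w, "the") for w in vocab | {"</s>", "<UNK>"}))  # 1.0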
Note that in a simple n-gram language model, the probability of a word, conditioned on some
number of previous words (one word in a bigram model, two words in a trigram model, etc.) can
be described as following a categorical distribution (often imprecisely called a "multinomial
distribution").
In practice, the probability distributions are smoothed by assigning non-zero probabilities to
unseen words or n-grams; see smoothing techniques.
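For example, add-one (Laplace) smoothing is one standard technique; the text does not name a specific method, and the vocabulary size below is an assumed value:

from collections import Counter

V = 1000          # assumed vocabulary size
counts = Counter({("the", "cow"): 1, ("the", "dog"): 1, ("the", "moon"): 1})

def laplace_prob(word, prev):
    # P(word | prev) with add-one smoothing: (count + 1) / (total + V),
    # so every bigram, seen or unseen, gets a non-zero probability.
    total = sum(c for (p, _), c in counts.items() if p == prev)
    return (counts[(prev, word)] + 1) / (total + V)

print(laplace_prob("cow", "the"))     # (1 + 1) / (3 + 1000)
print(laplace_prob("rocket", "the"))  # (0 + 1) / (3 + 1000) -> non-zero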
n-gram models are widely used in statistical natural language processing. In speech
recognition, phonemes and sequences of phonemes are modeled using an n-gram distribution.
For parsing, words are modeled such that each n-gram is composed of n words. For language
identification, sequences of characters/graphemes (e.g., letters of the alphabet) are modeled for
different languages.[4] For sequences of characters, the 3-grams (sometimes referred to as
"trigrams") that can be generated from "good morning" are "goo", "ood", "od ", "d m", " mo", "mor"
and so forth, counting the space character as a gram (sometimes the beginning and end of a text
are modeled explicitly, adding "__g", "_go", "ng_", and "g__"). For sequences of words, the
trigrams (shingles) that can be generated from "the dog smelled like a skunk" are "# the dog",
"the dog smelled", "dog smelled like", "smelled like a", "like a skunk" and "a skunk #".