Natural Language Processing Lab Manual
Natural Language Processing Lab Manual
import nltk
from nltk.tokenize import sent_tokenize, word_tokenize

nltk.download('punkt')  # Download necessary NLTK data files

# Example text
text = ("Natural language processing (NLP) is a field of artificial intelligence "
        "that helps computers understand, interpret, and manipulate human language. "
        "It enables tasks like speech recognition, sentiment analysis, and machine "
        "translation.")

# Sentence Tokenization
sentences = sent_tokenize(text)
print("Sentence Tokenization:")
print(sentences)
print()

# Word Tokenization
words = word_tokenize(text)
print("Word Tokenization:")
print(words)
Explanation:
sent_tokenize splits the text into a list of sentences, while word_tokenize splits it into individual words and punctuation tokens.
Output:
Sentence Tokenization: ['Natural language processing (NLP) is a field of artificial intelligence that helps computers understand, interpret, and manipulate human language.', 'It enables tasks like speech recognition, sentiment analysis, and machine translation.']
Word Tokenization: ['Natural', 'language', 'processing', '(', 'NLP', ')', 'is', 'a', 'field', 'of', 'artificial', 'intelligence', 'that', 'helps', 'computers', 'understand', ',', 'interpret', ',', 'and', 'manipulate', 'human', 'language', '.', 'It', 'enables', 'tasks', 'like', 'speech', 'recognition', ',', 'sentiment', 'analysis', ',', 'and', 'machine', 'translation', '.']
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

nltk.download('punkt')
nltk.download('stopwords')

def remove_stopwords(text):
    # Tokenize the input text into words
    words = word_tokenize(text)
    # Keep only the words that are not in the English stopword list
    stop_words = set(stopwords.words('english'))
    filtered_words = [word for word in words if word.lower() not in stop_words]
    return ' '.join(filtered_words)

# Example text
text = "This is an example sentence where we will remove stop words."

# Remove stopwords
filtered_text = remove_stopwords(text)
print("Original Text:", text)
print("Filtered Text:", filtered_text)
Explanation:
Stopwords are common words (such as "is", "an", "we") that carry little meaning on their own; removing them keeps only the content-bearing tokens.
Output:
Original Text: This is an example sentence where we will remove stop words.
Filtered Text: example sentence remove stop words .
import nltk
from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer

nltk.download('punkt')

def stem_text(text):
    words = word_tokenize(text)
    # Reduce each token to its Porter stem
    stemmer = PorterStemmer()
    stemmed_text = ' '.join(stemmer.stem(word) for word in words)
    return stemmed_text

# Example text
text = ("NLTK is a leading platform for building Python programs to work with "
        "human language data.")

# Perform stemming
stemmed_text = stem_text(text)
print(stemmed_text)

OUTPUT:
nltk is a lead platform for build python program to work with human languag data .
3. Word Analysis and Word Generation
Word Analysis:
Analyze character frequency in a given text.
Analyze word frequency and length.
Word Generation:
Generate new words based on the analyzed character and word statistics.
import random
from collections import Counter, defaultdict

class WordAnalyzer:
    def __init__(self, text):
        self.text = text
        self.word_list = self.text.split()
        self.char_freq = self.analyze_char_frequency()
        self.word_freq = self.analyze_word_frequency()

    def analyze_char_frequency(self):
        # Frequency of each character, excluding spaces
        return Counter(self.text.replace(" ", ""))

    def analyze_word_frequency(self):
        return Counter(self.word_list)

    def analyze_word_lengths(self):
        # Frequency of each word length (in characters)
        return Counter(len(word) for word in self.word_list)

    def display_analysis(self):
        print("Character Frequency:")
        for char, freq in self.char_freq.items():
            print(f"{char}: {freq}")
        print("\nWord Frequency:")
        for word, freq in self.word_freq.items():
            print(f"{word}: {freq}")
        print("\nWord Length Frequency:")
        for length, freq in self.analyze_word_lengths().items():
            print(f"{length}: {freq}")

class WordGenerator:
    def __init__(self, char_freq, word_list):
        self.char_freq = char_freq
        self.word_list = word_list
        self.transition_matrix = self.build_transition_matrix()

    def build_transition_matrix(self):
        # Count how often each character follows another within a word
        matrix = defaultdict(Counter)
        for word in self.word_list:
            for i in range(len(word) - 1):
                matrix[word[i]][word[i + 1]] += 1
        # Normalize the counts into probabilities
        for char, transitions in matrix.items():
            total = sum(transitions.values())
            for next_char in transitions:
                transitions[next_char] /= total
        return matrix

    def generate_word(self, length):
        # Markov Chain-based generation using the transition matrix
        if not self.char_freq:
            return ""
        start_char = random.choice(list(self.char_freq.keys()))
        word = start_char
        for _ in range(length - 1):
            if start_char not in self.transition_matrix:
                break
            next_char = random.choices(
                list(self.transition_matrix[start_char].keys()),
                weights=list(self.transition_matrix[start_char].values())
            )[0]
            word += next_char
            start_char = next_char
        return word

    def random_word(self, length):
        # Characters chosen independently, weighted by frequency
        return ''.join(random.choices(
            list(self.char_freq.keys()),
            weights=list(self.char_freq.values()),
            k=length
        ))

if __name__ == "__main__":
    # Example Text
    text = "hello world this is a simple example of word analysis and generation"

    # Word Analysis
    analyzer = WordAnalyzer(text)
    analyzer.display_analysis()

    # Word Generation
    generator = WordGenerator(analyzer.char_freq, analyzer.word_list)
    print("\nGenerated Words:")
    print("Markov Chain Based:", generator.generate_word(5))
    print("Random Word:", generator.random_word(5))

Explanation:
1. Class: WordAnalyzer
This class is responsible for analyzing a given text. It extracts meaningful statistics
about the words and characters in the text.
Methods:
1. __init__(self, text)
o Initializes the WordAnalyzer class with a text input.
o Splits the text into words (self.word_list) and computes:
Character frequencies: How often each character appears.
Word frequencies: How often each word appears.
2. analyze_char_frequency(self)
o Returns a Counter object with the frequency of each character in the
text (excluding spaces).
3. analyze_word_frequency(self)
o Returns a Counter object with the frequency of each unique word in
the text.
4. analyze_word_lengths(self)
o Analyzes the lengths of all the words in the text.
o Returns a Counter object where the keys are word lengths (in
characters) and values are their frequencies.
5. display_analysis(self)
o Prints the analysis results in a readable format, including:
Character frequency
Word frequency
Word length frequency
2. Class: WordGenerator
This class is responsible for generating new words based on the analyzed data.
Initialization:
o __init__(self, char_freq, word_list) stores the character frequencies and word list, then builds a transition matrix that tracks how likely one character is to follow another, based on the input text.
Methods:
1. build_transition_matrix(self)
o Creates a Markov Chain-like transition matrix for characters.
o For each character in the words, it calculates:
The frequency of every possible "next character."
Normalizes these frequencies to probabilities.
o Example:
For the word hello, the transitions would be:
h -> e
e -> l
l -> l
l -> o
(A runnable mini-example of this counting and normalization appears after this method list.)
2. generate_word(self, length)
o Uses the transition matrix to generate a word of a specified length.
o Starts with a random character and iteratively adds the next character
based on probabilities in the transition matrix.
3. random_word(self, length)
o Generates a completely random word of the specified length using
character frequencies.
o Characters are chosen independently of one another.
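As referenced in the build_transition_matrix example above, here is a minimal standalone sketch of the counting and normalization for a single word (hello):

from collections import Counter, defaultdict

matrix = defaultdict(Counter)
word = "hello"
for i in range(len(word) - 1):
    matrix[word[i]][word[i + 1]] += 1  # counts: h->e, e->l, l->l, l->o
for char, transitions in matrix.items():
    total = sum(transitions.values())
    for next_char in transitions:
        transitions[next_char] /= total
print({c: dict(t) for c, t in matrix.items()})
# {'h': {'e': 1.0}, 'e': {'l': 1.0}, 'l': {'l': 0.5, 'o': 0.5}}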
3. Main Script
This section ties everything together and demonstrates how to use the classes.
Steps:
1. Input text:
o The text is provided as a string: "hello world this is a simple example of
word analysis and generation".
2. Analyze the text:
o The WordAnalyzer class computes the following:
Frequency of each character (e.g., h appears twice, e appears 6 times, etc.).
Frequency of each word (e.g., hello appears once, world appears
once, etc.).
Distribution of word lengths (e.g., 5-character words appear
twice, etc.).
o Results are printed via display_analysis.
3. Generate new words:
o The WordGenerator class creates words using two approaches:
Markov Chain-based (generate_word):
Uses the character transition matrix to create more
context-aware words.
Random character-based (random_word):
Uses character frequency to randomly pick characters,
independent of sequence.
Example Outputs
Word Analysis:
Character Frequency:
h: 2
e: 6
l: 6
o: 5
w: 2
r: 3
d: 3
t: 2
i: 5
s: 5
a: 6
m: 2
p: 2
x: 1
f: 1
n: 4
y: 1
g: 1
Word Frequency:
hello: 1
world: 1
this: 1
is: 1
a: 1
simple: 1
example: 1
of: 1
word: 1
analysis: 1
and: 1
generation: 1
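Word Length Frequency:
5: 2
4: 2
2: 2
1: 1
6: 1
7: 1
8: 1
3: 1
10: 1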
Word Generation:
Generated Words:
Markov Chain Based: helor
Random Word: inaso
(Both generators are random, so the exact words vary from run to run.)
4. Create a sample list of at least 5 words with ambiguous senses and implement WSD using Python
Below is the code that uses the Lesk algorithm from the nltk library for WSD (the sample words and context sentences are illustrative; any list of at least 5 ambiguous words works):

import nltk
from nltk.wsd import lesk
from nltk.tokenize import word_tokenize

nltk.download('wordnet')
nltk.download('omw-1.4')
nltk.download('punkt')

# Sample list of words with ambiguous senses
ambiguous_words = ["bank", "bat", "spring", "plant", "bark"]

# Sentences that provide context for the ambiguous words
sentences = [
    "The bank of the river was flooded after the heavy rain.",
    "He deposited the money in the bank.",
    "The bat flew out of the cave at night.",
    "Water flowed from the spring in the hillside.",
    "She works at a chemical plant.",
    "The bark of the tree was rough."
]

def disambiguate_sentence(sentence, word):
    # Lesk picks the WordNet sense whose definition best overlaps the context
    return lesk(word_tokenize(sentence), word)

# Disambiguating senses
for sentence in sentences:
    for word in ambiguous_words:
        if word in sentence:
            sense = disambiguate_sentence(sentence, word)
            print(f"Sentence: {sentence}")
            print(f"Word: {word}")
            if sense:
                print(f"Sense: {sense.name()}")
                print(f"Definition: {sense.definition()}")
            else:
                print("Sense: Not found.")
            print("-" * 50)
Sample Output
For the sentence "The bank of the river was flooded after the heavy rain.", the
output might be:
Sentence: The bank of the river was flooded after the heavy rain.
Word: bank
Sense: bank.n.01
Definition: sloping land (especially the slope beside a body of water)
5. Install the NLTK toolkit and perform stemming
First, install NLTK using pip:

pip install nltk
Then verify the installation from Python:
import nltk
print(nltk.__version__) # Check if it's installed and print the version
NLTK provides various datasets and models for natural language processing tasks.
To download the required data, follow these steps:
import nltk
nltk.download('punkt') # Required for tokenization
nltk.download('wordnet') # Optional: Required for lemmatization
Stemming is the process of reducing words to their root form. NLTK provides several
stemming algorithms. Here's an example using the Porter Stemmer:
Code Example:
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
words = ['running', 'jumps', 'easily', 'happiness']
# Perform stemming
stemmed_words = [stemmer.stem(word) for word in words]
print("Original Words:", words)
print("Stemmed Words:", stemmed_words)
Output:
Original Words: ['running', 'jumps', 'easily', 'happiness']
Stemmed Words: ['run', 'jump', 'easili', 'happi']
Alternatively, the Lancaster Stemmer, a more aggressive algorithm, can be used:

from nltk.stem import LancasterStemmer

lancaster_stemmer = LancasterStemmer()
# Perform stemming
stemmed_words = [lancaster_stemmer.stem(word) for word in words]
print("Stemmed Words:", stemmed_words)
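To compare the two algorithms side by side, a small driver like the following can be used (Lancaster generally produces shorter, more aggressive stems than Porter):

from nltk.stem import PorterStemmer, LancasterStemmer

words = ['running', 'jumps', 'easily', 'happiness']
porter = PorterStemmer()
lancaster = LancasterStemmer()
for word in words:
    print(f"{word}: Porter={porter.stem(word)} | Lancaster={lancaster.stem(word)}")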
6. Create a sample list of at least 10 words, perform POS tagging, and find the POS for any given word
1. Run
2. Beautiful
3. Cat
4. Slowly
5. Play
6. Happiness
7. Beneath
8. Quickly
9. Book
10. Intelligent
Word         POS
Run          Verb/Noun
Beautiful    Adjective
Cat          Noun
Slowly       Adverb
Play         Verb/Noun
Happiness    Noun
Beneath      Preposition
Quickly      Adverb
Book         Noun/Verb
Intelligent  Adjective
You can implement a simple function in Python to find the POS of any word based
on this list.
# Sample POS Dictionary
pos_dict = {
"run": ["Verb", "Noun"],
"beautiful": ["Adjective"],
"cat": ["Noun"],
"slowly": ["Adverb"],
"play": ["Verb", "Noun"],
"happiness": ["Noun"],
"beneath": ["Preposition"],
"quickly": ["Adverb"],
"book": ["Noun", "Verb"],
"intelligent": ["Adjective"]
}
def get_pos(word):
    # Look the word up in the sample dictionary (case-insensitive)
    return pos_dict.get(word.lower(), ["POS not found in the sample list"])

# Example Usage
word = input("Enter a word: ")
pos = get_pos(word)
print(f"POS for '{word}': {pos}")
import nltk
from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer, WordNetLemmatizer
from nltk.corpus import wordnet
from nltk import pos_tag

nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('wordnet')

text = "The quick brown fox jumps over the lazy dog."
tokens = [t for t in word_tokenize(text) if t.isalpha()]  # keep alphabetic tokens only
print("Tokens:", tokens)

# Part-of-speech tagging
pos_tags = pos_tag(tokens)
print("\nPOS Tags:", pos_tags)
Explanation:
The program tokenizes the sentence, tags each token with its part of speech, and then derives a Porter stem and a WordNet lemma for every word.
Output:
For the input sentence "The quick brown fox jumps over the lazy dog.", the program
produces tokens, POS tags, stems, and lemmas.
Example output:
Tokens: ['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']
POS Tags: [('The', 'DT'), ('quick', 'JJ'), ('brown', 'NN'), ('fox', 'NN'), ('jumps', 'VBZ'),
('over', 'IN'), ('the', 'DT'), ('lazy', 'JJ'), ('dog', 'NN')]
Morphological Analysis:
Word: The | Stem: the | Lemma: the | POS: DT
Word: quick | Stem: quick | Lemma: quick | POS: JJ
Word: brown | Stem: brown | Lemma: brown | POS: NN
Word: fox | Stem: fox | Lemma: fox | POS: NN
Word: jumps | Stem: jump | Lemma: jump | POS: VBZ
Word: over | Stem: over | Lemma: over | POS: IN
Word: the | Stem: the | Lemma: the | POS: DT
Word: lazy | Stem: lazi | Lemma: lazy | POS: JJ
Word: dog | Stem: dog | Lemma: dog | POS: NN
7. Write a program to generate n-grams using NLTK

import nltk
from nltk.util import ngrams
from nltk.tokenize import word_tokenize

nltk.download('punkt')

def generate_ngrams(text, n):
    """
    Generate n-grams from the given text.

    Args:
        text (str): The input text to process.
        n (int): The size of n-grams to generate (e.g., 2 for bigrams, 3 for trigrams).

    Returns:
        list: A list of n-grams, each represented as a tuple.
    """
    # Tokenize the text into words
    tokens = word_tokenize(text)
    # Generate n-grams
    n_grams = list(ngrams(tokens, n))
    return n_grams
# Example usage
if __name__ == "__main__":
    sample_text = "This is a simple example to generate n-grams using NLTK."
    n = 2  # For bigrams; change this value for different n-grams
    n_grams = generate_ngrams(sample_text, n)
    print(f"{n}-grams:")
    for gram in n_grams:
        print(gram)
How It Works:
The text is first tokenized with word_tokenize; nltk.util.ngrams then slides a window of size n over the token list and returns each window as a tuple.
Example Output:
For the input text: "This is a simple example to generate n-grams using NLTK." and
n=2:
2-grams:
('This', 'is')
('is', 'a')
('a', 'simple')
('simple', 'example')
('example', 'to')
('to', 'generate')
('generate', 'n-grams')
('n-grams', 'using')
('using', 'NLTK')
('NLTK', '.')
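Setting n = 3 in the same script should instead yield trigrams such as ('This', 'is', 'a'), ('is', 'a', 'simple'), ('a', 'simple', 'example'), and so on.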
A simple n-gram language model can also be built from scratch; a minimal version, assuming maximum-likelihood probabilities and greedy generation:

from collections import defaultdict

class NGramModel:
    def __init__(self, n):
        self.n = n  # The 'n' in n-grams
        self.ngram_counts = defaultdict(int)
        self.context_counts = defaultdict(int)
        self.vocabulary = set()

    def train(self, corpus):
        # Count each n-gram and its (n-1)-word context across the corpus
        for sentence in corpus:
            padded = ['<s>'] * (self.n - 1) + sentence + ['</s>']
            for i in range(len(padded) - self.n + 1):
                ngram = tuple(padded[i:i + self.n])
                context, word = ngram[:-1], ngram[-1]
                self.ngram_counts[ngram] += 1
                self.context_counts[context] += 1
                self.vocabulary.add(word)

    def probability(self, context, word):
        # Maximum-likelihood estimate of P(word | context)
        context = tuple(context)
        if self.context_counts[context] == 0:
            return 0.0
        return self.ngram_counts[context + (word,)] / self.context_counts[context]

    def generate_sentence(self, max_length=10):
        # Greedily choose the most probable next word until </s> or max_length
        context = tuple(['<s>'] * (self.n - 1))
        sentence = []
        for _ in range(max_length):
            word_probabilities = {word: self.probability(context, word)
                                  for word in self.vocabulary}
            next_word = max(word_probabilities, key=word_probabilities.get)
            if next_word == '</s>':
                break
            sentence.append(next_word)
            context = context[1:] + (next_word,)
        return sentence

# Example usage
if __name__ == "__main__":
    corpus = [
        ["the", "cat", "sat"],
        ["the", "cat", "sat", "on", "the", "mat"],
        ["the", "dog", "barked"]
    ]
    model = NGramModel(2)  # bigram model
    model.train(corpus)
    print("Generated:", " ".join(model.generate_sentence()))

Key Features:
Counts n-grams and their contexts over a training corpus, estimates conditional probabilities by maximum likelihood, and generates sentences greedily from the learned model.
8. Using the NLTK package to convert an audio file to text and a text file to audio
The NLTK (Natural Language Toolkit) package is primarily used for natural language
processing tasks such as tokenization, stemming, lemmatization, and sentiment
analysis. However, it does not have built-in support for converting audio to text or
vice versa. For these tasks, specialized libraries such as SpeechRecognition (speech-to-text) and gTTS (text-to-speech) are used instead.
Requirements
Install the required libraries using pip:
pip install SpeechRecognition gTTS pydub
Script
import os
import speech_recognition as sr
from gtts import gTTS
# Example usage
audio_file = "example_audio.wav" # Replace with your audio file
text_file = "output_text.txt"
output_audio_file = "output_audio.mp3"
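The example usage above assumes two helper functions that the listing omits; a minimal sketch follows (the names audio_to_text and text_to_audio are illustrative):

def audio_to_text(audio_file, text_file):
    # Transcribe speech from a WAV file and save the transcription to a text file
    recognizer = sr.Recognizer()
    with sr.AudioFile(audio_file) as source:
        audio = recognizer.record(source)
    text = recognizer.recognize_google(audio)  # requires an internet connection
    with open(text_file, "w") as f:
        f.write(text)
    return text

def text_to_audio(text_file, output_audio_file):
    # Read a text file and synthesize its contents to an MP3 with gTTS
    with open(text_file, "r") as f:
        text = f.read()
    gTTS(text=text, lang="en").save(output_audio_file)

audio_to_text(audio_file, text_file)
text_to_audio(text_file, output_audio_file)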
Explanation
1. Audio to Text:
o Uses speech_recognition to transcribe speech from an audio file (e.g.,
WAV).
o Saves the transcription to a text file.
2. Text to Audio:
o Reads the content of a text file.
o Uses gTTS to convert the text to speech and saves it as an audio file
(e.g., MP3).