NLP LAB MANUAL

The document outlines a series of experiments involving Python programming for natural language processing tasks using the NLTK library. It covers various techniques such as tokenization, stop word removal, stemming, word analysis, word generation, and part-of-speech tagging. Each experiment includes a description, sample code, and expected output demonstrating the functionality of the implemented algorithms.


EXPERIMENT-1
Aim:

Write a Python program to perform the following tasks on text:
a) Tokenization b) Stop word removal

Description:
1) Import Necessary Libraries: The program imports the required libraries from NLTK, namely word_tokenize for tokenization and stopwords for obtaining a list of stop words.
2) Download NLTK Resources: Before using NLTK functions, the program checks whether the required resources (the punkt tokenizer and the stopwords corpus) are downloaded. If not, it downloads them.
3) Tokenization and Stop Word Removal Function: The tokenize_and_remove_stopwords function takes the input text as an argument. It tokenizes the text using NLTK's word_tokenize function to split it into individual words. Then it retrieves the English stop words using stopwords.words('english'). Finally, it removes stop words from the list of tokens and returns the filtered tokens.
4) Main Function: The main function serves as the entry point of the program. It contains a sample text for demonstration purposes. It calls the tokenize_and_remove_stopwords function to process the text and then prints both the original text and the processed text with stop words removed.
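The filtering in step 3 can be sketched without NLTK, using a small hand-written stop list (a toy stand-in for NLTK's much larger stopwords corpus, for illustration only):

```python
# Toy sketch of stop-word filtering; STOP_WORDS is a hand-written stand-in
# for NLTK's English stopwords corpus.
STOP_WORDS = {"this", "is", "a", "the", "off", "and"}

def remove_stopwords(tokens):
    # Case-insensitive membership test, mirroring word.lower() in the full program
    return [t for t in tokens if t.lower() not in STOP_WORDS]

tokens = ["This", "is", "a", "sample", "sentence"]
print(remove_stopwords(tokens))  # ['sample', 'sentence']
```

The real stopwords corpus simply replaces the hand-written set; the filtering logic is the same.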
Program:

import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords

# Download NLTK resources if not already downloaded
nltk.download('punkt')
nltk.download('stopwords')

def tokenize_and_remove_stopwords(text):
    """
    Tokenizes the input text and removes stop words.

    Args:
        text (str): The input text to be processed.

    Returns:
        list: A list of tokens after removing stop words.
    """
    # Tokenize the text
    tokens = word_tokenize(text)
    # Get English stop words
    stop_words = set(stopwords.words('english'))
    # Remove stop words from tokens
    filtered_tokens = [word for word in tokens if word.lower() not in stop_words]
    return filtered_tokens

def main():
    """Main function to demonstrate tokenization and stop word removal."""
    # Sample text for demonstration
    text = "This is a sample sentence, showing off the stop words removal and tokenization."
    # Tokenize and remove stop words
    processed_text = tokenize_and_remove_stopwords(text)
    # Print original text
    print("Original text:")
    print(text)
    # Print tokenized text with stop words removed
    print("\nTokenized text with stop words removed:")
    print(processed_text)

if __name__ == "__main__":
    main()

This program first tokenizes the input text using NLTK's word_tokenize function and then removes stop words using NLTK's English stop words list. Finally, it prints both the original text and the processed text with stop words removed.

Output:
Original text:
This is a sample sentence, showing off the stop words removal and tokenization.

Tokenized text with stop words removed:
['sample', 'sentence', ',', 'showing', 'stop', 'words', 'removal', 'tokenization', '.']

The original input text is displayed, then the input text is tokenized into individual words and stop words are removed. The resulting list contains only the meaningful words from the original text, excluding common stop words like "This", "is", "a", "the", "and", etc.

EXPERIMENT-2
Aim:

Write a Python program to implement the Porter stemmer algorithm for stemming

Description:
1) Import Necessary Libraries: The program imports the PorterStemmer class from the nltk.stem module. This class implements the Porter stemming algorithm.
2) Porter Stemming Function: The porter_stemming function takes text as input. It initializes a PorterStemmer object, tokenizes the text, and applies the stemming algorithm to each token using the stem method of the PorterStemmer object. The stemmed tokens are collected in a new list, which is then joined back into a string and returned.
3) Main Function: The main function serves as the entry point of the program. It contains a sample text for demonstration. It calls the porter_stemming function to perform stemming on the sample text and then prints both the original text and the stemmed text.
The Porter stemming algorithm reduces words to their root forms, which can help in tasks like text normalization and information retrieval. It removes common suffixes from words, but it might not always produce a valid word, as it operates based on a set of rules.
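The caveat that stemming "might not always produce a valid word" is easy to see on individual words. A minimal sketch, assuming only that NLTK is installed (PorterStemmer itself needs no downloaded corpora):

```python
from nltk.stem import PorterStemmer

porter = PorterStemmer()
# Related inflected forms collapse to the same stem, which is not itself a word
for w in ["argue", "argued", "argues", "arguing"]:
    print(w, "->", porter.stem(w))  # all four print "argu"
print(porter.stem("running"))  # run
```

"argu" is not an English word, but because all four forms map to it, they are treated as the same term — which is exactly what normalization and retrieval need.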

Program:

import nltk
from nltk.stem import PorterStemmer

# Download NLTK resources if not already downloaded
nltk.download('punkt')

def porter_stemming(text):
    """
    Applies the Porter stemming algorithm to the input text.

    Args:
        text (str): The input text to be stemmed.

    Returns:
        str: The stemmed text.
    """
    # Initialize Porter stemmer
    porter = PorterStemmer()
    # Tokenize the text
    tokens = nltk.word_tokenize(text)
    # Apply stemming to each token
    stemmed_tokens = [porter.stem(token) for token in tokens]
    # Join the stemmed tokens back into a single string
    stemmed_text = " ".join(stemmed_tokens)
    return stemmed_text

def main():
    """Main function to demonstrate Porter stemming."""
    # Sample text for demonstration
    text = ("It is important to be very pythonly while you are pythoning with python. "
            "All pythoners have pythoned poorly at least once.")
    # Apply Porter stemming
    stemmed_text = porter_stemming(text)
    # Print original and stemmed text
    print("Original text:")
    print(text)
    print("\nStemmed text using Porter stemming algorithm:")
    print(stemmed_text)

if __name__ == "__main__":
    main()

This program utilizes NLTK's PorterStemmer class to perform stemming on the given text. The porter_stemming function tokenizes the input text, applies stemming to each token using the Porter stemmer, and then joins the stemmed tokens back into a single string. Finally, the main function demonstrates use of the Porter stemming algorithm by applying it to a sample text.
Output:
Original text:
It is important to be very pythonly while you are pythoning with python. All pythoners have pythoned poorly at least once.

Stemmed text using Porter stemming algorithm:
It is import to be veri pythonli while you are python with python . all python have python poorli at least onc .

EXPERIMENT-3
Aim:

Write Python Program for a) Word Analysis b) Word Generation


Description:
1) Word Analysis: In this program, we analyze a given text to understand the frequency of occurrence of each word. This helps in gaining insights into the most commonly used words in the text.

2) Word Generation: We generate new word sequences using the concept of Markov chains. Markov chains are stochastic models that describe a sequence of possible events in which the probability of each event depends only on the state attained in the previous event. Since NLTK does not provide a ready-made Markov chain class, the program builds a simple first-order chain from the bigrams of the provided text.
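The Markov-chain idea in point 2 can be sketched in plain Python: count, for each word, which words follow it, then normalize the counts into next-word probabilities (toy corpus, for illustration only):

```python
from collections import Counter, defaultdict

tokens = "the dog barks the dog runs".split()

# First-order Markov chain: successor counts for each word
transitions = defaultdict(Counter)
for cur, nxt in zip(tokens, tokens[1:]):
    transitions[cur][nxt] += 1

def next_word_prob(cur, nxt):
    """P(nxt | cur) estimated from the successor counts."""
    total = sum(transitions[cur].values())
    return transitions[cur][nxt] / total

print(next_word_prob("the", "dog"))    # 1.0 -- "the" is always followed by "dog"
print(next_word_prob("dog", "barks"))  # 0.5 -- "dog" is followed by "barks" or "runs"
```

Generating text is then just repeatedly sampling a successor of the current word, which is what the full program below does.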

Program:

import nltk
import random

# Download NLTK resources if not already downloaded
nltk.download('punkt')

def word_analysis(text):
    """
    Analyzes the given text to calculate the frequency of occurrence of each word.

    Args:
        text (str): The input text to be analyzed.

    Returns:
        nltk.FreqDist: A frequency distribution mapping words to their counts.
    """
    tokens = nltk.word_tokenize(text)
    word_freq = nltk.FreqDist(tokens)
    return word_freq

def word_generation(text, num_words=10):
    """
    Generates new words using a first-order Markov chain built from the text.
    (NLTK has no MarkovModel class, so the chain is built by hand from bigrams.)

    Args:
        text (str): The input text to generate new words from.
        num_words (int): The number of words to generate.

    Returns:
        list: A list of generated words.
    """
    tokens = nltk.word_tokenize(text)
    # Map each word to the list of words that follow it in the text
    transitions = {}
    for current_word, next_word in nltk.bigrams(tokens):
        transitions.setdefault(current_word, []).append(next_word)
    # Start from a random word and repeatedly sample a successor
    word = random.choice(tokens)
    generated_words = [word]
    for _ in range(num_words - 1):
        followers = transitions.get(word)
        if not followers:
            break
        word = random.choice(followers)
        generated_words.append(word)
    return generated_words

def main():
    """Main function to demonstrate word analysis and generation."""
    # Sample text for demonstration
    text = ("The quick brown fox jumps over the lazy dog. The dog barks loudly. "
            "The fox runs away quickly.")
    # Word analysis
    word_freq = word_analysis(text)
    print("Word Analysis:")
    print(word_freq.most_common(5))  # Display 5 most common words
    # Word generation
    generated_words = word_generation(text)
    print("\nWord Generation:")
    print(generated_words)

if __name__ == "__main__":
    main()

Output:
Word Analysis:
[('The', 3), ('.', 3), ('fox', 2), ('dog', 2), ('quick', 1)]

Word Generation:
['The', 'fox', 'runs', 'away', 'quickly', '.', 'The', 'dog', 'barks', 'loudly']

(Word generation is random, so the generated sequence varies between runs.)
EXPERIMENT-4
Aim:

Create a sample list of at least 5 words with ambiguous senses and write a Python program to implement WSD

Description:
Word Sense Disambiguation (WSD) is the task of determining the correct meaning of a word with multiple meanings (senses) based on the context in which it appears. The Lesk algorithm is a popular approach for WSD, which compares the meanings of words in a given context with the meanings of words in the dictionary definitions.

Sample List of Words with Ambiguous Senses:


"bank"
"bat"
"crane"
"light"
"bass"
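The heart of the Lesk algorithm — picking the sense whose dictionary gloss shares the most words with the context — can be sketched in plain Python. The two glosses below are hypothetical, abridged stand-ins for real WordNet definitions:

```python
def gloss_overlap(context, gloss):
    # Lesk's core signal: number of words shared by the context and a sense gloss
    return len(set(context.lower().split()) & set(gloss.lower().split()))

# Hypothetical, abridged glosses for two senses of "bank"
senses = {
    "financial institution": "an institution where money is deposited or lent",
    "river bank": "sloping land beside a river or body of water",
}

context = "I deposited money in the bank"
best_sense = max(senses, key=lambda s: gloss_overlap(context, senses[s]))
print(best_sense)  # financial institution
```

"deposited" and "money" overlap with the financial gloss but nothing overlaps with the river gloss, so the financial sense wins. NLTK's lesk function below does the same comparison against real WordNet glosses.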

Program:

import nltk
from nltk.wsd import lesk
from nltk.tokenize import word_tokenize

# Download NLTK resources if not already downloaded
nltk.download('punkt')
nltk.download('wordnet')

def wsd(word, sentence):
    """
    Implements Word Sense Disambiguation (WSD) using the Lesk algorithm.

    Args:
        word (str): The ambiguous word for which WSD is performed.
        sentence (str): The sentence containing the ambiguous word.

    Returns:
        str: The disambiguated sense of the word.
    """
    tokens = word_tokenize(sentence)
    sense = lesk(tokens, word)
    return sense.definition() if sense else "No appropriate sense found"

def main():
    """Main function to demonstrate Word Sense Disambiguation (WSD)."""
    # Sample list of words with ambiguous senses
    words = ["bank", "bat", "crane", "light", "bass"]
    # Sample sentence for each word
    sentences = [
        "I deposited money in the bank.",
        "The baseball player swung the bat.",
        "The crane lifted heavy loads at the construction site.",
        "Turn on the light, please.",
        "He caught a large bass while fishing.",
    ]
    # Perform WSD for each word in the list
    for word, sentence in zip(words, sentences):
        print(f"Word: {word}")
        print(f"Sentence: {sentence}")
        print(f"Sense: {wsd(word, sentence)}\n")

if __name__ == "__main__":
    main()

Output:
Word: bank
Sentence: I deposited money in the bank.
Sense: a financial institution that accepts deposits and channels the money into lending activities

Word: bat
Sentence: The baseball player swung the bat.
Sense: (baseball) a club used for hitting a ball in various games

Word: crane
Sentence: The crane lifted heavy loads at the construction site.
Sense: large long-necked wading bird of marshes and plains in many parts of the world

Word: light
Sentence: Turn on the light, please.
Sense: (physics) electromagnetic radiation that can produce a visual sensation

Word: bass
Sentence: He caught a large bass while fishing.
Sense: the lowest part of the musical range

EXPERIMENT-5
Aim:

Install NLTK toolkit and perform stemming

Description:
To install NLTK, you can use pip, Python's package manager. Here's the command to install NLTK:
pip install nltk
Once NLTK is installed, you can perform stemming using the various stemming algorithms available in NLTK. One of the popular stemming algorithms is the Porter stemming algorithm.
Stemming is the process of reducing words to their root or base form. NLTK provides various stemming algorithms, such as the Porter, Lancaster, and Snowball stemmers. In this program, we'll use the Porter stemming algorithm to perform stemming on a sample text.
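Since the description mentions the Porter, Lancaster, and Snowball stemmers, here is a short comparison sketch (assuming NLTK is installed; none of these stemmers need downloaded data). The Lancaster stemmer is typically the most aggressive of the three:

```python
from nltk.stem import PorterStemmer, LancasterStemmer, SnowballStemmer

stemmers = {
    "Porter": PorterStemmer(),
    "Lancaster": LancasterStemmer(),
    "Snowball": SnowballStemmer("english"),
}
for name, stemmer in stemmers.items():
    # All three agree on simple inflections like "running" -> "run",
    # but they can differ on longer or derived words
    print(f"{name}: running -> {stemmer.stem('running')}, "
          f"maximum -> {stemmer.stem('maximum')}")
```

Trying a few words of your own is a quick way to see where the algorithms diverge.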

Program:

import nltk
from nltk.stem import PorterStemmer

# Download NLTK resources if not already downloaded
nltk.download('punkt')

def perform_stemming(text):
    """
    Performs stemming on the input text using the Porter stemming algorithm.

    Args:
        text (str): The input text to be stemmed.

    Returns:
        str: The stemmed text.
    """
    # Initialize the Porter stemmer
    porter = PorterStemmer()
    # Tokenize the text
    tokens = nltk.word_tokenize(text)
    # Apply stemming to each token
    stemmed_tokens = [porter.stem(token) for token in tokens]
    # Join the stemmed tokens back into a single string
    stemmed_text = " ".join(stemmed_tokens)
    return stemmed_text

def main():
    """Main function to demonstrate stemming using NLTK."""
    # Sample text for demonstration
    text = "It is important to be very pythonly while you are pythoning with python."
    # Perform stemming
    stemmed_text = perform_stemming(text)
    # Print original and stemmed text
    print("Original text:")
    print(text)
    print("\nStemmed text using Porter stemming algorithm:")
    print(stemmed_text)

if __name__ == "__main__":
    main()

Output:
Original text:
It is important to be very pythonly while you are pythoning with python.

Stemmed text using Porter stemming algorithm:
It is import to be veri pythonli while you are python with python .

EXPERIMENT-6
Aim:

Create a sample list of at least 10 words for POS tagging and find the POS for any given word

Description:
Part-of-Speech (POS) tagging is the process of assigning grammatical categories (such as noun,
verb, adjective, etc.) to words in a text. NLTK provides a variety of tools and algorithms for POS
tagging, which can be used to analyze and understand the structure of sentences.
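Before using NLTK's trained tagger, the idea of assigning tags can be sketched with a tiny hand-written rule tagger built on nltk.RegexpTagger (the patterns here are illustrative and far weaker than nltk.pos_tag):

```python
from nltk.tag import RegexpTagger

# A few illustrative suffix rules; the final pattern is a catch-all default.
# RegexpTagger tries the patterns in order and uses the first match.
patterns = [
    (r'.*ing$', 'VBG'),        # gerunds: running
    (r'.*ed$', 'VBD'),         # simple past: jumped
    (r'^(the|a|an)$', 'DT'),   # determiners
    (r'.*', 'NN'),             # default: tag everything else as a noun
]
tagger = RegexpTagger(patterns)
print(tagger.tag(["the", "dog", "jumped", "running"]))
# [('the', 'DT'), ('dog', 'NN'), ('jumped', 'VBD'), ('running', 'VBG')]
```

The trained tagger used in the program below replaces these crude rules with a statistical model, but the output shape — a list of (word, tag) pairs — is the same.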

Program:

import nltk

# Download the POS tagger model if not already downloaded
nltk.download('averaged_perceptron_tagger')

def pos_tagging(words):
    """
    Performs Part-of-Speech (POS) tagging on the given list of words.

    Args:
        words (list): The list of words to be tagged.

    Returns:
        list: A list of tuples containing (word, POS_tag) pairs.
    """
    tagged_words = nltk.pos_tag(words)
    return tagged_words

def find_pos(word, tagged_words):
    """
    Finds the Part-of-Speech (POS) tag for the given word in the tagged words.

    Args:
        word (str): The word for which the POS tag needs to be found.
        tagged_words (list): A list of tuples containing (word, POS_tag) pairs.

    Returns:
        str: The POS tag for the given word.
    """
    for tagged_word in tagged_words:
        if tagged_word[0].lower() == word.lower():
            return tagged_word[1]
    return "POS tag not found"

def main():
    """Main function to demonstrate POS tagging and finding POS for a given word."""
    # Sample list of words
    words = ["The", "quick", "brown", "fox", "jumps", "over", "the",
             "lazy", "dog", "in", "the", "park"]
    # Perform POS tagging
    tagged_words = pos_tagging(words)
    # Print POS tags for each word
    print("POS tagging:")
    for word, pos_tag in tagged_words:
        print(f"{word}: {pos_tag}")
    # Find POS for a given word
    search_word = "fox"
    pos = find_pos(search_word, tagged_words)
    print(f"\nPOS for '{search_word}': {pos}")

if __name__ == "__main__":
    main()

Output:
POS tagging:
The: DT
quick: JJ
brown: NN
fox: NN
jumps: VBZ
over: IN
the: DT
lazy: JJ
dog: NN
in: IN
the: DT
park: NN

POS for 'fox': NN

EXPERIMENT-7

Aim:

Write a Python program to
a) Perform Morphological Analysis using NLTK library
b) Generate n-grams using NLTK N-Grams library
c) Implement N-Grams Smoothing

Description:
1) Morphological Analysis: Morphological analysis involves analyzing the structure of words to understand their meaning and grammatical properties. NLTK provides tools to perform morphological analysis, such as stemming and lemmatization.
2) N-Grams Generation: N-grams are contiguous sequences of n items (words, characters, etc.) from a given text. NLTK provides functions to generate n-grams from a list of tokens.
3) N-Grams Smoothing: N-gram smoothing is a technique used to address the sparsity problem in language models by assigning non-zero probabilities to unseen n-grams. Here, we'll implement simple add-one (Laplace) smoothing for n-grams.
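Add-one smoothing from point 3 follows the formula P(w | h) = (count(h, w) + 1) / (count(h) + V), where h is the history, w the next word, and V the vocabulary size. A plain-Python sketch for bigrams on a toy corpus:

```python
from collections import Counter

tokens = "the dog barks the dog runs".split()
bigram_counts = Counter(zip(tokens, tokens[1:]))
unigram_counts = Counter(tokens)
V = len(unigram_counts)  # vocabulary size: 4 distinct words

def laplace_prob(history, word):
    # Add-one smoothing: every bigram count is incremented by 1,
    # so unseen bigrams still get a small non-zero probability
    return (bigram_counts[(history, word)] + 1) / (unigram_counts[history] + V)

print(laplace_prob("dog", "barks"))  # seen bigram:   (1 + 1) / (2 + 4) = 0.333...
print(laplace_prob("dog", "the"))    # unseen bigram: (0 + 1) / (2 + 4) = 0.166...
```

NLTK's Laplace class used in the program below packages this same adjustment into a full language model.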

Program:

import nltk
from nltk.util import ngrams
from nltk.lm import Laplace
from nltk.tokenize import word_tokenize

# Download NLTK resources if not already downloaded
nltk.download('punkt')
nltk.download('wordnet')

def morphological_analysis(word):
    """
    Performs morphological analysis on the given word using NLTK's WordNet Lemmatizer.

    Args:
        word (str): The word to be analyzed.

    Returns:
        str: The base form of the word (lemma).
    """
    lemmatizer = nltk.WordNetLemmatizer()
    lemma = lemmatizer.lemmatize(word)
    return lemma

def generate_ngrams(text, n):
    """
    Generates n-grams from the given text.

    Args:
        text (str): The input text from which n-grams will be generated.
        n (int): The size of n-grams (e.g., 2 for bigrams, 3 for trigrams, etc.).

    Returns:
        list: A list of n-grams.
    """
    tokens = nltk.word_tokenize(text)
    ngrams_list = list(ngrams(tokens, n))
    return ngrams_list

def ngram_smoothing(ngrams_list):
    """
    Implements Laplace (add-one) smoothing for n-grams.

    Args:
        ngrams_list (list): A list of n-grams.

    Returns:
        nltk.lm.Laplace: A Laplace language model trained with smoothed n-grams.
    """
    laplace = Laplace(order=len(ngrams_list[0]))
    # Build the vocabulary from the tokens that occur in the n-grams
    vocab = [token for gram in ngrams_list for token in gram]
    laplace.fit([ngrams_list], vocabulary_text=vocab)
    return laplace

def main():
    """Main function to demonstrate morphological analysis, n-grams generation,
    and n-gram smoothing."""
    # Sample word for morphological analysis
    word = "running"
    lemma = morphological_analysis(word)
    print(f"Morphological analysis of '{word}': {lemma}")
    # Sample text for n-grams generation and smoothing
    text = "The quick brown fox jumps over the lazy dog"
    print("\nOriginal text:")
    print(text)
    # Generate trigrams
    n = 3
    trigrams_list = generate_ngrams(text, n)
    print(f"\nGenerated {n}-grams:")
    print(trigrams_list)
    # Apply n-gram smoothing
    laplace_model = ngram_smoothing(trigrams_list)
    print("\nN-gram probabilities after Laplace smoothing:")
    print(laplace_model)

if __name__ == "__main__":
    main()

Output:
Morphological analysis of 'running': running

Original text:
The quick brown fox jumps over the lazy dog

Generated 3-grams:
[('The', 'quick', 'brown'), ('quick', 'brown', 'fox'), ('brown', 'fox', 'jumps'), ('fox', 'jumps', 'over'), ('jumps', 'over', 'the'), ('over', 'the', 'lazy'), ('the', 'lazy', 'dog')]

N-gram probabilities after Laplace smoothing:
NgramModel with 1 3-grams

EXPERIMENT-8
Aim:

Using the SpeechRecognition and gTTS packages to convert an audio file to text and text to an audio file.

Description:
1) Converting Audio to Text: Speech recognition is the process of converting spoken language into text. In this program, we'll use the SpeechRecognition library to convert an audio file into text.
2) Converting Text to Audio: Text-to-speech (TTS) is the process of converting text into spoken language. We'll use the gTTS library to convert text into an audio file.

Program:

import speech_recognition as sr
from gtts import gTTS

def audio_to_text(audio_file):
    """
    Converts an audio file to text using speech recognition.

    Args:
        audio_file (str): The path to the audio file.

    Returns:
        str: The recognized text from the audio file.
    """
    recognizer = sr.Recognizer()
    with sr.AudioFile(audio_file) as source:
        audio_data = recognizer.record(source)
    text = recognizer.recognize_google(audio_data)
    return text

def text_to_audio(text, output_file):
    """
    Converts text to audio and saves it as a file.

    Args:
        text (str): The text to be converted to audio.
        output_file (str): The path to save the output audio file.
    """
    gtts = gTTS(text=text, lang='en')
    gtts.save(output_file)

def main():
    """Main function to demonstrate audio-to-text and text-to-audio conversion."""
    # Converting audio file to text
    audio_file = "sample_audio.wav"
    recognized_text = audio_to_text(audio_file)
    print("Audio to Text:")
    print(recognized_text)
    # Converting text to audio
    text = "This is a sample text-to-speech conversion."
    output_file = "output_audio.mp3"
    text_to_audio(text, output_file)
    print("\nText to Audio: Conversion successful")

if __name__ == "__main__":
    main()
Output:
Audio to Text:
this is a sample audio file for testing text-to-speech conversion

Text to Audio: Conversion successful

