NLP LAB MANUAL
EXPERIMENT-1
Aim:
Write a Python program to perform the following tasks on text:
a) Tokenization
b) Stop word removal
Description:
1) Import Necessary Libraries: The program imports the required libraries from NLTK, namely
word_tokenize for tokenization and stopwords for obtaining a list of stop words.
2) Download NLTK Resources: Before using NLTK functions, the program checks if the required
resources (word tokenizer and stop words corpus) are downloaded. If not, it downloads them.
3) Tokenization and Stop Word Removal Function: The tokenize_and_remove_stopwords function
takes the input text as an argument. It tokenizes the text using NLTK's word_tokenize function to
split it into individual words. Then, it retrieves the English stop words using
stopwords.words('english'). Finally, it removes stop words from the list of tokens and returns the
filtered tokens.
4) Main Function: The main function serves as the entry point of the program. It contains a sample
text for demonstration purposes. It calls the tokenize_and_remove_stopwords function to process
the text and then prints both the original text and the processed text with stop words removed.
Program:
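import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords

# Download NLTK resources if not already downloaded
nltk.download('punkt')
nltk.download('punkt_tab')  # newer NLTK releases load this resource instead of 'punkt'
nltk.download('stopwords')

def tokenize_and_remove_stopwords(text):
    """
    Tokenizes the input text and removes stop words.
    Args:
        text (str): The input text to be processed.
    Returns:
        list: A list of tokens after removing stop words.
    """
    tokens = word_tokenize(text)
    stop_words = set(stopwords.words('english'))
    # Compare in lowercase so that capitalized stop words such as "This" are removed too
    return [token for token in tokens if token.lower() not in stop_words]

def main():
    text = "This is a sample sentence, showing off the stop words removal and tokenization."
    print("Original text:", text)
    print("Tokenized text with stop words removed:")
    print(tokenize_and_remove_stopwords(text))

if __name__ == "__main__":
    main()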
Output:
Original text: This is a sample sentence, showing off the stop
words removal and tokenization.
Tokenized text with stop words removed:
['sample', 'sentence', ',', 'showing', 'stop', 'words', 'removal', 'tokenization', '.']
Original text: The original input text is displayed.
Tokenized text with stop words removed: The input text is tokenized into individual words, and
then stop words are removed. The resulting list contains only the meaningful words from the
original text, excluding common stop words like "This", "is", "a", "the", "and", etc.
EXPERIMENT-2
Aim:
Program:
import nltk
from nltk.stem import PorterStemmer
def porter_stemming(text):
EXPERIMENT-3
Aim:
2) Word Generation: We generate new words using the concept of Markov chains. Markov chains
are stochastic models that describe a sequence of possible events in which the probability of
each event depends only on the state attained in the previous event. The program calls
nltk.MarkovModel for word generation based on the provided text; NLTK does not provide a class
with this name, so the call needs to be replaced before the program will run.
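A first-order Markov chain of the kind described above can be sketched with the standard library alone. This is a minimal illustration rather than the manual's program: the function names build_markov_chain and generate_words, the choice of start word, and the rule of restarting from a random state at a dead end are all assumptions made for the example.

```python
import random
from collections import defaultdict


def build_markov_chain(tokens):
    """Map each token to the list of tokens that directly follow it."""
    chain = defaultdict(list)
    for current, following in zip(tokens, tokens[1:]):
        chain[current].append(following)
    return chain


def generate_words(chain, start, num_words, rng=None):
    """Walk the chain from `start`, picking each next word at random
    from the followers of the previous word only."""
    rng = rng or random.Random()
    if num_words <= 0:
        return []
    generated = [start]
    while len(generated) < num_words:
        followers = chain.get(generated[-1])
        if followers:
            next_word = rng.choice(followers)
        elif chain:
            # Dead end (word never seen with a follower): restart from any known state
            next_word = rng.choice(sorted(chain))
        else:
            break
        generated.append(next_word)
    return generated


tokens = "the quick brown fox jumps over the lazy dog".split()
chain = build_markov_chain(tokens)
print(generate_words(chain, "the", 10))
```

Repeated followers are kept in the lists on purpose, so a word that follows another twice is twice as likely to be chosen.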
Program:
import nltk
import random

def word_analysis(text):
    """
    Analyzes the given text to calculate the frequency of occurrence of each word.
    Args:
        text (str): The input text to be analyzed.
    Returns:
    """

    """
    Generates new words using Markov chains based on the provided text.
    Args:
        text (str): The input text to generate new words from.
        num_words (int): The number of words to generate.
    Returns:
        list: A list of generated words.
    """
    tokens = nltk.word_tokenize(text)
    # Note: nltk.MarkovModel does not exist in NLTK; this call raises AttributeError
    model = nltk.MarkovModel(tokens)
    generated_words = model.generate(num_words)
    return generated_words

def main():
Output:
Word Analysis:
[('The', 4), ('quick', 1), ('brown', 1), ('fox', 2), ('jumps', 1)]
Word Generation:
['jumps', 'over', 'the', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'quick']
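The word analysis step (frequency of each word) can be sketched as follows. This is an illustration under stated assumptions: the function name word_frequencies is hypothetical, and a simple regular expression stands in for NLTK's tokenizer so the example runs without any downloads.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Return (word, count) pairs, most frequent first.

    Words are matched case-sensitively, so "The" and "the" are counted
    separately, as in the sample output above.
    """
    tokens = re.findall(r"[A-Za-z']+", text)
    return Counter(tokens).most_common()


print(word_frequencies("the fox and the dog"))
```

Lowercasing the text before counting would merge "The" and "the" into one entry, if that is the behavior wanted.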
EXPERIMENT-4
Aim:
Create a sample list of at least 5 words with ambiguous sense and write a Python program
to implement WSD
Description:
Word Sense Disambiguation (WSD) is the task of determining the correct meaning of a word with
multiple meanings (senses) based on the context in which it appears. The Lesk algorithm is a
popular approach for WSD. It compares the words in the given context with the words in each
sense's dictionary definition and picks the sense whose definition overlaps the context most.
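The overlap idea behind Lesk can be sketched in a few lines. This is a simplified illustration, not NLTK's implementation: the SENSES inventory, the STOP_WORDS set, and the function names are hand-written assumptions for the example, whereas a real program would draw its definitions from a dictionary such as WordNet.

```python
import re

# Hypothetical, hand-written sense inventory for illustration only
SENSES = {
    "bank": [
        "a financial institution that accepts deposits and channels "
        "the money into lending activities",
        "sloping land beside a body of water",
    ],
}

# Small illustrative stop list so that words like "the" do not count as overlap
STOP_WORDS = {"a", "an", "and", "i", "in", "into", "of", "on", "that", "the", "we"}


def content_words(text):
    """Lowercase the text and return its set of non-stop words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS}


def simplified_lesk(word, sentence, senses=SENSES):
    """Return the definition sharing the most words with the sentence,
    or None if the word is not in the inventory."""
    context = content_words(sentence) - {word}
    best_gloss, best_overlap = None, -1
    for gloss in senses.get(word, []):
        overlap = len(context & content_words(gloss))
        if overlap > best_overlap:  # ties keep the earlier sense
            best_gloss, best_overlap = gloss, overlap
    return best_gloss


print(simplified_lesk("bank", "I deposited money in the bank."))
print(simplified_lesk("bank", "We sat on the bank beside the water."))
```

Because the method only counts shared words, a short or unrelated context can select the wrong sense.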
Program:
main()
Output:
Word: bank
Sentence: I deposited money in the bank.
Sense: a financial institution that accepts deposits and channels the money into lending
activities
Word: bat
Sentence: The baseball player swung the bat.
Sense: (baseball) a club used for hitting a ball in various games
Word: crane
Sentence: The crane lifted heavy loads at the construction site.
Sense: large long-necked wading bird of marshes and plains in many parts of the world
Word: light
Sentence: Turn on the light, please.
Sense: (physics) electromagnetic radiation that can produce a visual sensation
Word: bass
Sentence: He caught a large bass while fishing.
Sense: the lowest part of the musical range
EXPERIMENT-5
Aim:
Install NLTK toolkit and perform stemming
Description:
To install NLTK, you can use pip, Python's package manager. Here's the command to install NLTK:
pip install nltk
Once NLTK is installed, you can perform stemming using various stemming algorithms available in
NLTK. One of the popular stemming algorithms is the Porter stemming algorithm.
Stemming is the process of reducing words to their root or base form. NLTK provides various
stemming algorithms, such as the Porter, Lancaster, and Snowball stemmers. In this program, we'll
use the Porter stemming algorithm to perform stemming on a sample text.
Program:
import nltk
from nltk.stem import PorterStemmer

# Download NLTK resources if not already downloaded
nltk.download('punkt')

def perform_stemming(text):
    """
    Performs stemming on the input text using the Porter stemming algorithm.
    Args:
        text (str): The input text to be stemmed.
    Returns:
        str: The stemmed text.
    """
    stemmer = PorterStemmer()
    tokens = nltk.word_tokenize(text)
    return " ".join(stemmer.stem(token) for token in tokens)

def main():
    """Main function to demonstrate stemming using NLTK."""
    # Sample text for demonstration
    text = "It is important to be very pythonly while you are pythoning with python."
    # Perform stemming
    stemmed_text = perform_stemming(text)
    # Print original and stemmed text
    print("Original text:")
    print(text)
    print("Stemmed text using Porter stemming algorithm:")
    print(stemmed_text)

if __name__ == "__main__":
    main()
Output:
Original text:
It is important to be very pythonly while you are pythoning with python.
Stemmed text using Porter stemming algorithm:
It is import to be veri pythonli while you are python with python
EXPERIMENT-6
Aim:
Create a sample list of at least 10 words for POS tagging and find the POS for any given word
Description:
Part-of-Speech (POS) tagging is the process of assigning grammatical categories (such as noun,
verb, adjective, etc.) to words in a text. NLTK provides a variety of tools and algorithms for POS
tagging, which can be used to analyze and understand the structure of sentences.
Program:
import nltk

def pos_tagging(words):

    """
    Finds the Part-of-Speech (POS) tag for the given word in the tagged words.
    Args:
        word (str): The word for which POS tag needs to be found.
        tagged_words (list): A list of tuples containing (word, POS_tag) pairs.
    Returns:
        str: The POS tag for the given word.
    """

    """Main function to demonstrate POS tagging and finding POS for a given word."""
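The lookup step described in the docstring above (finding the tag of one word in a list of (word, POS_tag) pairs) can be sketched as follows. The function name find_pos is hypothetical, and the tagged list is copied by hand from the sample output rather than produced by a tagger, so the example runs without NLTK.

```python
def find_pos(word, tagged_words):
    """Return the tag of the first case-insensitive match, or None if absent."""
    for token, tag in tagged_words:
        if token.lower() == word.lower():
            return tag
    return None


# (word, POS_tag) pairs taken from the sample output of this experiment
tagged_words = [
    ("The", "DT"), ("quick", "JJ"), ("brown", "NN"), ("fox", "NN"),
    ("jumps", "VBZ"), ("over", "IN"), ("the", "DT"), ("lazy", "JJ"),
    ("dog", "NN"), ("in", "IN"), ("the", "DT"), ("park", "NN"),
]

print("POS for 'fox':", find_pos("fox", tagged_words))
```

Returning only the first match is a simplification: a word that occurs more than once can carry different tags in different positions.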
Output:
POS tagging:
The: DT
quick: JJ
brown: NN
fox: NN
jumps: VBZ
over: IN
the: DT
lazy: JJ
dog: NN
in: IN
the: DT
park: NN
POS for 'fox': NN
EXPERIMENT-7
Aim:
Write a Python program to
a) Perform Morphological Analysis using NLTK library
b) Generate n-grams using NLTK N-Grams library
c) Implement N-Grams Smoothing
Description:
1) Morphological Analysis: Morphological analysis involves analyzing the structure of words to
understand their meaning and grammatical properties. NLTK provides tools to perform
Program:
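import nltk
from nltk.util import ngrams
from nltk.lm import Laplace
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer

def morphological_analysis(word):
    """
    Performs morphological analysis on the given word using NLTK's WordNet Lemmatizer.
    Args:
        word (str): The word to be analyzed.
    Returns:
        str: The base form of the word (lemma).
    """
    lemmatizer = WordNetLemmatizer()
    # lemmatize() treats the word as a noun by default, so 'running' is returned unchanged
    return lemmatizer.lemmatize(word)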
Output:
Morphological analysis of 'running': running
Original text:
The quick brown fox jumps over the lazy dog
Generated 3-grams:
[('The', 'quick', 'brown'), ('quick', 'brown', 'fox'), ('brown', 'fox', 'jumps'),
('fox', 'jumps', 'over'), ('jumps', 'over', 'the'), ('over', 'the', 'lazy'), ('the', 'lazy', 'dog')]
N-gram probabilities after Laplace smoothing:
NgramModel with 1 3-grams
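The n-gram generation and smoothing steps named in the aim can be sketched without NLTK. This is a minimal illustration of add-one (Laplace) smoothing, P(w | context) = (count(context, w) + 1) / (count(context) + V), where V is the vocabulary size. The function names make_ngrams and laplace_probability are assumptions for the example, and the numbers will differ from nltk.lm.Laplace, which pads sentences and adds an unknown-word entry to its vocabulary.

```python
from collections import Counter


def make_ngrams(tokens, n):
    """Return all contiguous n-token tuples, in order of appearance."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def laplace_probability(ngram, ngram_counts, context_counts, vocab_size):
    """Add-one estimate of P(last word | preceding words)."""
    context = ngram[:-1]
    return (ngram_counts[ngram] + 1) / (context_counts[context] + vocab_size)


tokens = "The quick brown fox jumps over the lazy dog".split()
trigrams = make_ngrams(tokens, 3)

ngram_counts = Counter(trigrams)
context_counts = Counter(gram[:-1] for gram in trigrams)
vocab_size = len(set(tokens))  # 9, because "The" and "the" are distinct tokens

# Seen trigram: (1 + 1) / (1 + 9) = 0.2
seen = laplace_probability(("The", "quick", "brown"),
                           ngram_counts, context_counts, vocab_size)
# Unseen trigram with a seen context: (0 + 1) / (1 + 9) = 0.1
unseen = laplace_probability(("The", "quick", "dog"),
                             ngram_counts, context_counts, vocab_size)
print(trigrams)
print(seen, unseen)
```

Smoothing gives the unseen trigram a small non-zero probability instead of zero, at the cost of lowering the estimate for trigrams that were observed.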
EXPERIMENT-8
Aim:
Using NLTK package to convert audio file to text and text file to audio files.
Description:
1) Converting Audio to Text: Speech recognition is the process of converting spoken language into
text. NLTK itself does not include a speech recognizer, so this program uses the
SpeechRecognition library to convert an audio file into text.
2) Converting Text to Audio: Text-to-speech (TTS) is the process of converting text into spoken
language. NLTK does not include a speech synthesizer either, so we use the gTTS library to
convert text into an audio file.
Program:
import speech_recognition as sr  # provided by the SpeechRecognition package
from gtts import gTTS            # provided by the gTTS package

def audio_to_text(audio_file):
    recognizer = sr.Recognizer()
    with sr.AudioFile(audio_file) as source:
        audio_data = recognizer.record(source)
    text = recognizer.recognize_google(audio_data)
    return text

def text_to_audio(text, output_file):
    tts = gTTS(text)
    tts.save(output_file)

def main():
    """Main function to demonstrate audio-to-text and text-to-audio conversion."""
    # Converting audio to text
    audio_file = "sample_audio.wav"
    recognized_text = audio_to_text(audio_file)
    print("Audio to Text:")
    print(recognized_text)
    # Converting text to audio
    text = "This is a sample text-to-speech conversion."
    output_file = "output_audio.mp3"
    text_to_audio(text, output_file)
    print("\nText to Audio: Conversion successful")

if __name__ == "__main__":
    main()
Output:
Audio to Text:
this is a sample audio file for testing text-to-speech conversion
Text to Audio: Conversion successful