
Natural Language Processing Lab Manual

1. Implement Python program to perform following tasks on text


a) Tokenization b) Stop word Removal
2. Implement Python program to implement Porter stemmer algorithm for
stemming.
3. Implement Python Program for
a) Word Analysis b) Word Generation
4. Create a Sample list for at least 5 words with ambiguous sense and
Implement WSD using Python
5. Install NLTK toolkit and perform stemming
6. Create a sample list of at least 10 words for POS tagging and find the POS for any
given word
7. Write a Python program to
a) Perform Morphological Analysis using NLTK library
b) Generate n-grams using NLTK N-Grams library
c) Implement N-Grams Smoothing
8. Using NLTK package to convert audio file to text and text file to audio files.
1. Implement Python program to perform following tasks on text

a) Tokenization b) Stop word Removal

a. Python Program to Perform Tokenization:

import nltk
nltk.download('punkt')  # Download necessary NLTK data files

from nltk.tokenize import word_tokenize, sent_tokenize

# Example text
text = ("Natural language processing (NLP) is a field of artificial intelligence "
        "that helps computers understand, interpret, and manipulate human language. "
        "It enables tasks like speech recognition, sentiment analysis, and machine "
        "translation.")

# Sentence Tokenization
sentences = sent_tokenize(text)
print("Sentence Tokenization:")
print(sentences)
print()

# Word Tokenization
words = word_tokenize(text)
print("Word Tokenization:")
print(words)

Explanation:

 Sentence Tokenization: The sent_tokenize function breaks the text into individual sentences.
 Word Tokenization: The word_tokenize function splits the text into individual words (or tokens).

Sample output (for a shorter input, "Hello! My name is Guru. How can I assist you today?"):

Sentence Tokenization: ['Hello! My name is Guru.', 'How can I assist you today?']
Word Tokenization: ['Hello', '!', 'My', 'name', 'is', 'Guru', '.', 'How', 'can', 'I', 'assist', 'you', 'today', '?']

b. Python Program to Perform Stop word Removal

import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# Download NLTK stopwords and punkt tokenizer models
nltk.download('punkt')
nltk.download('stopwords')

def remove_stopwords(text):
    # Tokenize the input text into words
    words = word_tokenize(text)

    # Get the set of stopwords in English
    stop_words = set(stopwords.words('english'))

    # Remove stopwords from the tokenized words
    filtered_words = [word for word in words if word.lower() not in stop_words]

    # Join the filtered words back into a string
    return ' '.join(filtered_words)

# Example text
text = "This is an example sentence where we will remove stop words."

# Remove stopwords
filtered_text = remove_stopwords(text)
print("Original Text:", text)
print("Filtered Text:", filtered_text)

Explanation:

 word_tokenize: Tokenizes the input text into individual words.


 stopwords.words('english'): Provides a list of common stop words in English
(like "the", "is", etc.).
 A list comprehension is used to filter out words that are found in the
stop_words list.
 The filtered_words are then joined back into a string without the stop words.

Output:

Original Text: This is an example sentence where we will remove stop words.
Filtered Text: example sentence remove stop words .

2. Implement Python program to implement Porter stemmer algorithm for stemming.
# Stemming
import nltk
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

# Download NLTK tokenizer models
nltk.download('punkt')

def stem_text(text):
    # Initialize the Porter Stemmer
    porter_stemmer = PorterStemmer()

    # Tokenize the text into words
    words = word_tokenize(text)

    # Apply stemming to each word
    stemmed_words = [porter_stemmer.stem(word) for word in words]

    # Join the stemmed words back into a single string
    stemmed_text = ' '.join(stemmed_words)
    return stemmed_text

# Example text
text = "NLTK is a leading platform for building Python programs to work with human language data."

# Perform stemming
stemmed_text = stem_text(text)

# Print stemmed text
print(stemmed_text)

OUTPUT (expected; the Porter stemmer also lower-cases tokens):

nltk is a lead platform for build python program to work with human languag data .

3. Implement Python Program for

a) Word Analysis b) Word Generation

 Word Analysis:
 Analyze character frequency in a given text.
 Analyze word frequency and length.

 Word Generation:

 Generate random words based on character frequency.


 Use a Markov Chain approach for more context-aware word generation.

import random
from collections import Counter, defaultdict

class WordAnalyzer:
    def __init__(self, text):
        self.text = text
        self.word_list = self.text.split()
        self.char_freq = self.analyze_char_frequency()
        self.word_freq = self.analyze_word_frequency()

    def analyze_char_frequency(self):
        return Counter(self.text.replace(" ", ""))

    def analyze_word_frequency(self):
        return Counter(self.word_list)

    def analyze_word_lengths(self):
        return Counter(len(word) for word in self.word_list)

    def display_analysis(self):
        print("Character Frequency:")
        for char, freq in self.char_freq.items():
            print(f" {char}: {freq}")

        print("\nWord Frequency:")
        for word, freq in self.word_freq.items():
            print(f" {word}: {freq}")

        print("\nWord Length Frequency:")
        for length, freq in self.analyze_word_lengths().items():
            print(f" {length} characters: {freq}")

class WordGenerator:
    def __init__(self, char_freq, word_list):
        self.char_freq = char_freq
        self.word_list = word_list
        self.transition_matrix = self.build_transition_matrix()

    def build_transition_matrix(self):
        matrix = defaultdict(Counter)
        for word in self.word_list:
            for i in range(len(word) - 1):
                matrix[word[i]][word[i + 1]] += 1
        for char, transitions in matrix.items():
            total = sum(transitions.values())
            for next_char in transitions:
                transitions[next_char] /= total
        return matrix

    def generate_word(self, length):
        if not self.char_freq:
            return ""
        start_char = random.choice(list(self.char_freq.keys()))
        word = start_char
        for _ in range(length - 1):
            if start_char not in self.transition_matrix:
                break
            next_char = random.choices(
                list(self.transition_matrix[start_char].keys()),
                list(self.transition_matrix[start_char].values())
            )[0]
            word += next_char
            start_char = next_char
        return word

    def random_word(self, length):
        return ''.join(random.choices(
            list(self.char_freq.keys()),
            weights=list(self.char_freq.values()),
            k=length
        ))

if __name__ == "__main__":
    # Example Text
    text = "hello world this is a simple example of word analysis and generation"

    # Word Analysis
    analyzer = WordAnalyzer(text)
    analyzer.display_analysis()

    # Word Generation
    generator = WordGenerator(analyzer.char_freq, analyzer.word_list)
    print("\nGenerated Words:")
    print("Markov Chain Based:", generator.generate_word(6))
    print("Random Word:", generator.random_word(6))

1. Class: WordAnalyzer

This class is responsible for analyzing a given text. It extracts meaningful statistics
about the words and characters in the text.

Methods:

1. __init__(self, text)
o Initializes the WordAnalyzer class with a text input.
o Splits the text into words (self.word_list) and computes:
 Character frequencies: How often each character appears.
 Word frequencies: How often each word appears.
2. analyze_char_frequency(self)
o Returns a Counter object with the frequency of each character in the
text (excluding spaces).
3. analyze_word_frequency(self)
o Returns a Counter object with the frequency of each unique word in
the text.
4. analyze_word_lengths(self)
o Analyzes the lengths of all the words in the text.
o Returns a Counter object where the keys are word lengths (in
characters) and values are their frequencies.
5. display_analysis(self)
o Prints the analysis results in a readable format, including:
 Character frequency
 Word frequency
 Word length frequency

2. Class: WordGenerator

This class is responsible for generating new words based on the analyzed data.

Initialization:

 Takes two inputs:


1. char_freq: A dictionary of character frequencies from WordAnalyzer.
2. word_list: The list of words from the text.
 It also builds a transition matrix:

o Tracks how likely one character is to follow another, based on the input
text.

Methods:

1. build_transition_matrix(self)
o Creates a Markov Chain-like transition matrix for characters.
o For each character in the words, it calculates:
 The frequency of every possible "next character."
 Normalizes these frequencies to probabilities.
o Example:
 For the word hello, the transitions would be:

h -> e
e -> l
l -> l
l -> o

(The normalized transition probabilities for this example are worked through just after this list.)

2. generate_word(self, length)
o Uses the transition matrix to generate a word of a specified length.
o Starts with a random character and iteratively adds the next character
based on probabilities in the transition matrix.
3. random_word(self, length)
o Generates a completely random word of the specified length using
character frequencies.
o Characters are chosen independently of one another.

3. Main Script

This section ties everything together and demonstrates how to use the classes.
Steps:

1. Input text:
o The text is provided as a string: "hello world this is a simple example of
word analysis and generation".
2. Analyze the text:
o The WordAnalyzer class computes the following:
 Frequency of each character (e.g., h appears twice, e appears 5
times, etc.).
 Frequency of each word (e.g., hello appears once, world appears
once, etc.).
 Distribution of word lengths (e.g., 5-character words appear
twice, etc.).
o Results are printed via display_analysis.
3. Generate new words:
o The WordGenerator class creates words using two approaches:
 Markov Chain-based (generate_word):
 Uses the character transition matrix to create more
context-aware words.
 Random character-based (random_word):
 Uses character frequency to randomly pick characters,
independent of sequence.

Example Outputs

Word Analysis:

Character Frequency:
h: 2
e: 5
l: 7
o: 5
w: 2
r: 2
d: 3
t: 2
i: 5
s: 7
a: 5
m: 1
p: 1
n: 3
x: 1
f: 1
g: 1
Word Frequency:
hello: 1
world: 1
this: 1
is: 1
a: 1
simple: 1
example: 1
of: 1
word: 1
analysis: 1
and: 1
generation: 1

Word Length Frequency:


5 characters: 2
4 characters: 1
2 characters: 2
1 characters: 1
...

Word Generation:

Generated Words:
Markov Chain Based: heilo
Random Word: inaso

4. Create a Sample list for at least 5 words with ambiguous sense and
Implement WSD using Python

Word Sense Disambiguation (WSD) is a task in natural language processing (NLP): determining the correct sense of a word in context when the word has multiple meanings. Below is a sample list of 5 ambiguous words and a Python implementation of WSD using the Lesk algorithm, which is a popular algorithm for WSD.

Sample List of Ambiguous Words

1. Bank (e.g., river bank, financial institution)
2. Plant (e.g., factory, living organism)
3. Bat (e.g., flying mammal, sports equipment)
4. Light (e.g., illumination, not heavy)
5. Mouse (e.g., computer device, animal)

Python Implementation

Below is the code that uses the Lesk algorithm from the nltk library for WSD:

from nltk.corpus import wordnet
from nltk.wsd import lesk
from nltk.tokenize import word_tokenize

# Sample sentences with ambiguous words
sentences = [
    "The bank of the river was flooded after the heavy rain.",
    "He deposited the money in the bank yesterday.",
    "The plant in the garden is blooming.",
    "The plant was shut down due to maintenance.",
    "A bat flew out of the cave at dusk.",
    "He hit the ball with his bat during the game.",
    "The light from the lamp was very bright.",
    "This box is very light and easy to carry.",
    "I saw a mouse running across the floor.",
    "He clicked the button on the mouse to open the file."
]

# Function to perform Word Sense Disambiguation
def disambiguate_sentence(sentence, target_word):
    # Tokenize the sentence
    tokens = word_tokenize(sentence)
    # Use the Lesk algorithm to determine the sense of the target word
    sense = lesk(tokens, target_word)
    return sense

# Ambiguous words and their sentences
ambiguous_words = ["bank", "plant", "bat", "light", "mouse"]

# Disambiguating senses
for sentence in sentences:
    for word in ambiguous_words:
        if word in sentence:
            sense = disambiguate_sentence(sentence, word)
            print(f"Sentence: {sentence}")
            print(f"Word: {word}")
            if sense:
                print(f"Sense: {sense.name()}")
                print(f"Definition: {sense.definition()}")
            else:
                print("Sense: Not found.")
            print("-" * 50)

Explanation of the Code


1. Input Sentences: Each sentence contains one of the ambiguous words.
2. Tokenization: The nltk.tokenize.word_tokenize function splits sentences into
tokens.
3. Lesk Algorithm: The nltk.wsd.lesk function is used to find the appropriate
WordNet sense of the word based on the sentence context.
4. Output: For each ambiguous word, its sense and definition are printed.
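
To make the idea behind the Lesk algorithm concrete, here is a minimal hand-rolled sketch (an illustrative simplification, not the actual nltk.wsd.lesk implementation): it scores each WordNet sense of the target word by how many words its dictionary gloss shares with the sentence.

from nltk.corpus import wordnet
from nltk.tokenize import word_tokenize

def simplified_lesk(sentence, target_word):
    # Context = the set of words in the sentence
    context = set(w.lower() for w in word_tokenize(sentence))
    best_sense, best_overlap = None, 0
    for sense in wordnet.synsets(target_word):
        # Gloss = the words of the sense's dictionary definition
        gloss = set(sense.definition().lower().split())
        overlap = len(gloss & context)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

print(simplified_lesk("The bank of the river was flooded after the heavy rain.", "bank"))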

Sample Output

For the sentence "The bank of the river was flooded after the heavy rain.", the
output might be:

Sentence: The bank of the river was flooded after the heavy rain.
Word: bank
Sense: bank.n.01
Definition: sloping land (especially the slope beside a body of water)

Ensure you have the necessary NLTK resources downloaded using:

import nltk
nltk.download('wordnet')
nltk.download('omw-1.4')
nltk.download('punkt')
5. Install NLTK toolkit and perform stemming

Step 1: Install NLTK

1. Using pip: Open your terminal or command prompt and run:

pip install nltk

2. Verify Installation: After installation, you can verify it by running the following command in Python:

import nltk
print(nltk.__version__) # Check if it's installed and print the version

Step 2: Download NLTK Data

NLTK provides various datasets and models for natural language processing tasks.
To download the required data, follow these steps:

1. Open a Python shell or script.


2. Run:

import nltk
nltk.download('punkt') # Required for tokenization
nltk.download('wordnet') # Optional: Required for lemmatization

Step 3: Perform Stemming

Stemming is the process of reducing words to their root form. NLTK provides several
stemming algorithms. Here's an example using the Porter Stemmer:

Code Example:

from nltk.stem import PorterStemmer

# Initialize the stemmer
stemmer = PorterStemmer()

# List of words to stem
words = ["running", "jumps", "easily", "happiness"]

# Perform stemming
stemmed_words = [stemmer.stem(word) for word in words]

print("Original Words:", words)
print("Stemmed Words:", stemmed_words)

Output:

Original Words: ['running', 'jumps', 'easily', 'happiness']
Stemmed Words: ['run', 'jump', 'easili', 'happi']

Optional: Using Lancaster Stemmer

If you want a more aggressive stemmer, use the Lancaster Stemmer:

from nltk.stem import LancasterStemmer

# Initialize the Lancaster Stemmer
lancaster_stemmer = LancasterStemmer()

# Perform stemming
stemmed_words = [lancaster_stemmer.stem(word) for word in words]

print("Lancaster Stemmed Words:", stemmed_words)

6. Create a sample list of at least 10 words for POS tagging and find the POS for any
given word

Sample List of Words

1. Run
2. Beautiful
3. Cat
4. Slowly
5. Play
6. Happiness
7. Beneath
8. Quickly
9. Book
10. Intelligent

POS Tags for the Words

Here’s the POS (Part of Speech) tagging for each word:

Word POS
Run Verb/Noun
Beautiful Adjective
Cat Noun
Slowly Adverb
Play Verb/Noun
Happiness Noun
Beneath Preposition
Quickly Adverb
Book Noun/Verb
Intelligent Adjective

Function to Get POS for a Given Word

You can implement a simple function in Python to find the POS of any word based
on this list.

# Sample POS Dictionary
pos_dict = {
    "run": ["Verb", "Noun"],
    "beautiful": ["Adjective"],
    "cat": ["Noun"],
    "slowly": ["Adverb"],
    "play": ["Verb", "Noun"],
    "happiness": ["Noun"],
    "beneath": ["Preposition"],
    "quickly": ["Adverb"],
    "book": ["Noun", "Verb"],
    "intelligent": ["Adjective"]
}

# Function to find POS
def get_pos(word):
    word = word.lower()
    return pos_dict.get(word, "Word not found in the list")

# Example Usage
word = input("Enter a word: ")
pos = get_pos(word)
print(f"POS for '{word}': {pos}")

7. Write a Python program to


a. Perform Morphological Analysis using NLTK library
b. Generate n-grams using NLTK N-Grams library
c. Implement N-Grams Smoothing

a. Perform Morphological Analysis using NLTK library

import nltk
from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer, WordNetLemmatizer
from nltk.corpus import wordnet
from nltk import pos_tag

# Download necessary NLTK data
nltk.download('punkt')
nltk.download('wordnet')
nltk.download('averaged_perceptron_tagger')
nltk.download('omw-1.4')

# Function to get WordNet POS tag
def get_wordnet_pos(tag):
    if tag.startswith('J'):
        return wordnet.ADJ
    elif tag.startswith('V'):
        return wordnet.VERB
    elif tag.startswith('N'):
        return wordnet.NOUN
    elif tag.startswith('R'):
        return wordnet.ADV
    else:
        return None

# Input text for analysis
text = "The quick brown fox jumps over the lazy dog."

# Tokenize the text
tokens = word_tokenize(text)
print("Tokens:", tokens)

# Part-of-speech tagging
pos_tags = pos_tag(tokens)
print("\nPOS Tags:", pos_tags)

# Initialize Stemmer and Lemmatizer
stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

# Perform stemming and lemmatization
print("\nMorphological Analysis:")
for word, tag in pos_tags:
    stem = stemmer.stem(word)
    wordnet_pos = get_wordnet_pos(tag) or wordnet.NOUN  # Default to noun
    lemma = lemmatizer.lemmatize(word, pos=wordnet_pos)
    print(f"Word: {word:12} | Stem: {stem:12} | Lemma: {lemma:12} | POS: {tag}")

Explanation:

1. Tokenization: The text is broken into individual words using word_tokenize.


2. Part-of-Speech Tagging: Each token is tagged with its part of speech using
pos_tag.
3. Stemming: The PorterStemmer reduces words to their root form.
4. Lemmatization: The WordNetLemmatizer reduces words to their base form,
considering the context (POS tags).
5. WordNet POS Conversion: The get_wordnet_pos function maps POS tags
from pos_tag to WordNet-compatible tags for accurate lemmatization.

Output:

For the input sentence "The quick brown fox jumps over the lazy dog.", the program
produces tokens, POS tags, stems, and lemmas.

Example output:

Tokens: ['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']

POS Tags: [('The', 'DT'), ('quick', 'JJ'), ('brown', 'NN'), ('fox', 'NN'), ('jumps', 'VBZ'),
('over', 'IN'), ('the', 'DT'), ('lazy', 'JJ'), ('dog', 'NN')]

Morphological Analysis:
Word: The | Stem: the | Lemma: the | POS: DT
Word: quick | Stem: quick | Lemma: quick | POS: JJ
Word: brown | Stem: brown | Lemma: brown | POS: NN
Word: fox | Stem: fox | Lemma: fox | POS: NN
Word: jumps | Stem: jump | Lemma: jump | POS: VBZ
Word: over | Stem: over | Lemma: over | POS: IN
Word: the | Stem: the | Lemma: the | POS: DT
Word: lazy | Stem: lazi | Lemma: lazy | POS: JJ
Word: dog | Stem: dog | Lemma: dog | POS: NN

b. Generate n-grams using NLTK N-Grams library

import nltk
from nltk.util import ngrams
from nltk.tokenize import word_tokenize

# Ensure necessary NLTK data is downloaded
nltk.download('punkt')

def generate_ngrams(text, n):
    """
    Generate n-grams from a given text.

    Args:
        text (str): The input text to process.
        n (int): The size of n-grams to generate (e.g., 2 for bigrams, 3 for trigrams).

    Returns:
        list: A list of n-grams, each represented as a tuple.
    """
    # Tokenize the text into words
    tokens = word_tokenize(text)

    # Generate n-grams
    n_grams = list(ngrams(tokens, n))
    return n_grams

# Example usage
if __name__ == "__main__":
    sample_text = "This is a simple example to generate n-grams using NLTK."
    n = 2  # For bigrams; change this value for different n-grams

    n_grams = generate_ngrams(sample_text, n)
    print(f"{n}-grams:")
    for gram in n_grams:
        print(gram)
How It Works:

1. Tokenization: The text is tokenized into words using NLTK's word_tokenize.


2. Generating N-Grams: The nltk.util.ngrams function is used to generate n-
grams.
3. Output: The resulting n-grams are displayed as tuples.

Example Output:

For the input text: "This is a simple example to generate n-grams using NLTK." and
n=2:

2-grams:
('This', 'is')
('is', 'a')
('a', 'simple')
('simple', 'example')
('example', 'to')
('to', 'generate')
('generate', 'n-grams')
('n-grams', 'using')
('using', 'NLTK')
('NLTK', '.')

c. Implement N-Grams Smoothing

from collections import defaultdict
import math

class NGramModel:
    def __init__(self, n):
        self.n = n  # The 'n' in n-grams
        self.ngram_counts = defaultdict(int)
        self.context_counts = defaultdict(int)
        self.vocabulary = set()

    def train(self, corpus):
        """
        Trains the model on the provided corpus.
        :param corpus: A list of sentences, where each sentence is a list of words.
        """
        for sentence in corpus:
            sentence = ['<s>'] * (self.n - 1) + sentence + ['</s>']
            for i in range(len(sentence) - self.n + 1):
                ngram = tuple(sentence[i:i + self.n])
                context = ngram[:-1]
                word = ngram[-1]

                self.ngram_counts[ngram] += 1
                self.context_counts[context] += 1
                self.vocabulary.add(word)

    def probability(self, context, word, smoothing=True):
        """
        Calculates the probability of a word given its context.
        :param context: A tuple of words representing the context.
        :param word: The word whose probability is to be calculated.
        :param smoothing: Whether to apply Laplace smoothing.
        :return: The probability of the word given the context.
        """
        ngram = context + (word,)
        if smoothing:
            # Laplace (Add-One) Smoothing
            numerator = self.ngram_counts[ngram] + 1
            denominator = self.context_counts[context] + len(self.vocabulary)
        else:
            # No smoothing
            numerator = self.ngram_counts[ngram]
            denominator = self.context_counts[context]

        return numerator / denominator if denominator > 0 else 0.0

    def generate_sentence(self, max_length=20):
        """
        Generates a sentence using the model.
        :param max_length: The maximum length of the sentence to generate.
        :return: A generated sentence as a list of words.
        """
        context = ('<s>',) * (self.n - 1)
        sentence = []

        for _ in range(max_length):
            word_probabilities = {word: self.probability(context, word)
                                  for word in self.vocabulary}
            next_word = max(word_probabilities, key=word_probabilities.get)
            if next_word == '</s>':
                break
            sentence.append(next_word)
            context = context[1:] + (next_word,)
        return sentence

# Example usage
if __name__ == "__main__":
    corpus = [
        ["the", "cat", "sat"],
        ["the", "cat", "sat", "on", "the", "mat"],
        ["the", "dog", "barked"]
    ]

    ngram_model = NGramModel(2)  # Bigram model
    ngram_model.train(corpus)

    print("Probability of 'cat' given 'the':", ngram_model.probability(('the',), 'cat'))
    print("Probability of 'sat' given 'cat':", ngram_model.probability(('cat',), 'sat'))
    print("Generated sentence:", " ".join(ngram_model.generate_sentence()))

Key Features:

1. Training on Corpus: The train method processes a corpus of sentences and builds counts for n-grams and contexts.
2. Laplace Smoothing: Adds one to the numerator and adjusts the denominator by adding the vocabulary size to avoid zero probabilities (see the worked example below).
3. Sentence Generation: Generates sentences using the trained model by choosing the most probable word iteratively.
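
With Laplace (add-one) smoothing, the probability method computes

    P(word | context) = (count(context + word) + 1) / (count(context) + |V|)

where |V| is the number of distinct word types seen during training. A quick worked check against the toy corpus above (bigram model): the context ('the',) occurs 4 times, the bigram ('the', 'cat') occurs 2 times, and the vocabulary contains 8 types (the, cat, sat, on, mat, dog, barked, </s>), so P('cat' | 'the') = (2 + 1) / (4 + 8) = 0.25. Likewise P('sat' | 'cat') = (2 + 1) / (2 + 8) = 0.3. These are the values the two print statements should display.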

8. Using NLTK package to convert audio file to text and text file to audio files

The NLTK (Natural Language Toolkit) package is primarily used for natural language
processing tasks such as tokenization, stemming, lemmatization, and sentiment
analysis. However, it does not have built-in support for converting audio to text or
vice versa. For these tasks, you can use other specialized libraries:

1. Converting Audio to Text:


o Use SpeechRecognition, a Python library for speech-to-text conversion.
2. Converting Text to Audio:
o Use gTTS (Google Text-to-Speech) or pyttsx3 for text-to-speech
conversion.

Here's an example script for both tasks:

Requirements
Install the required libraries using pip:

pip install SpeechRecognition gTTS pydub

Script

import os
import speech_recognition as sr
from gtts import gTTS

# Function to convert audio to text
def audio_to_text(audio_file_path, text_file_path):
    recognizer = sr.Recognizer()
    with sr.AudioFile(audio_file_path) as source:
        audio_data = recognizer.record(source)
    try:
        text = recognizer.recognize_google(audio_data)
        with open(text_file_path, 'w') as file:
            file.write(text)
        print(f"Transcription saved to {text_file_path}")
    except sr.UnknownValueError:
        print("Audio is not clear enough to transcribe.")
    except sr.RequestError as e:
        print(f"Error with the SpeechRecognition service: {e}")

# Function to convert text to audio
def text_to_audio(text_file_path, output_audio_file_path):
    with open(text_file_path, 'r') as file:
        text = file.read()
    tts = gTTS(text=text, lang='en')
    tts.save(output_audio_file_path)
    print(f"Audio saved to {output_audio_file_path}")

# Example usage
audio_file = "example_audio.wav"  # Replace with your audio file
text_file = "output_text.txt"
output_audio_file = "output_audio.mp3"

# Convert audio to text
audio_to_text(audio_file, text_file)

# Convert text to audio
text_to_audio(text_file, output_audio_file)

Explanation
1. Audio to Text:
o Uses speech_recognition to transcribe speech from an audio file (e.g.,
WAV).
o Saves the transcription to a text file.
2. Text to Audio:
o Reads the content of a text file.
o Uses gTTS to convert the text to speech and saves it as an audio file
(e.g., MP3).
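
If an offline alternative to gTTS is preferred (gTTS needs an internet connection), pyttsx3 can be used for the text-to-audio step. A minimal sketch, assuming pyttsx3 has been installed with pip install pyttsx3:

import pyttsx3

def text_to_audio_offline(text_file_path, output_audio_file_path):
    # Read the text and synthesize it locally (no internet connection required)
    with open(text_file_path, 'r') as file:
        text = file.read()
    engine = pyttsx3.init()
    engine.save_to_file(text, output_audio_file_path)
    engine.runAndWait()
    print(f"Audio saved to {output_audio_file_path}")

text_to_audio_offline("output_text.txt", "output_audio_offline.wav")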
