Business Analytics CA3


COURSE CODE : MGN 619

COURSE TITLE : Business Analytics

SUBMITTED TO : Dr. Veer P Gangwar

ACADEMIC TASK NO. : 3

ACADEMIC TASK TITLE : PRACTICAL

SUBMITTED BY : Dhanwanti Shah

STUDENT REG. NO. : 12318414

STUDENT ROLL NO. : RQ2356AO6

DATE OF ALLOTMENT : 24.04.2024

DATE OF SUBMISSION : 12.05.2024


Detail of Academic Task
Students can use data from Social Media, E-Commerce site product reviews,
News articles or any other textual literature and use it for creating the following:
1. Term Document Matrix
2. Word Cloud
3. Sentiment Analysis
4. Topic Modelling
The complete techniques used in analysing the text, visualizations and
interpretations need to be incorporated in the report.
Students will be evaluated on the following parameters:
1) Data Cleaning – 5 marks
2) Word Cloud – 5 marks
3) Sentiment Analysis – 5 marks
4) Topic Modelling – 5 marks
5) Term Document Matrix – 5 marks
6) Interpretation – 5 marks
1. Term Document Matrix

# Load and view the text data

Input:
library(readtext)  # provides readtext()
text = readtext("PM’s.txt")
text
text$text
View(text)
Output:

Data cleaning
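# Preprocess text (remove stopwords, punctuation, symbols, URLs and numbers;
# convert to lowercase)
Input:
library(quanteda)  # provides corpus(), tokens(), tokens_tolower(), tokens_remove()
library(tm)        # provides Corpus() and DocumentTermMatrix() used in later steps
# Assumption: c is a quanteda corpus built from the readtext object loaded above
c = corpus(text)
t = tokens(c,
  remove_punct = T,
  remove_symbols = T,
  remove_url = T,
  remove_numbers = T)
t
t = tokens_tolower(t)                  # convert to lowercase
t = tokens_remove(t, stopwords("en"))  # drop English stopwords
t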
Output:
# Create a corpus object
Input:
text_data = text$text  # document text as a character vector
corpus = Corpus(VectorSource(text_data))
Output:

Input:
summary(c)
library(quanteda.textstats)  # provides textstat_summary()
textstat_summary(c)

# Creating the Term Document Matrix (TDM)


Input:
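# DocumentTermMatrix() puts documents in rows and terms in columns;
# tm's TermDocumentMatrix() gives the transposed layout (terms in rows).
tdm = DocumentTermMatrix(corpus)
tdm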
Output:
2. Word Cloud

Input:
# Creating the word cloud
library(wordcloud)  # provides wordcloud()
wordcloud(corpus, min.freq = 3,
  colors = rainbow(9),
  rot.per = .4,
  random.order = F,
  random.color = T)

Output:
3. Sentiment Analysis
# Perform sentiment analysis
library(SentimentAnalysis)  # provides analyzeSentiment()
library(ggplot2)
sentiment = analyzeSentiment(text$text)
sentiment
# Convert sentiment analysis results to data frame
sentiment_df <- data.frame(sentiment)
sentiment_df
# Plot the distribution of sentiment scores
# (SentimentQDAP is one of the score columns returned by analyzeSentiment)
ggplot(sentiment_df, aes(x = SentimentQDAP)) +
  geom_histogram(bins = 10, fill = "orange", color = "yellow") +
  labs(title = "Sentiment Analysis",
       x = "Sentiment Score",
       y = "Frequency") +
  theme_minimal()
4. Topic Modelling
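# Convert the matrix to a dense Document-Term Matrix (DTM)
# (tdm was built with DocumentTermMatrix(), so documents are already rows)
Input:
dtm = as.matrix(tdm)
# LDA() rejects documents with no terms, so drop empty rows
dtm = dtm[rowSums(dtm) > 0, , drop = FALSE]
dtm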
Output:

# Perform topic modelling


Input:
# Set the number of topics
num_topics <- 2
num_topics

Output:

Input:
library(topicmodels)  # provides LDA()
lda_model = LDA(dtm, k = num_topics, method = "Gibbs", control = list(seed = 1234))
lda_model
Output:
# View the topic modeling results
terms = terms(lda_model, 10) # Get top 10 terms for each topic
terms

#Interpretation:
Term Document Matrix (TDM):
• A Term Document Matrix (TDM) is a numerical representation of text data
in which rows represent unique terms (words) and columns represent
documents. Each entry usually holds the frequency of a term in a document.
• The script builds the matrix with the DocumentTermMatrix function, which
stores the same counts in the transposed layout (documents as rows, terms
as columns), after the text data has been preprocessed to remove noise
such as stopwords, symbols, URLs, and punctuation. Many text analysis
tasks, such as document clustering and topic modelling, are based on this
matrix.
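
The row and column layout described above can be sketched in base R. This is
only an illustration on two made-up documents, not the report's data; the
names docs, vocab, tdm_toy and dtm_toy are hypothetical.

```r
# Toy documents (hypothetical, for illustration only)
docs <- c(d1 = "growth and jobs", d2 = "jobs jobs and health")

# Lowercase each document and split it on whitespace
toks <- strsplit(tolower(docs), "\\s+")

# Vocabulary: sorted unique terms across all documents
vocab <- sort(unique(unlist(toks)))

# Count every vocabulary term in every document
tdm_toy <- sapply(toks, function(tk) as.integer(table(factor(tk, levels = vocab))))
dimnames(tdm_toy) <- list(vocab, names(docs))  # rows = terms, columns = documents

print(tdm_toy)

# Transposing gives the document-term layout (documents as rows)
dtm_toy <- t(tdm_toy)
print(dtm_toy)
```

Both matrices hold the same counts; only the orientation differs, which is
why either one can feed clustering or topic modelling once it is in the
layout the function expects.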
Word Cloud:
• A word cloud is a graphic representation of text data in which a word's
size reflects how frequently that term occurs in the corpus.
• The wordcloud function builds the word cloud from the supplied corpus.
Its appearance can be customised with parameters such as min.freq (the
minimum frequency a word needs to be included), colors, rot.per (the
proportion of words that are rotated), and random.order (whether words
are plotted in random order rather than by decreasing frequency).
• Word clouds are often used to see at a glance which terms are most
common in a corpus.
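
As a rough sketch of the idea behind min.freq and word size, the base R
snippet below counts words, drops those below a threshold, and scales the
rest relative to the most frequent word. It does not draw a cloud, the word
list is made up, and the names words, min_freq, kept and size are
hypothetical.

```r
# Toy word list (hypothetical, for illustration only)
words <- c("india", "growth", "india", "jobs", "growth", "india", "health")

# Frequency of each word, most frequent first
freq <- sort(table(words), decreasing = TRUE)

# Keep only words that occur at least min_freq times
min_freq <- 2
kept <- freq[freq >= min_freq]

# Relative size of each kept word, scaled so the most frequent word is 1
size <- as.numeric(kept) / max(kept)
names(size) <- names(kept)

print(size)
```

Raising min_freq shortens the list of plotted words, which is often the
simplest way to make a crowded cloud readable.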
Sentiment Analysis:
• The goal of sentiment analysis is to identify the attitude or emotional
tone conveyed in text data. It can be carried out with a variety of
methods, such as rule-based systems, machine learning models, or
lexicon-based approaches.
• The script uses the analyzeSentiment function to score the sentiment of
the text data. The results are sentiment scores that indicate how
positive, negative, or neutral the text is.
• Plotting the distribution of the sentiment scores gives a clearer picture
of the overall sentiment tendency within the corpus.
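
A lexicon-based score of the kind mentioned above can be sketched in base R.
This is a simplified illustration, not the scoring used by the report's
script: the two word lists are tiny made-up lexicons and score_sentiment is
a hypothetical helper.

```r
# Tiny made-up lexicons (hypothetical; real dictionaries are far larger)
positive <- c("good", "great", "progress")
negative <- c("bad", "poor", "crisis")

# Score = (positive words - negative words) / total words, in [-1, 1]
score_sentiment <- function(txt) {
  tk <- strsplit(tolower(txt), "[^a-z]+")[[1]]  # split on non-letters
  tk <- tk[tk != ""]                            # drop empty tokens
  if (length(tk) == 0) return(0)                # empty text is treated as neutral
  (sum(tk %in% positive) - sum(tk %in% negative)) / length(tk)
}

print(score_sentiment("Great progress, despite a poor start"))
print(score_sentiment("bad crisis"))
```

A score above zero means positive words outnumber negative ones, a score
below zero means the opposite, and zero means neutral or balanced.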
Topic Modeling:
• Topic modelling is a statistical modelling method used to identify
abstract topics or themes in a set of documents.
• The script works with a document-term matrix (DTM), which has documents
as rows and terms as columns and is the transpose of the term document
matrix (TDM).
• Latent Dirichlet Allocation (LDA) is a widely used topic modelling
approach for finding themes in a corpus. Applying the LDA function to the
DTM fits an LDA model with the given number of topics.
• After the LDA model is fitted, the top terms associated with each topic
are extracted to help interpret and label the topics found.
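
The last step, picking the top terms per topic, can be sketched in base R.
The probabilities below are made-up values for illustration, not output from
a fitted model; beta, top_n and top_terms are hypothetical names.

```r
# Hypothetical topic-term probabilities (each row sums to 1); not fitted from data
beta <- rbind(
  topic1 = c(economy = 0.40, jobs = 0.30, health = 0.10, school = 0.20),
  topic2 = c(economy = 0.05, jobs = 0.15, health = 0.50, school = 0.30)
)

# For each topic, sort terms by probability and keep the top_n names
top_n <- 2
top_terms <- apply(beta, 1, function(p) {
  names(sort(p, decreasing = TRUE))[seq_len(top_n)]
})

print(top_terms)  # one column per topic, most probable term first
```

Reading down each column suggests a label for the topic, for example an
economy-related theme for the first column and a health-related theme for
the second.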

These text analysis methods provide insight into the underlying structure,
themes, and sentiment of text data, which supports tasks such as document
classification, content summarisation, and trend analysis. How the results
should be interpreted and used depends on the objectives and context of the
analysis.
