Business Analytics CA3
Data cleaning
# Preprocess text (removing stopwords, punctuation, converting to lowercase, etc.)
Input:
library(quanteda)
# Tokenise the corpus, dropping punctuation, symbols, URLs and numbers
t = tokens(c,
           remove_punct = TRUE,
           remove_symbols = TRUE,
           remove_url = TRUE,
           remove_numbers = TRUE)
# Convert the tokens to lowercase, as described above
t = tokens_tolower(t)
t
# Built-in English stopword list (a stopwords() function is also provided by the tm package)
stopwords()
# Remove stopwords from the tokens
t = tokens_remove(t, stopwords())
t
Output:
# Create a corpus object
Input:
library(tm)
# Build a tm corpus from the raw text vector (named 'corpus' to match the later code)
corpus = Corpus(VectorSource(text_data))
corpus
Output:
Input:
# Summarise the corpus (documents, token and type counts)
summary(c)
library(quanteda.textstats)   # textstat_summary() lives here in recent quanteda versions
textstat_summary(c)
Input:
# Creating a word cloud from the corpus
library(wordcloud)
wordcloud(corpus, min.freq = 3,
          colors = rainbow(9),
          rot.per = 0.4,
          random.order = FALSE,
          random.color = TRUE)
Output:
3. Sentiment Analysis
# Perform sentiment analysis with the SentimentAnalysis package
library(SentimentAnalysis)
sentiment = analyzeSentiment(text_data)
sentiment
# Convert sentiment analysis results to a data frame
sentiment_df <- data.frame(sentiment)
sentiment_df
# Plot the distribution of sentiment scores
library(ggplot2)
# Plot one of the score columns produced by analyzeSentiment (e.g. SentimentGI)
ggplot(sentiment_df, aes(x = SentimentGI)) +
  geom_histogram(fill = "orange", color = "yellow") +
  labs(title = "Sentiment Analysis",
       x = "Sentiment Score",
       y = "Frequency") +
  theme_minimal()
4. Topic Modelling
# Convert TDM to Document-Term Matrix (DTM)
Input:
# Transpose the TDM so that documents become rows and terms become columns
dtm = t(as.matrix(tdm))
dtm
Output:
Input:
library(topicmodels)
# num_topics is assumed to be defined earlier (e.g. num_topics = 5)
lda_model = LDA(dtm, k = num_topics, method = "Gibbs", control = list(seed = 1234))
lda_model
Output:
# View the topic modeling results
terms = terms(lda_model, 10) # Get top 10 terms for each topic
terms
# Interpretation:
Term Document Matrix (TDM):
A Term Document Matrix (TDM) is a mathematical representation of text data in which
each row corresponds to a unique term (word) and each column to a document. Each
entry usually records how often that term occurs in that document.
After the text has been preprocessed to remove noise such as stopwords, symbols,
URLs, and punctuation, the TermDocumentMatrix function is used to construct the
TDM. This matrix underpins many text analysis tasks, such as document clustering
and topic modelling; a minimal sketch of this step is given below.
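As a minimal sketch of this step (assuming the raw text sits in a character vector called text_data, as in the code above; the sample sentences here are purely illustrative):
library(tm)
# Illustrative documents; in the assignment these come from text_data
text_data <- c("analytics drives business decisions",
               "business analytics uses data and models",
               "models summarise data for better decisions")
corpus <- Corpus(VectorSource(text_data))
# Build the TDM: rows = terms, columns = documents, entries = term frequencies
tdm <- TermDocumentMatrix(corpus,
                          control = list(removePunctuation = TRUE,
                                         stopwords = TRUE,
                                         tolower = TRUE))
inspect(tdm)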
Word Cloud:
A word cloud is a graphic representation of text data in which the size of each
word reflects how frequently it occurs in the corpus.
The wordcloud function builds the word cloud from the supplied corpus. Its
appearance can be customised through parameters such as min.freq (the minimum
frequency a word needs in order to be included), colors, rot.per (the proportion
of words that are rotated), and random.order.
Word clouds are frequently used to show term frequencies at a glance and to
identify the most common terms in a corpus; a sketch using explicit word
frequencies is given below.
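As a small sketch of how word size follows frequency (assuming the tdm object built earlier; the frequency table freq is introduced here purely for illustration):
library(wordcloud)
# Total frequency of each term across all documents in the TDM
freq <- sort(rowSums(as.matrix(tdm)), decreasing = TRUE)
# Word size is proportional to frequency
wordcloud(words = names(freq), freq = freq,
          min.freq = 3,           # drop terms occurring fewer than 3 times
          colors = rainbow(9),
          rot.per = 0.4,          # roughly 40% of words are rotated
          random.order = FALSE,   # most frequent words placed first (centre)
          random.color = TRUE)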
Sentiment Analysis:
The goal of sentiment analysis is to identify the attitude or emotional tone
conveyed in text data. It can be carried out with a variety of methods, such as
rule-based systems, machine learning models, or lexicon-based approaches.
The code uses the SentimentAnalysis package to score the text data. The results
are presented as sentiment scores that indicate whether each text is positive,
negative, or neutral.
Visualising the distribution of these scores, as in the histogram above, helps to
understand the overall sentiment tendency within the corpus; a sketch of
summarising the scores as class labels follows below.
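As a hedged sketch of converting the continuous scores into sentiment classes (assuming the sentiment object produced by analyzeSentiment above; convertToDirection comes from the SentimentAnalysis package, and the sample texts are illustrative):
library(SentimentAnalysis)
# Illustrative input; in the assignment this is text_data
text_data <- c("The product is excellent and support was great",
               "Delivery was late and the packaging was damaged",
               "The order arrived on Tuesday")
sentiment <- analyzeSentiment(text_data)
# Map continuous scores to negative / neutral / positive labels
direction <- convertToDirection(sentiment$SentimentGI)
table(direction)   # number of documents in each sentiment class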
Topic Modeling:
Topic modelling is a statistical modelling method used to discover abstract
topics or themes in a collection of documents.
After the LDA model has been fitted, the top terms associated with each topic are
extracted to help interpret and label the themes that were found; a sketch of
inspecting the fitted model further is given below.
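As a hedged sketch of going one step further with the fitted model (assuming the lda_model object created above; posterior() is a standard topicmodels function, and doc_topics is an illustrative name):
library(topicmodels)
# Per-document topic probabilities: one row per document, rows sum to 1
doc_topics <- posterior(lda_model)$topics
head(doc_topics)
# Most likely topic for each document, useful when labelling documents by theme
apply(doc_topics, 1, which.max)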