Ask Analytics - Text Mining in R - Part 3
Text Mining in R - Part 3
Comparison and Commonality Cloud and much more

In the previous articles of the series, we covered web scraping
and the basics of text mining. We have also covered the basic word cloud.

Now it is time to learn some very useful text functions and web
scraping variants. We will also learn how to create comparison
and commonality word clouds, and how to analyse them.
We learned earlier how to extract tweets on a particular hashtag. What if
...
Q. Can we scrape tweets based on two hashtags occurring together?
Hashtag-based scraping is done when we want to know people's opinion about
a certain topic; timeline-based scraping is done to know what a person or institution is
up to. We should learn both.
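One offline way to approach the two-hashtag question, sketched under assumptions: once tweets have been pulled for one hashtag, keep only those that also mention the other. The sample tweets and the helper has_all_tags are hypothetical and not from the original exercise, and this filters text that is already downloaded rather than changing the search query itself.

```r
# Made-up tweets, used only for illustration
tweets <- c("Switching from #Airtel to #Vodafone this week",
            "#airtel network is great today",
            "Which is better, #vodafone or #AIRTEL?",
            "Nothing about telecom here")

# Hypothetical helper: TRUE for each text that contains every tag (case-insensitive)
has_all_tags <- function(text, tags) {
  text <- tolower(text)
  # one logical vector per tag, then combine them with AND
  per_tag <- lapply(tolower(tags), function(tag) grepl(tag, text, fixed = TRUE))
  Reduce(function(a, b) a & b, per_tag)
}

both <- tweets[has_all_tags(tweets, c("#airtel", "#vodafone"))]
print(both)
```

Only the first and third sample tweets are kept. Note that a plain substring match would also accept a longer tag such as #airtelindia, so a stricter pattern may be needed on real data.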
I was planning to buy a new mobile connection and had to choose between Airtel and
Vodafone. I thought I should analyze these companies' behavior on Twitter and see whether it is
helpful in decision making. In this exercise, I got to learn two things: Comparison Cloud and
Commonality Cloud.
#---------Let's first make the connection between R and Twitter ---------------------#
Consumer_key = "6NY7fDv___________QDT6WtrDK2p"
Consumer_secret = "6R06rlKb5LEy3yIb_____________HChZCBzXvgXXHV8V6oZC"
access_token = "3154348417-u0al6vBfU___________YFQwjQJIjQHeMErdJVI"
access_token_secret = "0fZ5WxRDNfH________________tsqAtIkhAC0NQQaSVWx"
# We now need to clean the extracted text; we will perform the cleaning in two phases

# Cleaning Phase 1 -- Twitter-specific cleaning -- for this we define a function
# gsub is a very useful function, do learn about it. We will write about it soon.

clean.twitter = function(x)
{
  # remove @ taggings
  x = gsub("@\\w+", "", x)
  # remove punctuation (this also strips ":", "/" and "." from links)
  x = gsub("[[:punct:]]", "", x)
  # remove links, i.e. whatever still starts with http
  x = gsub("http\\w+", "", x)
  # collapse tabs and repeated spaces into a single space
  x = gsub("[ \t]+", " ", x)
  # remove blank spaces at the beginning
  x = gsub("^ +", "", x)
  # remove blank spaces at the end
  x = gsub(" +$", "", x)
  # remove non-English (non-ASCII) characters
  x = gsub("[^\x20-\x7E]", "", x)
  return(x)
}

# Now we shall use the function defined above
airtel = clean.twitter(airtel_tweets)
vodafone = clean.twitter(vodafone_tweets)
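To see what each gsub step does, here is a small worked example on two made-up tweets. The sample strings and the name clean_steps are illustrative only and not part of the original exercise. The patterns follow the cleaning steps above; runs of spaces or tabs are collapsed to one space so that neighbouring words do not get glued together.

```r
# Two made-up tweets, used only for illustration
sample_tweets <- c("@airtel  my network is down!! http://t.co/abc123",
                   "Thanks @VodafoneIN for the quick help :)  ")

clean_steps <- function(x) {
  x <- gsub("@\\w+", "", x)          # drop @mentions
  x <- gsub("[[:punct:]]", "", x)    # drop punctuation; a link shrinks to "httptcoabc123"
  x <- gsub("http\\w+", "", x)       # drop what is left of the link
  x <- gsub("[ \t]+", " ", x)        # collapse runs of spaces/tabs to one space
  x <- gsub("^ +| +$", "", x)        # trim leading and trailing spaces
  x <- gsub("[^\\x20-\\x7E]", "", x, perl = TRUE)  # keep printable ASCII only
  x
}

cleaned <- clean_steps(sample_tweets)
print(cleaned)
```

The two strings come out as "my network is down" and "Thanks for the quick help". The order matters: because punctuation is removed before links, the link pattern only has to match a single run of word characters starting with http.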
# Let us now make the consolidated vectors with all the tweets related to one entity together
airtel_1 = paste(airtel, collapse=" ")
vodafone_1 = paste(vodafone, collapse=" ")
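if(!require(tm)) install.packages("tm")
library(tm)
# One document per company: Airtel first, Vodafone second (assumed definition of The_one)
The_one = c(airtel_1, vodafone_1)
corpus = Corpus(VectorSource(The_one))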
# Post cleaning, it is time to create the Term Document Matrix. Two such matrices
can be made using the tm package:
1. Document Term Matrix (DTM): A matrix that describes the frequency
of the terms that occur in each document. In a document-term matrix, rows correspond to
documents in the collection and columns correspond to terms.
2. Term Document Matrix (TDM): The transpose of the DTM. In a term-document
matrix, rows correspond to terms in the collection and columns correspond to documents.
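The relationship between the two matrices can be checked by hand. The sketch below builds both from two tiny made-up documents using base R only, so it is not how the tm package computes them internally; the document texts and variable names are assumptions for illustration.

```r
# Two made-up documents, one per company
docs <- c(airtel = "network offer recharge network",
          vodafone = "offer network roaming")

# Split each document into words and collect the vocabulary
tokens <- strsplit(docs, " ", fixed = TRUE)
terms <- sort(unique(unlist(tokens)))

# Term-document matrix: one row per term, one column per document
tdm_toy <- vapply(tokens,
                  function(tok) as.integer(table(factor(tok, levels = terms))),
                  integer(length(terms)))
rownames(tdm_toy) <- terms
colnames(tdm_toy) <- names(docs)

# Document-term matrix: simply the transpose
dtm_toy <- t(tdm_toy)

print(tdm_toy)
print(dtm_toy)
```

Here tdm_toy has four rows (terms) and two columns (documents), while dtm_toy has two rows and four columns holding the same counts.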
www.askanalytics.in/2016/05/text-mining-in-r-part-3.html 2/5
27/2/2021 Ask Analytics: Text Mining in R - Part 3
# Back to code
# corpus is the two-document corpus created above
tdm = as.matrix(TermDocumentMatrix(corpus))
head(tdm)
Now we shall make two types of word cloud (the basic type we have already learnt previously):
Comparison Cloud : Used to check the contrast between two text corpora
Commonality Cloud : Used to check the common terms across various corpora
if(!require(wordcloud)) install.packages("wordcloud")
library(wordcloud)
# comparison cloud
comparison.cloud(tdm, random.order=FALSE,
colors = c("#00B2FF", "red"),
title.size=1.5, min.freq=100, max.words=500)
One thing is sure: if I take a Vodafone connection, I will need to talk to this Amit one day.
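How a comparison cloud decides which side a word lands on can be worked out by hand. As described in the wordcloud documentation, each word is sized by its largest deviation from its average rate across documents and placed with the document where that deviation occurs. The sketch below reproduces that arithmetic in base R on a made-up matrix; the counts and variable names are assumptions, and it does not draw anything.

```r
# Toy term-document matrix: rows are terms, columns are documents (made-up counts)
toy_tdm <- matrix(c(8, 2,
                    1, 6,
                    5, 5),
                  ncol = 2, byrow = TRUE,
                  dimnames = list(c("network", "roaming", "offer"),
                                  c("Airtel", "Vodafone")))

# Rate of each term within its document (each column sums to 1)
rates <- sweep(toy_tdm, 2, colSums(toy_tdm), "/")

# Deviation of each rate from the term's average rate across documents
deviation <- rates - rowMeans(rates)

# The document with the largest deviation gets the term; that deviation drives its size
side <- colnames(toy_tdm)[apply(deviation, 1, which.max)]
size <- apply(deviation, 1, max)

print(data.frame(term = rownames(toy_tdm), side = side, size = round(size, 3)))
```

In this toy case "network" falls on the Airtel side and "roaming" on the Vodafone side. "offer" has the same count in both documents, yet it leans slightly towards Vodafone because that document is shorter, which is worth remembering when reading a real comparison cloud.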
commonality.cloud(tdm, random.order=FALSE,
colors = brewer.pal(8, "Dark2"),
title.size=1.5)
The Commonality Cloud gives an idea of which terms two (or more) entities have in common. In this
case, there is not much that can be interpreted.
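The idea behind the commonality cloud can also be sketched in a few lines. As far as I understand commonality.cloud, its default measure is the smallest count of a term across the documents, so a term missing from any one document drops out. The matrix and names below are made up for illustration, and the code only computes the weights rather than plotting them.

```r
# Toy term-document matrix with made-up counts
toy_counts <- matrix(c(8, 2,
                       0, 6,
                       5, 5),
                     ncol = 2, byrow = TRUE,
                     dimnames = list(c("network", "roaming", "offer"),
                                     c("Airtel", "Vodafone")))

# Shared weight of each term: its smallest count across the documents
common <- apply(toy_counts, 1, min)

# Keep only terms used by every document, largest first
common <- sort(common[common > 0], decreasing = TRUE)
print(common)
```

"offer" comes first with a weight of 5, "network" follows with 2, and "roaming" disappears because Airtel never uses it. When two accounts mostly share generic words, the resulting cloud carries little information, which matches the observation above.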
Posted by Unknown