Ask Analytics - Text Mining in R - Part 3

Uploaded by

norelkys

27/2/2021 Ask Analytics: Text Mining in R - Part 3

Text Mining in R - Part 3
Comparison and Commonality Cloud and much more

In the previous articles of the series, we covered web scraping and the basics of text mining. We have also covered the basic word cloud.

Now it is time to learn some very useful text functions and web scraping variants. We will also learn how to create comparison and commonality word clouds, and how to analyse them.

With reference to the first article of the series: Text Mining in R - Part 1

We learned earlier how to extract tweets on a particular hashtag. What if ...

Q. Can we scrape tweets based on two hashtags occurring together?

Ans. Yes, it is very much possible. Use:

xyz = searchTwitter("#MannKiBaat AND #NAMO", n=50)
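
If you have more than two hashtags, typing the query by hand gets tedious. Here is a minimal sketch that builds the same kind of query string from a vector of tags. The helper name build_and_query is my own invention for illustration, it is not part of twitteR.

```r
# Hypothetical helper (not a twitteR function): join hashtags with " AND "
build_and_query <- function(tags) {
  # Add the leading "#" only where it is missing
  tags <- ifelse(startsWith(tags, "#"), tags, paste0("#", tags))
  paste(tags, collapse = " AND ")
}

query <- build_and_query(c("MannKiBaat", "#NAMO"))
print(query)
```

The resulting string can then be passed to searchTwitter() in place of the hand-written query above.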

Q. Can we scrape tweets from a specific user's timeline, instead of on a hashtag basis?

Ans. Yes. An example follows in this article.

Hashtag-based scraping is done when we want to know people's opinion about a certain topic; timeline-based scraping is done to know what a person or institution is up to. We should learn both.

I was planning to buy a new mobile connection and had to choose between Airtel and Vodafone. I thought I should analyse these companies' behaviour on Twitter and see whether it helps in decision making. In this exercise, I got to learn two things: the Comparison Cloud and the Commonality Cloud.

#---------Let's first make the connection between R and Twitter ---------------------#

Consumer_key = "6NY7fDv___________QDT6WtrDK2p"
Consumer_secret = "6R06rlKb5LEy3yIb_____________HChZCBzXvgXXHV8V6oZC"
access_token = "3154348417-u0al6vBfU___________YFQwjQJIjQHeMErdJVI"
access_token_secret = "0fZ5WxRDNfH________________tsqAtIkhAC0NQQaSVWx"

# I have masked my credentials; you need to get your own. (If you don't know where to get them,
# I believe you have missed the first blog of the Text Mining series.)

if(!require(twitteR)) install.packages("twitteR")
library(twitteR)
setup_twitter_oauth(Consumer_key,Consumer_secret,access_token,access_token_secret)
rm(list = ls())  # clear the workspace, including the credential variables

#------------------CONNECTION DONE------------------------------------#
www.askanalytics.in/2016/05/text-mining-in-r-part-3.html 1/5
# We now fetch tweets from the Airtel India and Vodafone India timelines
# Twitter name for Airtel India : airtelindia
# Twitter name for Vodafone India : VodafoneIN

airtel_tweets = userTimeline("airtelindia", n=500, since = "2016-01-01")
vodafone_tweets = userTimeline("VodafoneIN", n=500, since = "2016-01-01")

# We now get the text part of the tweets from both extracts

airtel_tweets = sapply(airtel_tweets, function(x) x$getText())
vodafone_tweets = sapply(vodafone_tweets, function(x) x$getText())

# We now need to clean the extracted text; we will perform the cleaning in two phases

# Cleaning Phase 1 -- Twitter specific cleaning -- For this we define a function
# gsub is a very useful function, do learn about it. We will write about it soon.

clean.twitter = function(x)
{
# remove @ taggings
x = gsub("@\\w+", "", x)
# remove punctuation (this also turns a URL such as http://t.co/abc into the single token httptcoabc)
x = gsub("[[:punct:]]", "", x)
# remove links, which by now are single tokens starting with http
x = gsub("http\\w+", "", x)
# squeeze runs of spaces and tabs into a single space, so that neighbouring words are not glued together
x = gsub("[ \t]{2,}", " ", x)
# remove blank spaces at the beginning
x = gsub("^\\s+", "", x)
# remove blank spaces at the end
x = gsub("\\s+$", "", x)
# remove non-English (non-ASCII) characters
x = gsub("[^\x20-\x7E]", "", x)
return(x)
}
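
To see what each gsub step does, it helps to run the steps one at a time on a single tweet. The sketch below uses a made-up tweet (not real data) and squeezes repeated whitespace down to one space, so that neighbouring words do not get glued together once a mention or link is deleted.

```r
# Made-up tweet, for illustration only
tweet <- "@someone Check our new 4G plans!! http://t.co/abc123   #offers"

step1 <- gsub("@\\w+", "", tweet)         # drop the @mention
step2 <- gsub("[[:punct:]]", "", step1)   # drop punctuation; the URL collapses into one token
step3 <- gsub("http\\w+", "", step2)      # drop the collapsed URL token
step4 <- gsub("[ \t]{2,}", " ", step3)    # squeeze repeated spaces and tabs
step5 <- gsub("^\\s+|\\s+$", "", step4)   # trim both ends

print(step5)
```

Note that the punctuation step runs before the link step on purpose: once the slashes and dots are gone, the whole URL is a single word starting with http, which the next pattern removes in one go.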

# Now we shall use the function defined above
airtel = clean.twitter(airtel_tweets)
vodafone = clean.twitter(vodafone_tweets)

# Let us now make consolidated strings, with all the tweets related to one entity pasted together
airtel_1 = paste(airtel, collapse=" ")
vodafone_1 = paste(vodafone, collapse=" ")

# and now combine them into a single vector with two elements, one document per company

The_one = c(airtel_1, vodafone_1)

# Cleaning Phase 2 -- Generic cleaning, which is done by using tm package functions

if(!require(tm)) install.packages("tm")
library(tm)
corpus = Corpus(VectorSource(The_one ))

textCorpus = tm_map(corpus, content_transformer(tolower))

textCorpus = tm_map(textCorpus, removeWords, stopwords("english"))
textCorpus = tm_map(textCorpus, removeNumbers)
textCorpus = tm_map(textCorpus, stripWhitespace)

# Post cleaning, it is time to create a Term Document Matrix. Two such matrices can be made using the tm package:

1. Document Term Matrix (DTM): a matrix that describes the frequency of terms that occur in each document. In a document-term matrix, rows correspond to documents in the collection and columns correspond to terms.

2. Term Document Matrix (TDM): the transpose of the DTM. In a term-document matrix, rows correspond to terms in the collection and columns correspond to documents.
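
The difference between the two matrices is easy to see on a tiny example. The sketch below builds both by hand in base R from two made-up "documents" (the texts and the object names are assumptions for illustration); the tm package does the same job for a real corpus.

```r
# Two made-up documents, one per company
docs <- c(Airtel = "new plans new offers",
          Vodafone = "sorry for the trouble sorry")

tokens <- strsplit(docs, " ", fixed = TRUE)   # list of word vectors, one per document
terms <- sort(unique(unlist(tokens)))         # vocabulary across both documents

# Count how often each term of the vocabulary occurs in one document
count_terms <- function(tok) vapply(terms, function(tm) sum(tok == tm), integer(1))

tdm_toy <- sapply(tokens, count_terms)   # TDM: rows = terms, columns = documents
dtm_toy <- t(tdm_toy)                    # DTM: rows = documents, columns = terms

print(tdm_toy)
print(dim(tdm_toy))
print(dim(dtm_toy))
```

The same count sits at tdm_toy["new", "Airtel"] and dtm_toy["Airtel", "new"]; only the orientation changes.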

# Back to code

tdm = as.matrix(TermDocumentMatrix(textCorpus))
head(tdm)

# It looks like the picture on the right (in the original post); we now name columns 1 and 2 after their respective entities

colnames(tdm) = c("Airtel", "Vodafone")

Now we shall make two types of word cloud (the basic type we have already learnt previously):

Comparison Cloud : Used to check the contrast between two or more text corpora
Commonality Cloud : Used to check the terms common across the corpora
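
Before plotting, it may help to see the idea behind the two clouds in numbers. The following is only a rough sketch on a toy matrix with made-up counts: I assume a term is "common" to the extent of its smallest count across documents, and "distinctive" for the document whose share of that term is furthest above the average share. The wordcloud package's own calculation may differ in detail.

```r
# Toy term-document matrix with made-up counts
toy <- matrix(c(10, 0,
                 4, 4,
                 0, 6),
              ncol = 2, byrow = TRUE,
              dimnames = list(c("plan", "thanks", "sorry"),
                              c("Airtel", "Vodafone")))

# Commonality idea: a term scores only as much as its smallest count across documents
common_score <- apply(toy, 1, min)

# Comparison idea: turn counts into shares per document,
# then see which document is furthest above the average share for each term
share <- sweep(toy, 2, colSums(toy), "/")
deviation <- share - rowMeans(share)
dominant <- colnames(toy)[apply(deviation, 1, which.max)]
names(dominant) <- rownames(toy)

print(common_score)
print(dominant)
```

Here only "thanks" is shared by both columns, while "plan" leans towards Airtel and "sorry" towards Vodafone, which is the kind of contrast the comparison cloud is meant to bring out.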

# Let's make a comparison cloud

if(!require(wordcloud)) install.packages("wordcloud")
library(wordcloud)
# comparison cloud
comparison.cloud(tdm, random.order=FALSE,
                 colors = c("#00B2FF", "red"),
                 title.size=1.5, min.freq=100, max.words=500)

We can see that the Airtel Twitter handle is mostly talking about its products, plans, features or events. Vodafone, on the other hand, is mainly replying to unsatisfied customers. In particular, this guy Amit is writing most of their tweets.

Thoughts that came to my mind:

EITHER Airtel has received fewer complaints while Vodafone has received too many, OR Vodafone is more focused on customer satisfaction and hence uses its Twitter handle to reply to customers' complaints, unlike Airtel, which uses it for advertising its products.

One thing is sure: if I take a Vodafone connection, I will need to talk to this Amit one day.

# Let's now make a commonality cloud

commonality.cloud(tdm, random.order=FALSE,
                  colors = brewer.pal(8, "Dark2"),
                  title.size=1.5)

A Commonality Cloud gives an idea of which common terms two (or more) entities are using. In this case, there is not much that can be interpreted.


What's next in the series:

We are going to cover a few more functions of the tm package, text association, sentiment analysis and much more. Till then ...

Enjoy reading our other articles and stay tuned with us.

Kindly do provide your feedback in the 'Comments' section and share as much as possible.

A humble appeal: Please do like us @ Facebook

Posted by Unknown



Copyright 2015: Ask Analytics. Simple theme. Powered by Blogger.
