"A big computer, a complex algorithm and a long time does not equal science." -- Robert Gentleman
Word Cloud in R
A word cloud (or tag cloud) can be a handy tool when you need to highlight the most
commonly cited words in a text with a quick visualization. Of course, you can use one
of the several online services, such as wordle or tagxedo, which are feature rich and have a
nice GUI. Being an R enthusiast, I always wanted to produce this kind of image within
R and now, thanks to Ian Fellows' recently released wordcloud package, I finally
can!
In order to test the package I retrieved the titles of the XKCD web comics included in
my RXKCD package and produced a word cloud based on the titles' word frequencies
calculated using the powerful tm package for text mining (I know, it is like killing a fly
with a bazooka!).
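library(RXKCD)
library(tm)
library(wordcloud)
library(RColorBrewer)

# Locate and read the XKCD metadata shipped with RXKCD
path <- system.file("xkcd", package = "RXKCD")
datafiles <- list.files(path)
xkcd.df <- read.csv(file.path(path, datafiles))

# Build a corpus from the titles (third column) and clean it.
# Note: in recent tm versions DataframeSource() expects `doc_id` and `text` columns;
# VectorSource(as.character(xkcd.df[, 3])) is a possible alternative there.
xkcd.corpus <- Corpus(DataframeSource(data.frame(xkcd.df[, 3])))
xkcd.corpus <- tm_map(xkcd.corpus, removePunctuation)
xkcd.corpus <- tm_map(xkcd.corpus, content_transformer(tolower))
xkcd.corpus <- tm_map(xkcd.corpus, function(x) removeWords(x, stopwords("english")))

# Word frequencies, sorted from most to least frequent
tdm <- TermDocumentMatrix(xkcd.corpus)
m <- as.matrix(tdm)
v <- sort(rowSums(m), decreasing = TRUE)
d <- data.frame(word = names(v), freq = v)

# Green palette without its two lightest shades
pal <- brewer.pal(9, "BuGn")
pal <- pal[-(1:2)]

# Draw the cloud into a PNG file in the working directory
png("wordcloud.png", width = 1280, height = 800)
wordcloud(d$word, d$freq, scale = c(8, .3), min.freq = 2, max.words = 100,
          random.order = TRUE, rot.per = .15, colors = pal,
          vfont = c("sans serif", "plain"))
dev.off()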
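To make the intermediate objects above more concrete, here is a minimal base-R sketch of the same counting step on a made-up vector of titles. The `titles` vector and the tiny stop-word list are illustrative assumptions, not data from RXKCD, and the counting is done without tm, so details such as tm's own filtering defaults are not reproduced. It ends with a data frame in the same word/freq layout as `d`.

```r
# Illustrative titles; NOT taken from the RXKCD data set.
titles <- c("Barrel - Part 1", "Barrel - Part 2", "Tree Cropped", "Island (sketch)")

# Hypothetical, deliberately tiny stop-word list for the sketch.
stop_words <- c("the", "a", "of")

# Lower-case, strip punctuation, then split on whitespace.
clean  <- gsub("[[:punct:]]", "", tolower(titles))
tokens <- unlist(strsplit(clean, "[[:space:]]+"))
tokens <- tokens[nzchar(tokens) & !(tokens %in% stop_words)]

# Count each word and sort by decreasing frequency, as in the post.
v <- sort(table(tokens), decreasing = TRUE)
d <- data.frame(word = names(v), freq = as.vector(v), stringsAsFactors = FALSE)
print(d)
```

The two columns of `d` are exactly what wordcloud() receives as its first two arguments in the code above.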
As a second example, inspired by this post from the eKonometrics blog, I created a
Labels: package, plot, tag cloud, text filtering, text mining, visualization
56 comments:
Noam López 30 July 2011 06:44
The post is interesting and I could replicate your second example. But I don't know how
to do it if I have a text in a txt file or a Word file. Your example only works with an HTML
table, but very often we have whole texts. I would be very grateful if you could make a word
cloud using a txt file.
Noam
Reply
Paolo 30 July 2011 07:35
Dear Noam,
You can find both the answer to your question and a nice introduction to text mining in R
in the vignette of the tm package:
install.packages("tm")
library("tm")
vignette("tm")
HIH!
Reply
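For the plain-text case raised in the question, here is a minimal sketch under some assumptions: the file name `my_text.txt` is hypothetical (the example writes a small temporary file so that it runs as-is), and the counting is done in base R rather than with tm, whose vignette remains the fuller reference. The resulting `d` has the same word/freq layout used in the post.

```r
# Write a small temporary file so the sketch is runnable; in practice
# `txt_file` would point to your own file, e.g. "my_text.txt" (hypothetical).
txt_file <- tempfile(fileext = ".txt")
writeLines(c("R makes word clouds.", "Word clouds show frequent words."), txt_file)

# Read the file line by line and collapse it to one string.
text <- paste(readLines(txt_file, warn = FALSE), collapse = " ")

# Normalise: lower-case, drop punctuation, split into words.
words <- unlist(strsplit(gsub("[[:punct:]]", "", tolower(text)), "[[:space:]]+"))
words <- words[nzchar(words)]

# Frequency table sorted from most to least frequent.
v <- sort(table(words), decreasing = TRUE)
d <- data.frame(word = names(v), freq = as.vector(v), stringsAsFactors = FALSE)
print(d)

# Plot only if the wordcloud package is installed.
if (requireNamespace("wordcloud", quietly = TRUE)) {
  wordcloud::wordcloud(d$word, d$freq, min.freq = 1)
}
```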
Replies
Anonymous 20 September 2013 01:23
Thank you Paolo, I'm going to read about the tm package. This is my first encounter with text
mining because I only use R for my statistics classes.
Noam
Reply
Paolo 31 July 2011 19:23
tools that can be useful for solving both basic and more advanced problems in this
interesting field.
Reply
A great example. However, sometime in the two weeks before 2011/11/30 the directory
and file were removed, so
"http://cran.r-project.org/web/packages/available_packages_by_date.html" is not found
because the "packages" directory is no longer there. How about using another web site
as an example?
Reply
Paolo Sonego
Thanks for the update! Feel free to suggest a web site of interest as an alternative.
Reply
Thanks Paolo, this might be impossible, but how about any text on a basic news web
page like www.washingtonpost.com? Ignore pictures and pick phrases/sentences based
on commas, periods and breaks "-". We could manually input a target web page and the
example would wordwrap the page.
Thanks, Jim
Reply
Paolo Sonego
Thanks Jim for your suggestion! I have updated the post in accordance with your advice
(more or less).
Reply
How does one increase the plotted area of the word cloud? By increasing the
dimensions of the png image I only get a bigger image, most of it
blank white space, with a small word cloud in the middle of the plot.
Reply
Paolo Sonego
It seems that this problem (bug?) is related to the graphics device driver you decide to
use (pdf, svg, png, etc.). I think that Ian Fellows, the author of the package, could
answer your question better than I can!
Reply
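One thing that may be worth trying for the blank-margin problem, offered as an assumption rather than a confirmed fix: the text size in a word cloud is set in relative units via `scale`, while png() sizes the canvas in pixels at a given `res` (pixels per inch). Enlarging `width` and `height` alone makes the canvas larger in inches but leaves the text the same size. Raising `res` in the same proportion keeps the physical size constant, so the cloud should fill the same share of a sharper image. The helper `canvas_inches` is hypothetical and only illustrates the arithmetic.

```r
# Hypothetical helper: physical canvas size (inches) for a pixel size and resolution.
canvas_inches <- function(width_px, height_px, res) {
  c(width = width_px / res, height = height_px / res)
}

small  <- canvas_inches(640, 400, res = 72)    # baseline
bigger <- canvas_inches(1280, 800, res = 72)   # more pixels, same res: canvas doubles
sharp  <- canvas_inches(1280, 800, res = 144)  # more pixels, doubled res: same canvas

print(rbind(small, bigger, sharp))

# Applying the idea (guarded so the sketch runs without cairo or wordcloud).
if (capabilities("cairo") && requireNamespace("wordcloud", quietly = TRUE)) {
  out <- tempfile(fileext = ".png")
  png(out, width = 1280, height = 800, res = 144, type = "cairo")
  par(mar = c(0, 0, 0, 0))  # drop the default plot margins
  wordcloud::wordcloud(c("alpha", "beta", "gamma"), c(5, 3, 1), min.freq = 1)
  dev.off()
}
```

Increasing the first value of `scale` is the other obvious knob, at the cost of some words no longer fitting.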
You can add some really nice colors to your word cloud using the free ColorBrewer color
rules. Download their spreadsheet to add the capability to your code. See
colorbrewer2.org
Reply
Paolo Sonego
Thanks for stepping in. The RColorBrewer package gives users access to the beautiful
and well-conceived ColorBrewer palettes from R. Take a look at it!
Reply
Julian 30 March 2012 16:11
This sounds very promising. I didn't know that R let you create this kind of
visualization. I used wordle in the past, and R for a bioinformatics project at the
University.
Reply
Thanks for the post. When I run the second example nothing happens. Just checking whether
the png wordcloud files should be in any particular folder.
Thanks
Arun David
Reply
Paolo Sonego
Dear Arun,
I checked the code for the second example and it seems to work without a hitch. If you
have used the exact same code presented in the post, the image should be generated in
your working directory and named wordcloud_packages.png.
HIH!
Reply
Anonymous 16 May 2012 14:39
I am a beginner with R. How can I create a "phrase" cloud? Basically I have a list of 100
strings/phrases which I want to present as a cloud.
Reply
Paolo Sonego
I have difficulty understanding the goal of your exercise. A word cloud is a visual tool
that helps in perceiving the most prominent (frequent) terms in a collection of
words using either color or size. In your case I imagine your phrases are all different
from one another; therefore a representation based on frequency makes little sense to me.
Reply
Replies
Anonymous 16 May 2012 15:17
The list of phrases has been derived based on importance; hence it is in order
of importance. Is it possible to represent them like a cloud? How?
Maybe give the phrases descending frequencies?
More importantly, the examples above are for single words. What about
phrases?
Reply
Paolo Sonego
Reply
Replies
Anonymous 16 May 2012 16:22
thanks a million!
Paolo Sonego
Is there a limit for the word cloud? Just a handful of items are visible. I set
the same frequency for every 10 items: 10 for the first 10 items, 9 for the
next 10 items...
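Since wordcloud() is given a character vector plus a matching numeric vector, ranked phrases can be passed with made-up weights in place of counts. A minimal sketch, assuming 100 placeholder phrases already sorted by importance; the tier scheme (10 for the first ten, 9 for the next ten, and so on) follows the description above. If only a few items show up, it may be that the rest did not fit: wordcloud() skips items it cannot place and warns about them, so a smaller `scale` is one thing to try.

```r
# Placeholder phrases, assumed to be already ordered from most to least important.
phrases <- paste("example phrase", 1:100)

# Tiered weights: 10 for ranks 1-10, 9 for ranks 11-20, ..., 1 for ranks 91-100.
rank   <- seq_along(phrases)
weight <- 10 - (rank - 1) %/% 10

if (requireNamespace("wordcloud", quietly = TRUE)) {
  # A smaller scale than in the post leaves more room for long phrases.
  wordcloud::wordcloud(phrases, weight, scale = c(2, 0.3), min.freq = 1,
                       max.words = 100, random.order = FALSE, rot.per = 0)
}
```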
Karthik 3 September 2015 00:46
thank you!
Reply
Paolo Sonego
Hi, this is good! But it depends on word frequency. I will try a sentiment word cloud
along with frequency, meaning that positive words are shown in one color and negative words in
another. Can anyone help me?
Reply
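Colouring by sentiment can be sketched with the `colors` and `ordered.colors` arguments of wordcloud(): with `ordered.colors = TRUE`, one colour is assigned per word, in order. The word lists `positive` and `negative` below are tiny hypothetical stand-ins for a real sentiment lexicon, and the frequencies are made up.

```r
# Made-up frequency table in the same word/freq layout as `d` in the post.
d <- data.frame(word = c("great", "useful", "error", "stuck", "package"),
                freq = c(8, 5, 6, 3, 9), stringsAsFactors = FALSE)

# Hypothetical mini-lexicons; a real analysis would use a proper sentiment lexicon.
positive <- c("great", "useful", "nice")
negative <- c("error", "stuck", "bug")

# One colour per word: green for positive, red for negative, grey otherwise.
d$colour <- ifelse(d$word %in% positive, "darkgreen",
            ifelse(d$word %in% negative, "red", "grey40"))
print(d)

if (requireNamespace("wordcloud", quietly = TRUE)) {
  wordcloud::wordcloud(d$word, d$freq, min.freq = 1, scale = c(4, 0.5),
                       colors = d$colour, ordered.colors = TRUE)
}
```

Size still encodes frequency here; only the colour carries the sentiment label.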
Hi Paolo, I was able to build a word cloud from a csv file with a little modification of the above
code. I would like to know if it is possible to create different shapes like those tagxedo
provides?
Reply
Replies
Dhaval Mistry 9 January 2014 10:02
Can you help me with how you loaded the csv file? I am a newbie in R and
am getting stuck on it.
Reply
Paolo Sonego
Dear Hari, from what I can see from the help page this feature is not currently available
in the wordcloud package. You could suggest it to the author.
Reply
FYI
Hello Mr. Ian,
I am using the wordcloud package authored by you in R. The package creates a word
cloud in a circular shape. I would like to know if it is possible to make word clouds in
different shapes, like those tagxedo provides.
Reply
Paolo Sonego
Dear Hari, in order to contact the author of the wordcloud package you should use the
information you can find at
http://cran.r-project.org/web/packages/wordcloud/index.html
I am not related to the author of this package, nor do I have any connection with him.
Reply
Anonymous 15 March 2013 16:03
Great post!! Super useful!
Reply
Paolo Sonego
I checked the code again and, with R 2.15.2 (on both Windows and Linux) and a recent
version of the loaded packages, everything works as expected. Two suggestions: 1)
check that all the packages are installed and loaded; 2) check your internet connection and
firewall settings.
HIH
Reply
Hi, I am using the latest version of R. I copied and pasted your code into R but I am not getting
any graph/word cloud. I have installed all the required packages. Can you please help?
Reply
Paolo Sonego
Dear Ved,
Of course I can only guess but, from my experience, if both the installation of the
required packages and the sourced code didn't throw any error, this means that
the word cloud image was produced and saved in your working directory with the
name wordcloud.png.
HIH
Reply
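A quick way to check where the image ended up, assuming the png() call used a relative file name such as "wordcloud.png": relative paths are resolved against the current working directory, which getwd() reports.

```r
# Relative file names passed to png() are resolved against the working directory.
out_file <- file.path(getwd(), "wordcloud.png")

cat("Working directory:", getwd(), "\n")
cat("Expected image path:", out_file, "\n")

# TRUE once the script has run and dev.off() has closed the device.
cat("File exists:", file.exists(out_file), "\n")
```

If the file is missing, a forgotten dev.off() is a common cause, since the image is only finalised when the device is closed.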
Pradeepta Mishra 29 August 2013 09:52
Hi,
The second example gives me a picture with a black background, shown as "Invalid Image". Please
suggest how to get the image correctly for the word cloud. I am using the same example
as mentioned.
Thanks
Reply
Replies
Paolo Sonego
Can you help me with how you loaded the csv file? I am a newbie in R and am getting stuck on
it.
Reply
Paolo Sonego
Dear Dhaval, importing a csv file is a very common first step when you
use R or any other programming language for analyzing data. I suggest you
take a look at any introductory R book/tutorial you can find (see the R/CRAN website
for dozens of choices). Furthermore, if you are still stuck at some point, you can get a lot of
useful responses on the StackOverflow Q&A website (use the [r] tag).
Reply
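For readers stuck at the import step, here is a minimal sketch: read.csv() returns a data frame, and the text column can then be pulled out as a character vector. The file and the column name `title` are hypothetical; the example writes a temporary CSV so that it runs on its own.

```r
# Create a small CSV so the sketch is self-contained;
# normally `csv_file` would be the path to your own file.
csv_file <- tempfile(fileext = ".csv")
writeLines(c("id,title", "1,Word Cloud in R", "2,R meets XKCD"), csv_file)

# stringsAsFactors = FALSE keeps text columns as character vectors.
df <- read.csv(csv_file, stringsAsFactors = FALSE)
str(df)

# The column holding the text (here the hypothetical `title`) as a character
# vector, ready to be tokenised or wrapped in a tm corpus.
titles <- df$title
print(titles)
```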
Replies
Anvi Modi 5 August 2014 16:00
Hi,
Can you help me to remove duplicates from one column using tm_map (tm
package) and the corpus method?
Thanks,
Anvi
Reply
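As far as I know, tm_map() applies a transformation to each document and is not meant for dropping duplicates, so one option is to de-duplicate the column before building the corpus. A minimal base-R sketch with a made-up data frame and a hypothetical column name `text`:

```r
# Made-up data frame; `text` is a hypothetical column name.
df <- data.frame(text = c("word cloud", "text mining", "word cloud", "Word Cloud"),
                 stringsAsFactors = FALSE)

# Exact duplicates only.
unique_text <- unique(df$text)

# Case-insensitive variant: keep the first occurrence of each lower-cased value.
unique_ci <- df$text[!duplicated(tolower(df$text))]

print(unique_text)
print(unique_ci)

# The de-duplicated vector can then be turned into a corpus, if tm is installed.
if (requireNamespace("tm", quietly = TRUE)) {
  corpus <- tm::Corpus(tm::VectorSource(unique_ci))
}
```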
???
Error en .overlap(x1, y1, sw1, sh1, boxes) :
el paquete 'dataptr' no ofrece la función 'Rcpp'
> dev.off()
null device
Why am I getting this?
Where is my error?
My platform: x86_64-w64-mingw32/x64 (64-bit)
Reply
Paolo Sonego
Hi Paolo,
I was trying to build the word cloud for NYTimes community comments.
After I type
recent.news <- community(what=what, key=my.key)
it gives me an error saying: Error: Forbidden
What could be the reason behind this error? I have successfully obtained the API key for
the NYTimes Community API.
Reply
Replies
Paolo Sonego
@Paicyclopedia I manually made some calls to the NYTimes API and they seem to
work properly. I am not 100% sure, but it could be that the RNYTimes package is no
longer able to use the API: you could ask the authors of the package to take
a look at their code and see whether the problem is really related to the way the
current API should be called from a wrapper.
Reply
I got the same error as Paicyclopedia. I think the problem is related to the security level
of your computer.
Reply
Replies
Paolo Sonego
Another install alternative for the RNYTimes package (Win8.1, Rstudio 3.1.1):
install.packages("RNYTimes", repos = "http://www.omegahat.org/R",
                 type = "source", dependencies = TRUE)
Reply
Replies
Paolo Sonego
Dear Sayan,
Thanks for the useful feedback! As you can see from the timestamp, this is a very old
post which took advantage of an old version of the tm package. A fix which works with a
current version of tm (version 0.6 here) could be adding the line below