DA Project Report
Submitted by:
Kirti Reddy(BEA_39)
Charul Joshi(BEA_40)
Danesh Bastani(BEA_48)
Abstract
Introduction
Important Terminologies
Libraries used
Requirement Specification
Implementation
Conclusion
Abstract
Sentiment analysis over Twitter offers organisations a fast and effective way to monitor the
public's feelings towards their brand, business, directors, etc. A wide range of features and
methods for training sentiment classifiers for Twitter datasets have been researched in recent
years with varying results. In this report, we implement Twitter sentiment analysis using the
R language and several packages, including syuzhet, twitteR, tm, and wordcloud.
These packages are used to determine the sentiment behind tweets that are fetched from Twitter
using the Twitter API.
Introduction
The emergence of social media has given web users a venue for expressing and sharing
their thoughts and opinions on all kinds of topics and events. Twitter, with nearly 600 million
users and over 250 million messages per day, has quickly become a gold mine for organisations
to monitor their reputation and brands by extracting and analysing the sentiment of the Tweets
posted by the public about them, their markets, and competitors. Sentiment analysis over Twitter
data and other similar microblogs faces several new challenges due to the typically short length and
irregular structure of such content. Two main research directions can be identified in the
literature of sentiment analysis on microblogs. The first direction is concerned with finding new
methods to run such analysis, such as performing sentiment label propagation on Twitter
follower graphs and employing social relations for user-level sentiment analysis. The second
direction is focused on identifying new sets of features to add to the trained model for sentiment
identification, such as microblogging features including hashtags, emoticons, and the presence of
intensifiers such as all-caps and character repetitions, as well as sentiment-topic features.
Important Terminologies
When working with text mining applications, we often hear the terms “stop words”, “stop word
list”, or even “stop list”. Stop words are a set of commonly used words in a language, not just
English. Stop words matter to many applications because, if we remove the words that
are very commonly used in a given language, we can focus on the important words instead.
Stop words are generally thought of as a “single set of words”, but the term can mean different things to
different applications. For example, in some applications removing all stop words, from determiners
(e.g. the, a, an) to prepositions (e.g. above, across, before) to some adjectives (e.g. good, nice), can be
appropriate. For other applications, however, this can be detrimental. For instance, in
sentiment analysis, removing adjectives such as ‘good’ and ‘nice’ as well as negations such as ‘not’
can throw algorithms off track. In such cases, one can choose a minimal stop list consisting
of just determiners, determiners with prepositions, or just coordinating conjunctions, depending on the
needs of the application.
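The effect of a minimal stop list can be sketched in a few lines of base R. The stop list and the sample sentence here are hypothetical and chosen only for illustration; the point is that the negation "not" and the adjective "good" survive the filtering.

```r
# Hypothetical minimal stop list: determiners, a few prepositions, and "is".
# Negations ("not") and opinion adjectives ("good", "nice") are deliberately kept.
minimal_stop_words <- c("the", "a", "an", "of", "to", "in", "on", "at", "is")

remove_stop_words <- function(text, stop_words) {
  # Lower-case, drop punctuation, and split on whitespace.
  cleaned <- gsub("[[:punct:]]", "", tolower(text))
  tokens <- strsplit(cleaned, "\\s+")[[1]]
  tokens <- tokens[nzchar(tokens)]
  # Keep only the tokens that are not in the stop list.
  tokens[!tokens %in% stop_words]
}

tweet <- "The service at the airport is not good"
kept <- remove_stop_words(tweet, minimal_stop_words)
print(kept)
```

With a larger stop list that also contained "not" and "good", the same sentence would lose exactly the words that carry its sentiment.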
Word clouds (also known as text clouds or tag clouds) work in a simple way: the more a specific
word appears in a source of textual data (such as a speech, blog post, or database), the bigger and bolder it
appears in the word cloud.
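The quantity behind a word cloud is simply a word frequency table. A minimal base R sketch, using made-up sample texts rather than data from this project, could look like this:

```r
# Illustrative texts, not tweets collected for this report.
texts <- c("great phone great camera", "battery is great", "camera is average")

# Split every text into words and count how often each word occurs.
words <- unlist(strsplit(tolower(texts), "\\s+"))
word_freq <- sort(table(words), decreasing = TRUE)

# The most frequent word would be drawn largest in a word cloud.
print(head(word_freq, 3))
```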
Text mining, also referred to as text data mining and roughly equivalent to text analytics, is the process
of deriving high-quality information from text. High-quality information is typically derived by
identifying patterns and trends through means such as statistical pattern learning. Text mining usually
involves the process of structuring the input text (usually parsing, along with the addition of some derived
linguistic features and the removal of others, and subsequent insertion into a database), deriving patterns
within the structured data, and finally evaluation and interpretation of the output. 'High quality' in text
mining usually refers to some combination of relevance, novelty, and interest. Typical text mining tasks
include text categorization, text clustering, concept/entity extraction, production of granular taxonomies,
sentiment analysis, document summarization, and entity relation modelling (i.e., learning relations
between named entities).
2 Libraries used
2.1 twitteR
twitteR is an R package which provides access to the Twitter API. Most functionality of
the API is supported, with a bias towards API calls that are more useful in data analysis as opposed to
daily interaction.
2.2 tm
A framework for text mining applications within R.
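A typical way to use tm is to build a corpus, clean it step by step, and turn it into a term-document matrix. The sketch below assumes the tm package is installed; the helper name clean_and_count and the sample texts are hypothetical and are not taken from the project code.

```r
clean_and_count <- function(texts) {
  # Assumes the tm package is installed; stop early with a clear message if not.
  if (!requireNamespace("tm", quietly = TRUE)) {
    stop("Package 'tm' is required for this sketch.")
  }
  corpus <- tm::VCorpus(tm::VectorSource(texts))
  corpus <- tm::tm_map(corpus, tm::content_transformer(tolower))
  corpus <- tm::tm_map(corpus, tm::removePunctuation)
  corpus <- tm::tm_map(corpus, tm::removeNumbers)
  corpus <- tm::tm_map(corpus, tm::removeWords, tm::stopwords("english"))
  corpus <- tm::tm_map(corpus, tm::stripWhitespace)
  # Rows are terms, columns are documents.
  tm::TermDocumentMatrix(corpus)
}

# Illustrative texts, not tweets collected for this report.
sample_texts <- c("Loving the new phone!", "The battery died in 2 hours...")
if (requireNamespace("tm", quietly = TRUE)) {
  tdm <- clean_and_count(sample_texts)
  print(as.matrix(tdm))
}
```

Note that the default English stop list in tm removes negations such as "not", so for sentiment work a custom, smaller list may be the better choice.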
2.3 Syuzhet
Syuzhet extracts sentiment and sentiment-derived plot data from text.
The package comes with four sentiment dictionaries and provides a method for accessing the
robust, but computationally expensive, sentiment extraction tool developed in the NLP group at
Stanford. Use of this latter method requires that the coreNLP package is already installed.
The main functions of the package make it possible to quickly extract plot and sentiment data
from your own text files, and the extracted data can be returned and/or visualized in various ways.
2.4 Wordcloud
Functionality to create pretty word clouds, visualize differences and similarity
between documents, and avoid over-plotting in scatter plots with text.
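Drawing a word cloud from a set of word frequencies can be sketched as follows. This assumes the wordcloud package is installed; the wrapper draw_word_cloud and the frequencies are hypothetical and only illustrate the call.

```r
draw_word_cloud <- function(words, freq, max_words = 50) {
  # Assumes the wordcloud package is installed.
  if (!requireNamespace("wordcloud", quietly = TRUE)) {
    stop("Package 'wordcloud' is required for this sketch.")
  }
  wordcloud::wordcloud(
    words = words,
    freq = freq,
    min.freq = 1,          # include words that appear only once
    max.words = max_words, # cap the number of words drawn
    random.order = FALSE   # plot words in decreasing order of frequency
  )
}

# Illustrative frequencies, not results from the project.
term_freq <- c(great = 3, camera = 2, battery = 1)
if (requireNamespace("wordcloud", quietly = TRUE)) {
  draw_word_cloud(names(term_freq), term_freq)
}
```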
3 Requirement Specification
(b) RStudio
Note: A Twitter developer account is also required to perform this analysis.
4 Twitter Developer Account:
Twitter now manually approves all developer access requests for API keys.
Given the highly political nature of our global society and the large number of spammers at work,
this caution is understandable. In a world where botnets can be created overnight, social media
corporations are discovering they have to be more careful in how they allow their platforms to be
automated.
Manual review of applications, of course, slows things down. It can also make or break a person’s
ambitions. Students may not be able to begin (or complete) projects on time. SaaS (software as a
service) companies may not be able to move forward with their commercial projects. Individuals might
not be able to create their novelty bots. With the judge and jury sitting on the other side, apprehension can
set in.
The Twitter developer portal is a set of self-serve tools that developers can use to manage their access to
the premium APIs, as well as to create and manage their Twitter apps.
The portal is made up of the following pages:
● A developer dashboard that displays Premium API usage and subscription level.
● A subscriptions page where you can manage and view additional details about your Premium
subscription level.
● An apps page where you can create and manage your Twitter Apps.
● An environments page where you can set up your developer environments.
● A billing page where you can view your payment details and previous invoices.
● and a teams page where you can add and manage the different handles that have access to your
team's Premium APIs.
4.1 Steps for creating a twitter developer account
1. Visit https://developer.twitter.com
2. Click on Apply and choose the reason for using the developer account tools.
3. Provide some personal details.
4. Describe to Twitter how you plan to use the Twitter data fetched from the API.
4.2 Creation of App for getting API keys and tokens
1. Navigate to My Applications.
2. Click on “Create New App”. Any app that has already been created appears on this page.
3. Fill in all the details in the application.
4. Once all the details are filled in and verified, you will be granted the consumer and access
keys.
5 Implementation
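# Install the required packages (only needed once)
install.packages("twitteR")
install.packages("RCurl")
install.packages("base64enc")
install.packages("httr")
install.packages("tm")
install.packages("wordcloud")
install.packages("syuzhet")  # provides get_nrc_sentiment(), used in 5.6
# Load the packages
library(twitteR)
library(RCurl)
library(base64enc)
library(httr)
library(tm)
library(wordcloud)
library(syuzhet)
# Twitter API credentials (placeholders; replace with the keys from the developer portal)
consumer_key <- "ABCDEFGHIJKLMNOPQRSTUVWXYZ123467890"
consumer_secret <- "ABCDEFGHIJKLMNOPQRSTUVWXYZ123467890"
access_token <- "ABCDEFGHIJKLMNOPQRSTUVWXYZ123467890"
access_secret <- "ABCDEFGHIJKLMNOPQRSTUVWXYZ123467890"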
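With the four credentials defined, the next step would be to authenticate and fetch tweets. The following is a minimal sketch of how this could be done with twitteR, not necessarily the exact code used in the project. The helper name fetch_tweets and the search term in the usage comment are hypothetical, and whether the search endpoint is reachable depends on the access level of the developer account.

```r
fetch_tweets <- function(query, n, consumer_key, consumer_secret,
                         access_token, access_secret) {
  # Assumes the twitteR package is installed and the keys are valid.
  if (!requireNamespace("twitteR", quietly = TRUE)) {
    stop("Package 'twitteR' is required for this sketch.")
  }
  # Register the OAuth credentials for this R session.
  twitteR::setup_twitter_oauth(consumer_key, consumer_secret,
                               access_token, access_secret)
  # Search for English-language tweets matching the query.
  tweets <- twitteR::searchTwitter(query, n = n, lang = "en")
  # Convert the list of status objects into a data frame and return the text.
  twitteR::twListToDF(tweets)$text
}

# Usage (not run here: needs real credentials and network access):
# tweet_text <- fetch_tweets("#rstats", 100, consumer_key, consumer_secret,
#                            access_token, access_secret)
```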
5.6 Using get_nrc_sentiment() and getting the sentiments from the words of the tweets
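A minimal sketch of this step, assuming the syuzhet package is installed, is given below. get_nrc_sentiment() returns one row per input text with ten columns: eight emotions plus negative and positive. The helper nrc_totals and the sample tweets are hypothetical and stand in for the tweets fetched from the API.

```r
nrc_totals <- function(tweet_text) {
  # Assumes the syuzhet package is installed.
  if (!requireNamespace("syuzhet", quietly = TRUE)) {
    stop("Package 'syuzhet' is required for this sketch.")
  }
  # One row per tweet; columns are eight emotions plus negative and positive.
  scores <- syuzhet::get_nrc_sentiment(tweet_text)
  # Sum each column to get totals over all tweets, largest first.
  sort(colSums(scores), decreasing = TRUE)
}

# Illustrative input, not tweets collected for this report.
sample_tweets <- c("I love this phone", "Terrible delay, very angry")
if (requireNamespace("syuzhet", quietly = TRUE)) {
  print(nrc_totals(sample_tweets))
}
```

The resulting totals can be passed to barplot() to visualise which emotions dominate the collected tweets.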
5.7 Only Fetching Positive, Negative and Neutral Tweets
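One simple way to separate the tweets into the three classes is to look at the sign of a per-tweet sentiment score. The sketch below uses base R only and made-up scores; in practice the scores could come from a function such as syuzhet::get_sentiment(). The helper classify_polarity and the zero threshold are assumptions for illustration.

```r
classify_polarity <- function(scores) {
  # Positive score -> "positive", negative -> "negative", zero -> "neutral".
  ifelse(scores > 0, "positive", ifelse(scores < 0, "negative", "neutral"))
}

# Illustrative texts and scores, not results from the project.
tweets <- data.frame(
  text = c("love it", "hate it", "it is a phone"),
  score = c(0.75, -0.75, 0),
  stringsAsFactors = FALSE
)
tweets$polarity <- classify_polarity(tweets$score)

# Keep each class in its own subset.
positive_tweets <- tweets[tweets$polarity == "positive", ]
negative_tweets <- tweets[tweets$polarity == "negative", ]
neutral_tweets  <- tweets[tweets$polarity == "neutral", ]
print(table(tweets$polarity))
```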
6 Conclusion
Text processing and sentiment analysis is a challenging field with many obstacles, as it involves
natural language processing. It has a wide variety of applications that could benefit from its results, such as
news analytics, marketing, and question answering. Gaining important insights from opinions expressed
on the internet, especially on social media, is vital for many companies and institutions, whether in
terms of product feedback, public mood, or investors' opinions.
Sentiment analysis is a difficult technology to get right. However, when you do, the benefits are great.
Look for a tool that uses natural language processing technology, ideally with machine learning
capabilities. Look for a vendor that treats sentiment analysis seriously and shows advancements and updates in
its sentiment analysis technology.