

Sentiment Analysis
--Capturing favorability using Natural Language Processing--

Tetsuya Nasukawa
IBM Research, Tokyo Research Laboratory
1623-14 Shimotsuruma, Yamato-shi, Kanagawa-ken, 242-8502, Japan
nasukawa@jp.ibm.com

Jeonghee Yi
IBM Research, Almaden Research Center
650 Harry Rd, San Jose, CA, 95120, USA
jeonghee@almaden.ibm.com

ABSTRACT
This paper illustrates a sentiment analysis approach to extract sentiments associated with polarities of positive or negative for specific subjects from a document, instead of classifying the whole document into positive or negative.

The essential issues in sentiment analysis are to identify how sentiments are expressed in texts and whether the expressions indicate positive (favorable) or negative (unfavorable) opinions toward the subject. In order to improve the accuracy of the sentiment analysis, it is important to properly identify the semantic relationships between the sentiment expressions and the subject. By applying semantic analysis with a syntactic parser and sentiment lexicon, our prototype system achieved high precision (75-95%, depending on the data) in finding sentiments within Web pages and news articles.

Categories and Subject Descriptors
I.2.7 Natural Language Processing - Text analysis.
H.3.1 Content Analysis and Indexing - Linguistic processing.

Keywords
sentiment analysis, favorability analysis, text mining, information extraction.

INTRODUCTION
A technique to detect favorable and unfavorable opinions toward specific subjects (such as organizations and their products) within large numbers of documents offers enormous opportunities for various applications. It would provide powerful functionality for competitive analysis, marketing analysis, and detection of unfavorable rumors for risk management.

For example, enormous sums are being spent on customer satisfaction surveys and their analysis. Yet, the effectiveness of such surveys is usually very limited in spite of the amount of money and effort spent on them, both because of sample size limitations and the difficulty of making effective questionnaires. Thus there is a natural desire to detect and analyze favorability within online documents such as Web pages, chat rooms, and news articles, instead of conducting special surveys with questionnaires. Humans can easily recognize natural opinions among such online documents. In addition, it might be crucial to monitor such online documents, since they sometimes influence public opinion, and negative rumors circulating in online documents may cause critical problems for some organizations.

However, analysis of favorable and unfavorable opinions is a task requiring high intelligence and deep understanding of the textual context, drawing on common sense and domain knowledge as well as linguistic knowledge. The interpretation of opinions can be debatable even for humans. For example, when we tried to determine whether each specific document was on balance favorable or unfavorable toward a subject after reading an entire group of such documents, we often found it difficult to reach a consensus, even for very small groups of evaluators. Therefore, we focused on finding local statements on sentiments rather than analyzing opinions on overall favorability. The existence of statements expressing sentiments is more reliable than the overall opinion. For example,

  Product A is good but expensive.

contains two statements. We think it is easy to agree that there is one statement,

  Product A is good,

that indicates a favorable sentiment, and another statement,

  Product A is expensive,

that indicates an unfavorable sentiment. Thus, instead of analyzing the favorability of the whole context, we try to extract each statement on favorability and present them to the end users so that they can use the results according to their application requirements.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
K-CAP'03, October 23-25, 2003, Sanibel Island, FL, USA.
Copyright 2003 ACM 1-58113-000-0/00/0000...$5.00
In this paper, we discuss issues of sentiment analysis in consideration of related work and define the scope of our sentiment analysis in the next section. Then we present our approach, followed by experimental results. We also introduce applications based on our sentiment analysis.

SENTIMENT ANALYSIS
The essential issue in sentiment analysis is to identify how sentiments are expressed in texts and whether the expressions indicate positive (favorable) or negative (unfavorable) opinions toward the subject. Thus, sentiment analysis involves the identification of
• Sentiment expressions,
• Polarity and strength of the expressions, and
• Their relationship to the subject.
These elements are interrelated. For example, in the sentence "XXX beats YYY", the expression "beats" denotes a positive sentiment toward XXX and a negative sentiment toward YYY.

However, most of the related work on sentiment analysis to date [1-2,4-5,7,11-14] has focused on identification of sentiment expressions and their polarities. Specifically, the focus items include the following:
• Features of expressions to be used for sentiment analysis, such as collocations [12,14] and adjectives [5]
• Acquisition of sentiment expressions and their polarities from supervised corpora, in which favorability in each document is explicitly assigned manually, such as five stars in reviews [2], and unsupervised corpora, in which no clue on sentiment polarity is available except for the textual content [4], including the WWW [13]
In all of this work, the level of natural language processing (NLP) was shallow. Except for stemming and analysis of part of speech (POS), they simply analyze co-occurrences of expressions within a short distance [7,12] or patterns [1] that are typically used for information extraction [3,10] to analyze the relationships among expressions. Analysis of relationships based on distance obviously has limitations. For example, even when a subject term and a sentiment term are contained in the same sentence and located very close to each other, they may not be related at all, as in

  Although XXX is terrible, YYY is in fact excellent,

where "YYY" is not "terrible" at all.

One major reason for the lack of focus on relationships between sentiment expressions and subjects may be the intended applications. Many of these applications aim to classify the whole document into positive or negative toward a subject of the document that is specified either explicitly or implicitly [1-2,11-13], and the subjects of all of the sentiment expressions are assumed to be the same as the document subject. For example, the classification of a movie review into positive or negative [2,13] assumes that all sentiment expressions in the review represent sentiments directly toward that movie, and expressions that violate this assumption (such as a negative comment about an actor even though the movie as a whole is considered to be excellent) confuse the judgment of the classification. On the contrary, by analyzing the relationships between sentiment expressions and subjects, we can make in-depth analyses of what is favored and what is not.

In this paper, we define the task of our sentiment analysis as finding sentiment expressions for a given subject and determining the polarity of the sentiments. In other words, it is to identify text fragments that denote a sentiment about a subject within documents, rather than classifying each document as positive or negative towards the subject. In this task, the identification of semantic relationships between subjects and sentiment-related expressions is a key issue because the polarity of the sentiment may be entirely different depending on the relationships, as in the above example of "XXX beats YYY." In our current implementation, we manually built the sentiment lexicon based on the requirements discussed in the next section.

FRAMEWORK OF SENTIMENT ANALYSIS

Definition of Sentiment Expressions
Besides adjectives, other content words such as nouns, adverbs, and verbs are also used to express sentiments. In principle, a sentiment expression using an adjective, say "good", denotes the sentiment towards its modifiee noun, as in "good product," and the whole noun phrase ("good product") itself becomes a sentiment expression with the same polarity as the sentiment adjective (positive for "good" in this case). Likewise, a sentiment expression using an adverb, say "beautifully," denotes the sentiment towards its modifiee verb, as in "play beautifully," and the polarity of the sentiment is inherited by the modifiee verb. Thus, sentiment expressions using adjectives, adverbs, and nouns can simply be defined as either positive or negative in terms of polarity. In contrast, as in examples from the previous section such as "XXX beats YYY," the polarity of sentiments denoted by sentiment expressions in verbs may depend on the relationships with their arguments. In this case, positive sentiment is directed towards the subject and negative sentiment is directed towards the object. In addition, some verbs do not denote sentiment by themselves, but only transfer sentiments among their arguments. For example, a be-verb transmits the sentiment of its complement to its subject, as in "XXX is good," in which the positive sentiment of its complement, "good," is transferred to its subject, "XXX." Thus, we classified sentiment-related verbs into two types, namely,
• Sentiment verbs that direct either positive or negative sentiment toward their arguments,
• Sentiment transfer verbs that transmit sentiments among their arguments,
and associate them with arguments such as subjects and objects that inherit or provide sentiment.

Therefore, we have manually defined sentiment expressions in a sentiment lexicon by using a simple notation that consists of the following information:
• Polarity: positive (good), negative (bad), or neutral, denoted by g, b, or n, respectively; sentiment transfer verbs are denoted by t
• Part of speech (POS): currently, adjective (JJ), adverb (RB), noun (NN), and verb (VB) are registered in our lexicon
• Sentiment term in canonical form
• Arguments such as subject (sub) and object (obj) that receive sentiment from a sentiment verb, or arguments that provide sentiment to and receive sentiment from a sentiment transfer verb

For example, the following notation

  gVB admire obj

indicates that the verb "admire" is a sentiment term that indicates favorability towards a noun phrase in its object when the noun phrase in the object contains a subject term. Likewise,

  bVB accuse obj

indicates that the verb "accuse" is a sentiment term that indicates unfavorability against a noun phrase in its object when the noun phrase contains a subject term.

  bVB fail sub

indicates that the verb "fail" is a sentiment term that conveys unfavorability towards a noun phrase in its subject when the noun phrase contains a target subject term.

  tVB provide obj sub

indicates that the verb "provide" passes the (un)favorability of its object to its target subject term if the object noun phrase contains (un)favorability and the target term is in its subject, as in

  "XXX provides a good working environment."
  "XXX provides a bad working environment."

where "XXX" is a subject term with favorable and unfavorable sentiment, respectively, provided that "a good working environment" and "a bad working environment" are favorable and unfavorable, respectively. Finally,

  tVB prevent obj ~sub

indicates that the verb "prevent" passes the opposite of the (un)favorability of its object to its target subject term if the object noun phrase contains (un)favorability and the target term is in its subject, as in

  "XXX prevents trouble."

in which "XXX" is a subject term receiving favorable sentiment, and "trouble" is a sentiment term for unfavorability.

For terms with other POS, we simply classify them into favorable, unfavorable, and neutral. For example,

  bJJ crude

indicates that the adjective (denoted by JJ) "crude" has unfavorable sentiment (denoted by "b" in the first column), and

  nNN crude oil

indicates that the noun phrase (denoted by NN) "crude oil" is neutral (denoted by "n" in the first column) so that the term "crude" in "crude oil" is not treated as a negative sentiment. Thus, sentiment terms can be compound words, and they are applied using the leftmost-longest match method so that longer terms with more matching elements are favored. In addition, we also allowed the use of regular expressions for flexibility, as in

  bVB put \S+ at risk sub

in which "\S+" matches one or more sequences of non-whitespace characters, so that a sentence such as

  "XXX put communities at risk."

is considered to be negative for XXX.

In principle, we tried to define the framework of the sentiment lexicon as simply as possible, both to ease the manual work and for the sake of simplifying automatic generation in the future. As we deal with natural language, we may find exceptional cases in which sentiments defined in the lexicon do not hold. For example, "put something at risk" may be favorable when the "something" is unfavorable, as in the case of "hackers." Thus, we started with basic entries that cover most of the cases properly and dealt with exceptional cases by adding entries with more specific terms to be applied in those specific cases.

Currently, we have 3,513 entries in the sentiment analysis dictionary, as shown in Table 1. Among these entries, regular expressions were used in 14 cases.

  POS             Total   positive   negative   neutral
  adjective       2,465        969      1,495         1
  adverb              6          1          4         1
  noun              576        179        388         9
  Sentiment verb    357        103        252         2
  Transfer verb     109          -          -         -

  Table 1. Distribution of sentiment terms
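To make the notation above concrete, the following Python sketch shows one plausible way to load such entries and apply the leftmost-longest match with regular-expression terms. It only illustrates the described conventions; the class, function, and field names are our own assumptions, not the authors' implementation.

import re
from dataclasses import dataclass

@dataclass
class LexiconEntry:
    polarity: str   # 'g' = positive, 'b' = negative, 'n' = neutral, 't' = transfer
    pos: str        # 'JJ', 'RB', 'NN', or 'VB'
    term: str       # canonical form; may contain a regex such as r'\S+'
    args: list      # e.g. ['obj'], ['sub'], ['obj', 'sub'], ['obj', '~sub']

def parse_entry(line):
    """Parse an entry such as 'tVB provide obj sub' or r'bVB put \S+ at risk sub'."""
    tag, rest = line.split(None, 1)
    tokens = rest.split()
    args = []
    # Trailing 'sub'/'obj' markers (optionally negated with '~') are arguments;
    # everything before them is the (possibly multi-word) sentiment term.
    while tokens and tokens[-1].lstrip('~') in ('sub', 'obj'):
        args.insert(0, tokens.pop())
    return LexiconEntry(tag[0], tag[1:], ' '.join(tokens), args)

def leftmost_longest(entries, words):
    """Scan the tokens left to right, preferring the longest matching term."""
    matches, i = [], 0
    while i < len(words):
        best, best_len = None, 0
        for e in entries:
            pattern = e.term.split()
            window = words[i:i + len(pattern)]
            if len(window) == len(pattern) and all(
                    re.fullmatch(p, w) for p, w in zip(pattern, window)):
                if len(pattern) > best_len:
                    best, best_len = e, len(pattern)
        if best:
            matches.append(best)
            i += best_len
        else:
            i += 1
    return matches

lexicon = [parse_entry(s) for s in ('bJJ crude', 'nNN crude oil', r'bVB put \S+ at risk sub')]
# 'crude oil' wins over 'crude' by the leftmost-longest rule, so no negative sentiment:
print([e.polarity for e in leftmost_longest(lexicon, 'they sell crude oil'.split())])       # ['n']
print([e.polarity for e in leftmost_longest(lexicon, 'XXX put communities at risk'.split())])  # ['b']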
Algorithm
We applied sentiment analysis to text fragments that consist of a sentence containing a subject term and the rest of the following paragraph. The window always included at least 5 words before and 5 words after the target subject, with an upper limit of 50 words before and 50 words after. Thus, the task of our sentiment analysis approach is to find sentiment expressions that are semantically related to the subject term within the text fragment, along with the polarity of the sentiment. The size of this text fragment was defined tentatively, based on our preliminary analysis, to capture the minimal required context around the subject term.
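As a rough illustration of this windowing, the sketch below clips a context of at least 5 and at most 50 tokens on each side of a subject mention. The exact interaction between sentence, paragraph, and word limits in the authors' system is not fully specified, so this is only an approximation under our own assumptions.

def context_window(words, sent_start, para_end, subj_idx, min_ctx=5, max_ctx=50):
    """Fragment = the sentence containing the subject term plus the rest of its
    paragraph, widened to at least min_ctx and clipped to at most max_ctx
    words on each side of the subject (all values are indices into words)."""
    start = min(sent_start, subj_idx - min_ctx)          # guarantee 5 words before
    end = max(para_end, subj_idx + min_ctx + 1)          # guarantee 5 words after
    start = max(start, subj_idx - max_ctx, 0)            # cap at 50 words before
    end = min(end, subj_idx + max_ctx + 1, len(words))   # cap at 50 words after
    return words[start:end]

words = "The camera is great . I bought it yesterday .".split()
print(context_window(words, sent_start=0, para_end=len(words), subj_idx=1))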
In order to identify sentiment expressions and analyze their semantic relationships with the subject term, natural language processing plays an important role. POS tagging allows us to disambiguate some polysemous expressions such as "like," which denotes sentiment only when used as a verb rather than as an adjective or preposition. Syntactic parsing allows us to identify relationships between sentiment expressions and the subject term. Furthermore, in order to maintain robustness for noisy texts from various sources such as the WWW, we decided to use a shallow parsing framework that identifies phrase boundaries and their local dependencies in addition to POS tagging, instead of a full parser that tries to identify the complete dependency structure among all of the terms.

For POS tagging, we used a Markov-model-based tagger essentially the same as the one described in [6]. This tagger assigns a part of speech to text tokens based on the distribution probabilities of candidate POS labels for each word and the probability of a POS transition extracted from a training corpus. We used a manually annotated corpus of Wall Street Journal articles from the Penn Treebank Project [9] as the training corpus. For these experiments, the tagger was configured to treat unknown words (i.e. those not seen in the training corpus, and excluding numbers) as nouns. The tagger uses a lexical look-up component, which offers sophisticated inflectional analysis for all known words.

After a POS for each word was assigned, we used shallow parsing in order to identify phrase boundaries and local dependencies, typically binding subjects and objects to predicates. This shallow parsing is based on the application of a cascaded set of rules, successively identifying more and more complex phrasal groups. Thus simple patterns can find simple noun groups and verb groups, and these can be composed into a variety of complex NP configurations. At a yet higher level, clause boundaries can be marked, and even (nominal) arguments for (verb) predicates can be identified. These POS tagging and shallow parsing functionalities have been implemented using the Talent System based on the TEXTRACT architecture [8].

After obtaining the results of the shallow parser, we analyze the syntactic dependencies among the phrases and look for phrases with a sentiment term that modifies or is modified by a subject term. When the sentiment term is a verb, we identify the sentiment according to its definition in the sentiment dictionary. Syntactic subjects in passive sentences are treated as objects for matching argument information in the definition. Finally, a sentiment polarity of either +1 (positive = favorable) or -1 (negative = unfavorable) is assigned to the sentiment according to the definition in the dictionary, unless negative expressions such as "not" and "never" are associated with the sentiment expressions; when negative expressions are associated, we reverse the polarity. As a result,
• The polarity of the sentiments,
• The sentiment expressions that are applied, and
• The phrases that contain the sentiment expressions
are identified for a given subject term.
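The following Python sketch illustrates the core of this assignment step (verb argument matching, the passive-voice swap, and negation flipping) under our own simplified phrase representation. The dictionary fragments follow the paper's notation, but the data structures and function names are illustrative assumptions, not the authors' TEXTRACT-based implementation.

from dataclasses import dataclass

@dataclass
class VerbPhrase:
    verb: str       # canonical verb form
    subject: str    # head of the subject noun phrase
    obj: str        # head of the object noun phrase (or '')
    negated: bool   # "not", "never", ... attached to the verb
    passive: bool   # passive voice

# Hypothetical dictionary fragments in the paper's notation:
SENTIMENT_VERBS = {'admire': ('g', 'obj'), 'accuse': ('b', 'obj'),
                   'fail': ('b', 'sub'), 'celebrate': ('g', 'obj')}
NP_POLARITY = {'trouble': 'b', 'good working environment': 'g'}
TRANSFER_VERBS = {'provide': ('obj', 'sub', +1), 'prevent': ('obj', 'sub', -1)}

def polarity_for_subject(vp, subject_term):
    """Return +1 / -1 for subject_term, or 0 if no sentiment applies."""
    # Syntactic subjects of passive sentences are matched as objects,
    # e.g. "the Range Rover was celebrated" ~ celebrate with obj = Range Rover.
    subj, obj = (vp.obj, vp.subject) if vp.passive else (vp.subject, vp.obj)
    score = 0
    if vp.verb in SENTIMENT_VERBS:
        pol, arg = SENTIMENT_VERBS[vp.verb]
        target = subj if arg == 'sub' else obj
        if subject_term in target:
            score = +1 if pol == 'g' else -1
    elif vp.verb in TRANSFER_VERBS:
        src, dst, sign = TRANSFER_VERBS[vp.verb]    # sign = -1 flips, as for "prevent"
        source_np = obj if src == 'obj' else subj
        target_np = subj if dst == 'sub' else obj
        np_pol = NP_POLARITY.get(source_np)
        if np_pol and subject_term in target_np:
            score = (+1 if np_pol == 'g' else -1) * sign
    return -score if vp.negated else score          # negation reverses the polarity

print(polarity_for_subject(VerbPhrase('prevent', 'XXX', 'trouble', False, False), 'XXX'))        # +1
print(polarity_for_subject(VerbPhrase('fail', 'XXX', '', True, False), 'XXX'))                   # +1 ("did not fail")
print(polarity_for_subject(VerbPhrase('celebrate', 'the Range Rover', '', False, True), 'Range Rover'))  # +1 (passive)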
The following examples were output from our current prototype system, as applied to genuine texts from the WWW. In each input, we underlined the subject term that our system targeted for analysis. Each output starts with an indicator of sentiment polarity toward the subject. The subject term and sentiment terms identified in the input are connected with "---" and shown in canonical form, with the whole phrase that contains them in parentheses. When transfer verbs are used, information on the transfer verbs appears in the middle of the representation, between the subject term and the sentiment term. Among the following examples, Example 3 contains negation, and Example 4 is a passive sentence. All of the typographic errors in the following examples, including the ones in the next section, came from the original texts, and they were usually handled properly by our shallow parser.

Example 1:
<input> (subject="MDX")
For 2002, the MDX features the same comfort and exhilaration, with an even quieter ride.
<output>
+1 MDX (the MDX)---feature (features)---comfort (the same comfort and exhilaration)

Example 2:
<input> (subject="IBM")
Of the seven stakeholder groups, IBM received the highest score in the ranking for its policies and programs for minorities and women.
<output>
+1 IBM (IBM)---receive (received)---high score (the highest score in the ranking)

Example 3:
<input> (subject="canon")
Image quality was 1 and the Canon G2 definately did not disappoint me! (sic.)
<output>
+1 canon (the Canon G2 definately)---disappoint (did not disappoint)

Example 4:
<input> (subject="Range Rover")
They haven't, even though the Range Rover was celebrated as a status symbol as long ago as the 1992 movie The Player.
<output>
+1 celebrate (was celebrated)---Range Rover (SUB the Range Rover)

Example 5:
<input> (subject="Ford Explorer")
For example, the popular Ford Explorer retains about 75 percent of its sticker price after three years, while the high-end Lincoln Continental retains only about half of its original cost after the same amount of time.
<output>
+1 popular---Ford Explorer (the popular Ford Explorer)

EXPERIMENTAL RESULTS
We have applied this sentiment analysis method to data in a number of domains and evaluated the results manually by using a benchmark corpus and other open test data. For the evaluations, we checked whether the polarity of the sentiment was appropriately assigned to the given subject in each input in terms of the sentiment expression in the output, and calculated the precision and recall. Precision is the ratio of correct cases within the system outputs. Recall is the ratio of correct cases that the system assigned compared to the base of all cases where a human analyst associated either positive or negative sentiments manually. In other words, precision and recall are calculated with the following formulas:

  A = number of all cases where the system assigned either a positive or negative sentiment
  B = number of all cases where the human assigned either a positive or negative sentiment
  C = number of correct cases in the system output based on the manual judgment

  Precision = C/A
  Recall = C/B
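In code form, using the benchmark figures reported in the next subsection (53 extractions, 50 of them correct, against 175 human-annotated cases) as a worked check:

def precision_recall(a, b, c):
    """a = cases the system assigned a polarity, b = cases the human
    assigned a polarity, c = correct cases in the system output."""
    return c / a, c / b

p, r = precision_recall(a=53, b=175, c=50)
print(f"precision={p:.1%}  recall={r:.1%}")   # precision=94.3%  recall=28.6%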
Evaluation with Benchmark Corpus
In order to evaluate the quality of the sentiment analysis, we created a benchmark corpus that consists of 175 cases of subject terms within contexts extracted from Web pages from various domains. Each case was manually identified as representing either a favorable or an unfavorable sentiment toward the subject. There were 118 favorable cases and 58 unfavorable cases. The examples in the previous section were taken from this corpus.

After modifying the dictionary for the benchmark corpus by adding appropriate terms, our current prototype system achieved 94.3% precision and 28.6% recall, as it extracted sentiments for 53 cases (50 correct).

Evaluation with Open Test Corpus
In order to verify the quality for practical use, we used the prototype on a new test set with 2,000 cases related to camera reviews, also from Web pages. This time, about half of the cases contained either favorable or unfavorable sentiments and the other half were neutral. Our system extracted sentiments for 255 cases, and 241 of them were correct in terms of the polarity of either negative or positive toward the subject within the context. Thus, without any modification of the dictionary, the current prototype system achieved 94.5% (=241/255) precision with about 24% (=241/1,000) recall.

Analysis of Failures
In the open test corpus of camera reviews, our system failed to judge the correct sentiment in cases similar to the following:

Example 6:
<input> (subject="picture")
It's difficult to take a bad picture with this camera.
<output>
-1 bad---picture (a bad picture)

This is a positive statement about the camera, and it is not appropriate to extract this "bad picture" as a negative sentiment.
Example 7:
<input> (subject="canon")
The g2 is the daddy of all slr type cams for users that dont make their money of photographing and probably a good choise for them who do to all tests ive done and seen shows that the Canon cameras are the best as objection to one of the negative reviews saying canon sucks In my oppinion it beats all fuji nikon minolta sony and other brand competitors. (sic.)
<output>
-1 canon (canon)---suck (sucks)

This may be considered correct in the sense that it indicates the existence of some negative reviews. However, the whole context is positive toward "canon", and we don't think this sentiment is what the author intended, so we counted it as incorrect. In this example, our system also generated the following output for the previous "Canon" within the same sentence.

+1 canon (the Canon cameras)---be (are)---best (the best)

Example 8:
<input> (subject="battery")
Also the battery went dead while at Animal Kingdom and one feature I used to like about the Olympus is that if the recharge-able batteries went dead you could just pop some AA's in and still get your pictures.
<output>
-1 battery (the battery)---go (went)---dead (dead)

Here the incident that "the battery went dead" is described as a normal event rather than a product failure.

Example 9:
<input> (subject="sony")
Although I though the SONY pictures were typically great I had a lot of pictures that were 1 out of focus 2 very grainy - especially in low light and 3 Not real crisp all the time. (sic.)
<output>
+1 sony (the SONY pictures)---be (were typically)---great (great)

In this context, the positive sentiment "the SONY pictures were typically great" is an expectation of the author, and the fact that it didn't happen disappointed the author and resulted in negative sentiment for the entire context.

As seen in Examples 6 through 9, most of the failures are due to complex sentence structures in which the input context negates the local sentiment for the whole, and they are not due to failures of our syntactic parser. Thus, in order to improve precision, we can restrict the output of ambiguous cases that tend to be negated by predicates at higher levels. For example, sentiments in noun phrases (NPs), as in Examples 5 and 6, can easily be negated by the predicates that they are attached to, so we might consider suppressing the extraction of NP-type sentiments. In addition, sentiments in a sentence that contains an if-clause, as in the following example, are highly ambiguous, as are the sentiments in interrogative sentences.

Example 10:
<input> (subject="AALIYAH")
If AALIYAH was so good, why she is grammyless. Do you like her? Do they know it? Do you like them?
<output>
+1 AALIYAH (AALIYAH)---be (was so)---good (good)

Thus, by suppressing the output of ambiguous sentiments, we can improve the precision fairly easily. In fact, we observed that we could achieve 99% precision on a data set in the pharmaceutical domain by adding enough entries and eliminating the ambiguous NP-type sentiments, since most of the failures in that data were NP-type cases. However, improvement in precision damages recall, and it is also important to improve the recall as well as the precision by handling such ambiguous cases properly. By eliminating the ambiguous sentiments in the benchmark corpus, the precision was improved from 94.3% to 95.5%, but the recall was reduced from 28.6% to 24%, as the system extracted sentiments for 44 cases (42 correct) in comparison to 53 cases (50 correct) with the ambiguous ones.

In order to investigate the possibility of improving the recall, we analyzed the 122 cases in the benchmark corpus for which our system failed to extract any sentiments. In 14 (11.5%) of these cases, the subject terms and the sentiment expressions did not appear in the same sentence. Anaphora resolution may solve half of these 14 cases by associating anaphoric expressions such as pronouns with their subject terms, since the anaphoric expressions appeared in the same sentences as the sentiment expressions. In the remaining 108 (88.5%) cases, the subject terms and sentiment expressions appeared in the same sentence. In most of these cases, the sentences were quite long and contained nested sub-clauses, embedded sentences and phrases, or complex parallel structures. Since it is quite difficult for a shallow parser to make appropriate analyses for such cases, the failures in these cases are due to either limitations or failures of the shallow parser.
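A precision-oriented filter of the kind described above could look like the following sketch. The three heuristics (NP-type sentiments, if-clauses, question marks) mirror the paper's discussion, while the representation and the surface tests are our own simplifying assumptions.

def is_ambiguous(sentiment_type, sentence):
    """Suppress sentiment outputs that the paper identifies as easily negated:
    NP-type sentiments, sentences with if-clauses, and interrogatives."""
    s = sentence.lower()
    if sentiment_type == 'NP':                 # e.g. "a bad picture" in Example 6
        return True
    if s.startswith('if ') or ' if ' in s:     # if-clauses, as in Example 10
        return True
    if '?' in sentence:                        # interrogative sentences
        return True
    return False

outputs = [('-1 bad---picture', 'NP', "It's difficult to take a bad picture with this camera."),
           ('+1 AALIYAH---good', 'VP', 'If AALIYAH was so good, why she is grammyless.')]
kept = [o for o, typ, sent in outputs if not is_ambiguous(typ, sent)]
print(kept)   # [] -- both suppressed, trading recall for precision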
As in real examples such as Example 7, there are quite a few typographic errors and ill-formed sentences in the Web pages. Thus, in order to maintain robustness for those cases, we decided to continue using a shallow parser instead of a full parser. Yet based on the result that failures in syntactic analysis did not damage the precision, it might make sense to adopt a full parser and perform deeper NLP analysis, such as anaphora resolution, in order to improve the recall for longer and more complicated sentences.

APPLICATIONS
In evaluating our system with real-world applications, we have applied it to about half a million Web pages and a quarter million news articles.

First, we extracted sentiments about an organization by defining thirteen subject terms that typically represent the organization, including its full name, its short name, former names, and its divisional names. Out of 552,586 Web pages, 6,415 pages were classified as mentioning the organization after checking for other terms in the same pages. These 6,415 pages contained 16,862 subject references (2.6 references per page). Among them, 369 references were associated with either positive or negative sentiments by our prototype, and the precision was 86%.

We also scanned for the same organization in 230,079 news articles. Among these, 1,618 articles were classified as mentioning the organization, and they contained 5,600 subject references (3.5 references per article). A total of 142 references were associated with either positive or negative sentiments, and 88% of them were correct in terms of precision.

We also extracted sentiments about product names. This time, we chose a pharmaceutical domain, and the subjects were the names of ten medicines. Out of 476,126 Web pages, 1,198 pages were classified as mentioning one of the medicines, and there were 3,804 subject references (3.2 references per page). Our prototype system associated 103 references with either positive or negative sentiments, and 91% of them were correct in terms of precision.

Based on these results, we feel that our approach allows us to collect meaningful sentiments from billions of Web pages with relatively high precision. In the following subsections, we introduce two typical applications that can take advantage of our sentiment analysis approach in spite of its relatively low recall, and we discuss important issues for these applications.

Capturing Trends on Sentiments
By comparing the sentiments on specific subjects at uniform intervals, we can detect opinion trends. By comparing sentiments for specific subjects with other subjects, we can do competitive analysis. For example, we can do a quantitative analysis by counting the numbers of positive and negative sentiments to see whether a subject is on balance favorable or unfavorable. It may be useful to analyze changes in the balance over some period of time and to compare it with other subjects. The output of our method also allows us to do qualitative analysis easily because it provides very short summaries of the sentiment expressions. For such applications, precision in the polarity is considered to be more important than recall so that users don't have to verify the results by reading the original documents.

In order to verify the credibility of trends detected by the system output in spite of its low recall, we compared the ratio of favorability in the detected sentiments with the missed sentiments by using the data on camera reviews from Web pages. We asked a human evaluator to pick out positive and negative sentiments for brand A and brand B from 10,000 cases within the open test corpus. As shown in Table 2, the recall of sentiment detection by our system for both brands was about 12%, and the ratio of favorability in the system output was comparable to the human evaluation.

  Table 2. Comparison of sentiments on camera brands detected by human and system

                      Polarity    Brand A      Brand B
  Human evaluation    positive    212 (94%)    144 (78%)
                      negative     13 (6%)      41 (22%)
  System output       positive     24 (92%)     19 (83%)
                      negative      2 (8%)       4 (17%)
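Such a trend analysis reduces to bucketing the per-reference polarities by subject and time period and comparing the positive/negative balance. A minimal sketch follows; the record format is our own assumption about what the system output would be reduced to.

from collections import Counter, defaultdict

# Hypothetical stream of system outputs: (subject, polarity, period)
detections = [('brandA', +1, '2003-01'), ('brandA', -1, '2003-01'),
              ('brandA', +1, '2003-02'), ('brandB', -1, '2003-02')]

trend = defaultdict(Counter)
for subject, polarity, period in detections:
    trend[(subject, period)][polarity] += 1

for (subject, period), counts in sorted(trend.items()):
    pos, neg = counts[+1], counts[-1]
    balance = (pos - neg) / (pos + neg)   # +1.0 all favorable, -1.0 all unfavorable
    print(subject, period, f"balance={balance:+.2f}")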
Finding important documents to be monitored
For some areas of analysis where data tends to be sparse, it is difficult to find relevant documents, and human analysts are willing to read the content of the documents that the sentiment analysis approach identified as having sentiments. For example, opinions on corporate images are generally harder to find than opinions on products (whose comparisons may be found on various consumer Web sites), and analysts of corporate images may want to read through the relevant Web pages.

For this type of application, recall is more important than precision in the polarity, and a recall around 20% for finding these documents may seem too low. However, in our experience, a document that contains one sentiment expression usually contains quite a few sentiments, as such documents express multiple sentiments from various viewpoints or for various subjects to make comparisons. Thus, even though the recall of finding a particular sentiment using our approach is around 20% or less, the chances of finding important documents tend to be high enough.

CONCLUSION AND FUTURE WORK
We have illustrated a sentiment analysis approach for extracting sentiments associated with polarity of positive or negative for specific subjects from a document, instead of
classifying the whole document as positive or negative. In order to achieve high precision, we focused on identifying semantic relationships between sentiment expressions and subject terms. Since sentiments can be expressed with various expressions, including indirect expressions that require common sense reasoning to be recognized as sentiments, it has been a challenge to demonstrate the feasibility of our simple framework of sentiment analysis. Yet our experimental results indicate that we can actually extract useful information on sentiments from most of the texts with our current implementation.

The initial experiments resulted in about 95% precision and roughly 20% recall. However, as we expand the domains and data types, we are observing some difficult data for which the precision may go down to about 75%. Interestingly, that data usually contains well-written texts such as news articles and descriptions in some official organizational Web pages. Since those texts often contain long and complex sentences, our simple framework finds them difficult to deal with.

As seen in the examples, most of the failures are due to complex sentence structures in which the input context negates the local sentiment for the whole, and they are not due to failures of our syntactic parser. For example, a complex sentence such as "It's not that it's a bad camera" confuses our method. It is noteworthy that failures in parsing sentences do not damage the precision in our approach. In addition, our approach allows us to classify ambiguous cases by identifying features in sentences such as the inclusion of if-clauses and interrogatives. Thus, we can maximize the precision by eliminating such ambiguous cases for applications that prefer precision over recall.

Because of our focus on precision, the recall of our approach remains low. However, it is still effective for various applications. Trend analysis and important document identification in terms of sentiments are typical examples that can take advantage of our approach.

Our current system requires manual development of sentiment lexicons, and we need to modify and add sentiment terms for new domains. Although our current domain-dependent dictionaries remain relatively small, with fewer than 100 entries each for five different domains, dictionary maintenance would be an important issue for large-scale applications. Thus, we are working toward automated generation of the sentiment lexicons in order to reduce human intervention in dictionary maintenance, both for improving precision in new domains and for improving the overall recall.

In addition, for improvement of both precision and recall, we are exploring the feasibility of integrating a full parser and various discourse processing methods, including anaphora resolution.

ACKNOWLEDGMENTS
We would like to thank Wayne Niblack, Koichi Takeda, and Hideo Watanabe for overall support of this work; Roy Byrd, Mary Neff, Branimir Boguraev, Herb Chong, and Jim Cooper for the use of their POS tagger and shallow parser as well as its Java interface; and Jasmine Novak, Zengyan Zhang, and David Smith for their collaboration and advice on this work. We would also like to thank the anonymous reviewers for their comments and suggestions, and Shannon Jacobs for invaluable help in proofreading early versions of this paper.

REFERENCES
[1] Chinatsu Aone, Mila Ramos-Santacruz, and William J. Niehaus. Assentor®: An NLP-Based Solution to E-mail Monitoring. In Proceedings of AAAI/IAAI 2000, pages 945-950. 2000.
[2] Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. Thumbs up? Sentiment Classification using Machine Learning Techniques. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 79-86. 2002.
[3] Ralph Grishman and Beth Sundheim. Message Understanding Conference - 6: A brief history. In Proceedings of the 16th International Conference on Computational Linguistics, pages 466-471. 1996.
[4] Vasileios Hatzivassiloglou and Kathleen R. McKeown. Predicting the semantic orientation of adjectives. In Proceedings of the 35th Annual Meeting of the ACL and the 8th Conference of the European Chapter of the ACL, pages 174-181. 1997.
[5] Vasileios Hatzivassiloglou and Janyce M. Wiebe. Effects of adjective orientation and gradability on sentence subjectivity. In Proceedings of the 18th International Conference on Computational Linguistics (COLING), pages 299-305. 2000.
[6] Chris Manning and Hinrich Schütze. Foundations of Statistical Natural Language Processing. MIT Press, Cambridge, MA. 1999.
[7] Satoshi Morinaga, Kenji Yamanishi, Kenji Tateishi, and Toshikazu Fukushima. Mining Product Reputations on the Web. In Proceedings of KDD-2002. 2002.
[8] Mary S. Neff, Roy J. Byrd, and Branimir K. Boguraev. The Talent System: TEXTRACT Architecture and Data Model. In Proceedings of the HLT-NAACL 2003 Workshop on Software Engineering and Architecture of Language Technology Systems (SEALTS), pages 1-8. 2003.
[9] Penn Treebank Project. http://www.cis.upenn.edu/treebank/
[10] SAIC Information Extraction. http://www.itl.nist.gov/iaui/894.02/related_projects/muc/
[11] Ellen Spertus. Smokey: Automatic recognition of hostile messages. In Proceedings of the Conference on Innovative Applications of Artificial Intelligence (IAAI), pages 1058-1065. 1997.
[12] Richard M. Tong. An operational system for detecting and tracking opinions in on-line discussions. In Working Notes of the ACM SIGIR 2001 Workshop on Operational Text Classification, pages 1-6. 2001.
[13] Peter Turney. Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), pages 417-424. 2002.
[14] Janyce M. Wiebe, Theresa Wilson, and Matthew Bell. Identifying collocations for recognizing opinions. In Proceedings of the ACL/EACL Workshop on Collocation. 2001.
[15] Jeonghee Yi and Tetsuya Nasukawa. Sentiment Analyzer: Extracting Sentiments towards a Given Topic using Natural Language Processing Techniques. In Proceedings of the Third IEEE International Conference on Data Mining (ICDM), to appear. 2003.
