Sentiment Analysis For Vietnamese: Binh Thanh Kieu Son Bao Pham
Abstract - Sentiment analysis is one of the most important tasks in Natural Language Processing. Research in sentiment analysis for Vietnamese is relatively new, and most current work focuses only on the document level. In this paper, we address this problem at the sentence level and build a rule-based system using the GATE framework. Experimental results on a corpus of computer product reviews are very promising. To the best of our knowledge, this is the first work that analyzes sentiment at the sentence level in Vietnamese.

Keywords - Sentiment Analysis, Opinion Mining, Text Mining

I. INTRODUCTION
In recent years, along with the rapid growth of the Internet, the amount of textual information on the web has been growing steadily. Textual information is often classified into two main types: facts and opinions. Most current information processing techniques (e.g. search engines) work with facts, which can be expressed with topic keywords. However, search engines do not search for opinions. An example of this kind of information is product reviews, which can be collected from manufacturers or users. Manufacturers use opinions for building business strategy. A sentiment analysis system about product quality is expected to meet the needs of both users and manufacturers.
Technically, a sentiment analysis system can often be divided into two parts: identifying words and phrases that hold opinions, and classifying sentences or documents according to those opinions. Unlike classification by type or subject, classification by sentiment requires understanding the emotional trend of the article. Some challenging aspects of sentiment analysis include the identification of opinion terms, the intensity of sentiment, the complexity of sentences, words used in different contexts, and sentiment classification for complex articles.
In this paper, we propose a rule-based method for automatically evaluating users' opinions at the sentence level. A rule-based approach is a natural choice since there is no publicly available corpus for Vietnamese sentiment analysis. Our system is built on GATE [1], a framework for developing natural language processing components, and focuses on the domain of computer products (laptops and desktops).
We present related work on sentiment analysis in section 2 and describe our system in section 3. Section 4 shows experimental results and error analysis. Finally, section 5 gives concluding remarks and pointers to future work.

II. RELATED WORK
Over the last decade, sentiment mining has become a hot subject among natural language processing (NLP) and information retrieval (IR) researchers [9]. Although the works on sentiment mining differ in focus, emphasis and objectives, they generally consist of the following three steps: sentiment word or phrase identification, sentiment orientation identification, and sentiment sentence or document classification.
Sentiment word or phrase identification focuses on content words (nouns, verbs, adjectives and adverbs), and most works use part-of-speech (POS) information to extract them [4][8][11][16]. Other natural language processing techniques such as stop word removal, stemming and fuzzy matching are also used in the preprocessing stage to extract sentiment words and phrases.
Many approaches have been proposed for sentiment orientation identification. Hu and Liu [8] applied POS tagging and some natural language processing techniques to extract adjectives as sentiment words. Their opinion sentence extraction has a precision of 64.2% and a recall of 69.3%. Fellbaum [5] uses WordNet to determine whether the extracted adjective has a positive or negative polarity. The pointwise mutual information (PMI) is used by Church and Hanks [2] and Turney [15] to measure the strength of semantic association between two words. Nasukawa and Yi [11] also consider verbs as sentiment expressions in their sentiment analysis. They use an HMM-based POS tagger [10] and rule-based shallow parsing [12] for preprocessing. They then analyze the syntactic dependencies among the phrases and look for phrases with a sentiment term that modifies or is modified by a subject term.
The task of sentence or document sentiment classification is to classify a sentence or document according to its polarity into different sentiment categories: positive or negative, sometimes with a neutral category added. Hu and Liu [8] predict the orientation of the opinion sentence in their study of customer reviews. Turney [16] used a simple unsupervised algorithm to classify reviews in different domains as recommended or not recommended and then do sentiment words (phrases) extraction based on Hatzivassiloglou and McKeown’s [7] approach and orientation identification based on Turney’s
1. Dictionaries containing names related to feature recognition:
   a. Dictionary of words related to configuration features of computer products, such as: cấu hình (configuration), hệ thống (system), vi xử lý (CPU) etc.
   b. Dictionary of words related to the “kiểu dáng” (appearance) feature: kiểu dáng (appearance), thiết kế (design), thân hình (body), kích thước (size), màu sắc (color) etc.
2. Dictionaries containing words used to develop rules that identify the sentiment of features:
   a. Positive word dictionary: tốt (good), tuyệt vời (excellent), hoàn hảo (perfect), hài lòng (satisfying) etc.
   b. Negative word dictionary: xấu (ugly), đắt (expensive), thô (rough), phàn nàn (complain), thất vọng (disappointing) etc.
   c. Reverse opinion word dictionary: không thể (cannot), không quá (not too) etc.

E. Rules
There are four types of rules:
1. Dictionary lookup word correction
2. Sentiment word recognition
3. Sentential sentiment classification
4. Feature evaluation
We use GATE's JAPE grammar to specify our rules. A JAPE grammar allows one to specify regular expression patterns over semantic annotations. Below is an example of a JAPE rule that recognizes one type of positive word:

    Rule: rulePositive1
    Priority: 1
    (
      // StrongWord is a macro defined elsewhere in the grammar
      (StrongWord)
      // optional intervening word tagged "O"
      ({Word.category=="O"})?
      // dictionary hit for a positive word, bound to the label "name"
      ({Lookup.majorType=="positive"}) :name
    )
    -->
    :name.PosWordFirst = {kind = "StrongWord + <O>? +<PosWord>", type = "Positive", rule = "Positive recognition"}

In the first step, we remove dictionary matches on monosyllables that are not words by themselves and do not carry the matched meaning in context. For example:
“Macbook Pro MB471ZPA có giá quá cao. Tuy nhiên chiếc Laptop này vẫn được đánh giá cao.”
“Macbook Pro MB471ZPA has a too high price. However, this Laptop is still strongly recommended.”
Because our dictionaries include the word "giá" to refer to the feature "giá" (price) of products, it would be incorrect to identify "giá" inside the word "đánh giá" (recommend) as the feature "giá". This can be fixed simply by letting the result of word segmentation override the dictionary lookup.
In the sentiment word recognition step (an example is in Figure 2), sentiment words are determined based on dictionaries, but there are many cases where simply matching dictionaries without considering the context gives a wrong result. For example, "thời trang" (fashion) is a sentiment word in the sentence “Phong cách rất thời trang” (very fashionable style) but not in the sentence “Thiết kế của máy có nét thời trang giống với chiếc xe ô tô” (The fashion feature of this laptop is similar to that of a car). There are also cases where a word can carry either positive or negative sentiment depending on context. For example, the word "cao" (high) is positive when it describes computer configuration but negative when it describes price.
Contextually, it is easy to notice that sentiment words usually appear after certain adverbs. For example, positive sentiment words (PosWord) go with “rất” (very), “siêu”, “khá”, “cực”, “đáp ứng”, while negative sentiment words (NegWord) go with “dễ”, “hơi”, “gây”, “bị”. We use the following pattern to recognize sentiment words:
    <StrongWord> + <Adv> + <word in sentiment dictionaries> -> opinion word
When a user uses multiple sentiment words to describe a feature, as in the following example:
“Laptop cho doanh nhân Acer Aspire 3935 sử dụng thiết kế phá cách, hiện đại.”
“Acer Aspire 3935 laptops for business use an innovative and modern design”
we use the following pattern:
    <Opinion word> (<conjunction: , và (and) hay (or) …> <Opinion word>)*
Another important scenario is when users use words that reverse the sentiment of the following statement. We simply use the following rule to handle this case:
    <Reverse Opinion> <positive word (negative word)> -> <negative word (positive word)>
In addition, we also create other rules based on POS tags, using unit testing to ensure consistency between new rules and the data already correctly identified by existing rules.
The sentiment sentence classification step consists of two main subtasks:
- Simple sentence (or clause) splitting
- Sentiment sentence classification: PosSen (positive sentence), NegSen (negative sentence), MixSen (mixed sentence) and CompSen (comparison sentence).
Compound sentences may contain more than one clause discussing several features of a product. The simple sentence split step identifies compound sentences and splits them into separate simple sentences. We create rules to determine simple sentences using connective words. After this step, all sentences are considered simple and are assumed to talk about only one feature each.
For sentence classification, there are 4 main types: positive sentence, negative sentence, mixed sentence and comparison sentence [6]. Positive sentences (PosSen) are assumed to include only positive words (PosWord). Negative sentences (NegSen) are assumed to include only negative words (NegWord). And mixed sentences (MixSen) contain both positive and negative sentiment words. Among sentences not containing any sentiment words, we identify sentences containing comparison expressions and label them as CompSen. Because comparison sentences often compare one product with another, we assume that the target product of the document is always mentioned first and that the nature of the comparison corresponds to the sentiment. In particular, a "better" comparison is of positive sentiment and a "worse" comparison is of negative sentiment. In effect, CompSen sentences are converted to PosSen and NegSen where appropriate.
Overall feature evaluation is based on the result of simple sentence classification. For positive and negative sentences, it is quite straightforward, as we only have to identify the feature mentioned in the sentence and deem the sentiment of the sentence to be the sentiment of the feature. For mixed sentences, we assume that they normally have the format <Feature> <Opinion> <Feature> <Opinion>. Therefore we associate each sentiment with the nearest preceding feature.
Feature evaluation simply counts how many positive and negative sentences contain the feature and outputs the ratio between the number of positive and negative sentences. This ratio captures how users think about the feature.

IV. EXPERIMENTS
We collected a corpus of computer product reviews and feedback and manually annotated all the data using the annotations described in section 3.1. The corpus consists of 3971 sentences in 20 documents corresponding to 20 products. We divided the corpus into 2 parts: the training set and the test set. The training set contains 16 documents (3182 sentences) and is used to create dictionaries and rules for identifying all the annotations. The test set contains 4 documents and is used to test the performance of our rule-based system.
We run the experiments at three levels: word, sentence and feature. For word and sentence level evaluation, we compare the annotations produced by the system at the corresponding level with the manually created annotations in the test data.

A. Experiment for sentiment word recognition
At the word level, we evaluate how well the system can identify PosWord and NegWord from the test data using the standard Precision, Recall and F-measure measures. Table 1 and Table 2 show the results of the system running on training data and test data respectively. It appears that the rule-based system generalizes quite well for the sentiment word recognition task, as the F-measure on the test data is comparable to that on the training data.

Table 1 - Result of sentiment word recognition on training data
           #Annotation   #System annotation   #True annotation   Precision   Recall   F-measure
PosWord    ...           ...                  ...                ...         ...      ...
NegWord    153           122                  93                 76.23%      60.78%   68.51%
All        598           502                  431                85.86%      72.07%   78.97%

Table 2 - Result of sentiment word recognition on test data
           #Annotation   #System annotation   #True annotation   Precision   Recall   F-measure
PosWord    300           237                  214                90.30%      71.33%   79.70%
NegWord    60            62                   42                 67.74%      70.00%   68.85%
All        362           301                  258                85.71%      71.27%   77.83%

B. Experiment for sentential sentiment classification
At the sentence level, we evaluate the system on the task of labeling PosSen, NegSen and MixSen annotations. Table 3 and Table 4 show the F-measures of the system for recognizing these three annotations on training and test data respectively.

Table 3 - Result of sentential sentiment classification on training data
           #Annotation   #System annotation   #True annotation   Precision   Recall   F-measure
PosSen     231           218                  154                70.64%      66.67%   68.60%
NegSen     97            96                   67                 69.79%      69.07%   69.43%
MixSen     9             26                   7                  26.92%      77.78%   40.00%
All        340           343                  231                67.35%      67.94%   67.64%

Table 4 - Result of sentential sentiment classification on test data
           #Annotation   #System annotation   #True annotation   Precision   Recall   F-measure
PosSen     157           157                  99                 63.06%      63.06%   63.06%
NegSen     49            45                   34                 75.56%      69.39%   72.34%
MixSen     5             21                   3                  14.29%      60.00%   23.08%
All        212           224                  137                61.16%      64.62%   62.84%
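As an illustration of the sentiment word pattern <StrongWord> + <Adv> + <word in sentiment dictionaries>, the following Python sketch scans a word-segmented sentence for a dictionary word that is preceded by one of the cue words, optionally with one token in between. It is a minimal sketch and not the JAPE grammar of the system: the word lists are small samples taken from the examples in this paper ("thời trang" is added to the positive list as an assumption), and the function name `find_opinion_words`, the one-token gap and the pre-segmented input are hypothetical choices made for illustration.

```python
# Illustrative only: tiny word lists sampled from the examples in the text.
POS_CUES = {"rất", "siêu", "khá", "cực", "đáp ứng"}  # cues seen before positive words
NEG_CUES = {"dễ", "hơi", "gây", "bị"}                 # cues seen before negative words
# Assumption: "thời trang" is treated as a positive dictionary entry here.
POS_DICT = {"tốt", "tuyệt vời", "hoàn hảo", "hài lòng", "thời trang"}
NEG_DICT = {"xấu", "đắt", "thô", "phàn nàn", "thất vọng"}


def find_opinion_words(tokens):
    """Return (index, token, label) for dictionary words licensed by a cue word.

    `tokens` is assumed to be already word-segmented, so multi-syllable
    words such as "thời trang" arrive as single tokens.
    """
    found = []
    for i, tok in enumerate(tokens):
        # Context = the previous token and, allowing one optional gap,
        # the token before that.
        context = set(tokens[max(0, i - 2):i])
        if tok in POS_DICT and context & POS_CUES:
            found.append((i, tok, "PosWord"))
        elif tok in NEG_DICT and context & NEG_CUES:
            found.append((i, tok, "NegWord"))
    return found
```

With this sketch, `find_opinion_words(["Phong cách", "rất", "thời trang"])` returns `[(2, "thời trang", "PosWord")]`, while the tokens of “Thiết kế của máy có nét thời trang giống với chiếc xe ô tô” yield an empty list because no cue word precedes "thời trang".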
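The labeling of simple sentences as PosSen, NegSen or MixSen, and the assumption that each sentiment in a mixed sentence belongs to the nearest preceding feature, can be sketched in Python as follows. This is only a sketch under stated assumptions: the input format (a list of (kind, value) pairs with kind being "feature", "pos" or "neg") and the function names are hypothetical stand-ins for the GATE annotations the system actually works on, and comparison sentences are left out.

```python
def classify_sentence(items):
    """Label a simple sentence from its sentiment annotations.

    `items` is a list of (kind, value) pairs in sentence order, where kind is
    "feature", "pos" or "neg" (a hypothetical stand-in for GATE annotations).
    """
    kinds = {kind for kind, _ in items}
    has_pos, has_neg = "pos" in kinds, "neg" in kinds
    if has_pos and has_neg:
        return "MixSen"
    if has_pos:
        return "PosSen"
    if has_neg:
        return "NegSen"
    return None  # no sentiment word; comparison sentences are not handled here


def attach_to_features(items):
    """Pair each sentiment word with the nearest preceding feature."""
    pairs = []
    current_feature = None
    for kind, value in items:
        if kind == "feature":
            current_feature = value
        elif current_feature is not None:
            pairs.append((current_feature, kind))
    return pairs
```

For a mixed sentence of the form <Feature> <Opinion> <Feature> <Opinion>, such as `[("feature", "cấu hình"), ("pos", "cao"), ("feature", "giá"), ("neg", "cao")]`, `classify_sentence` returns "MixSen" and `attach_to_features` returns `[("cấu hình", "pos"), ("giá", "neg")]`.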
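For reference, the Precision and Recall columns in the tables of this section follow from the three count columns: precision is the number of true annotations divided by the number of system annotations, and recall is the number of true annotations divided by the number of manual annotations. The Python sketch below recomputes them together with the balanced F-measure (the harmonic mean of precision and recall). The harmonic mean is an assumption, since the paper does not state its F-measure formula and some reported values may have been computed differently; the function name `prf` is hypothetical.

```python
def prf(n_manual, n_system, n_true):
    """Return (precision, recall, f_measure) as percentages rounded to 2 places."""
    precision = n_true / n_system if n_system else 0.0
    recall = n_true / n_manual if n_manual else 0.0
    if precision + recall == 0:
        f_measure = 0.0
    else:
        # Balanced F-measure: harmonic mean of precision and recall (assumed).
        f_measure = 2 * precision * recall / (precision + recall)
    return tuple(round(100 * x, 2) for x in (precision, recall, f_measure))


# PosWord row of Table 2: 300 manual, 237 system, 214 true annotations.
print(prf(300, 237, 214))
```

The call above prints `(90.3, 71.33, 79.7)`, which agrees with the PosWord row reported for the test data.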
more difficult to recognize compared to PosSen and NegSen.

C. Features Evaluation
For every product, we evaluate the performance of the system on each feature of the product. In this experiment, we evaluate five features: “vận hành” (operation), “cấu hình” (configuration), “màn hình” (monitor), “giá” (price), and “kiểu dáng” (appearance). The output of the system for each feature is the ratio a/b, where a and b are the numbers of positive and negative sentences mentioning the feature respectively. For example, 15/10 means that 15 positive sentences and 10 negative sentences discuss the feature.
We define the following measures for a feature:
    Degree of positive sentiment = (number of PosSen) / (number of PosSen + number of NegSen)
    Deviation = | System’s degree of positive sentiment - correct degree of positive sentiment |
    Correctness = (1 - Deviation) * 100%
The correctness for a product is the average of the correctness measures of the product’s features.
Table 5 and Table 6 show the correctness of the system when analyzing sentiments for some products on training data and test data respectively.

Table 5 - Result of features evaluation on training data
Product                        Correctness
Acer Aspire 3935               92.83%
Apple Macbook Air MB543ZPA     84.26%
Acer Aspire AS4736             96.11%
All                            91.07%

Table 6 - Result of features evaluation on test data
Product                        Correctness
Dell Inspiron 1210             84.32%
Compaq Presario CQ40           89.99%
HP Pavilion dv3                92.11%
All                            88.81%

Even though the system’s performance at the sentence level is not very high, its performance on the product as a whole is quite reasonable, with an averaged correctness of nearly 90%.

V. CONCLUSION
We have built a rule-based sentiment analysis system for Vietnamese computer product reviews at the sentence level. Our system looks at the features of a product and outputs the ratio of the number of positive and negative sentiments towards every feature. To the best of our knowledge, this is the pioneering work for Vietnamese sentiment analysis at the sentential level.
Even though the system achieves F-measures of only around 77% and 63% at the word and sentence levels respectively, the overall result for a product is 89% correctness. While the measure used for evaluating the performance of the system at the product level is subjective, it is indicative of the effectiveness and potential of our system.
In the future, we plan to collect a larger data set with more diverse domains and combine our system with machine learning approaches.

ACKNOWLEDGEMENT
This work is partly supported by the research project No. QG.10.39 granted by Vietnam National University, Hanoi and the IBM Faculty Award 2009 for the second author.

REFERENCES
[1] H. Cunningham, D. Maynard, K. Bontcheva, V. Tablan. 2002. “GATE, A Framework and Graphical Development Environment for Robust NLP Tools and Applications”. Proceedings of the 40th Anniversary Meeting of the Association for Computational Linguistics (ACL'02). Philadelphia, July 2002.
[2] K. W. Church, P. Hanks. 1989. “Word association norms, mutual information and lexicography”. Proceedings of the 27th Annual Meeting of the Association for Computational Linguistics. 1989, Vancouver, B.C., Canada, pp. 76-83.
[3] D. Day, C. McHenry, R. Kozierok, L. Riek. 2004. “Callisto: A Configurable Annotation Workbench”. Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC 2004). ELRA. May 2004.
[4] X. Ding, B. Liu, L. Zhang. 2009. “Entity Discovery and Assignment for Opinion Mining Applications”. Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining.
[5] C. Fellbaum. 1998. “WordNet: an electronic lexical database”. MIT Press.
[6] M. Ganapathibhotla and B. Liu. 2008. “Mining Opinions in Comparative Sentences”. Proceedings of the 22nd International Conference on Computational Linguistics.
[7] V. Hatzivassiloglou and Kathleen R. McKeown. 1997. “Predicting the Semantic Orientation of Adjectives”. Proceedings of the 8th conference on European chapter of the Association for Computational Linguistics. 1997, Madrid, Spain.
[8] M. Hu and B. Liu. 2004. “Mining and summarizing customer reviews”. Proceedings of the 10th ACM SIGKDD international conference on Knowledge discovery and data mining. Aug. 22-25, 2004, Seattle, WA, USA.
[9] A. Kao and Stephen R. Poteet. “Natural Language Processing and text mining”. April 2006. Chapter 2.
[10] C. Manning and H. Schutze. 1999. “Foundations of Statistical Natural Language Processing”. MIT Press, Cambridge, MA.
[11] T. Nasukawa and J. Yi. 2003. “Sentiment Analysis: Capturing Favorability Using Natural Language Processing”. Proceedings of the 2nd international conference on Knowledge Capture.
[12] Mary S. Neff, Roy J. Byrd, and Branimir K. Boguraev. 2003. “The Talent System: TEXTRACT Architecture and Data Model”. Proceedings of the HLT-NAACL2003 Workshop on Software Engineering and Architecture of Language .
[13] B. Pang, L. Lee and S. Vaithyanathan. 2002. “Thumbs up? Sentiment classification using machine learning techniques”. Proceedings of the 7th Conference on Empirical Methods in Natural Language Processing (EMNLP-02).
[14] D. Duc Pham, G. Binh Tran, Son Bao Pham. 2009. “A Hybrid Approach to Vietnamese Word Segmentation using Part of Speech tags”. International Conference on Knowledge and Systems Engineering.
[15] P. Turney. 2001. “Mining the Web for synonyms: PMI-IR versus LSA on TOEFL”. Proceedings of the 12th European Conference on Machine Learning. Berlin: Springer-Verlag, pp. 491-502.
[16] P. Turney. 2002. “Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews”. Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL-02). Jun. 2002, Philadelphia, PA, USA, pp. 417-424.
[17] http://tinvadung.vn
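A small worked example may make the correctness measure of the Features Evaluation experiment concrete. The Python sketch below applies the definitions of degree of positive sentiment, deviation and correctness to pairs of positive and negative sentence counts. The function names and the sample counts are invented for illustration and are not taken from the experiments; treating a feature with no sentiment sentences as neutral (0.5) is also an assumption, since the paper does not define that case.

```python
def degree_of_positive(n_pos, n_neg):
    """Degree of positive sentiment = PosSen / (PosSen + NegSen)."""
    total = n_pos + n_neg
    # Assumption: a feature with no sentiment sentences is treated as neutral.
    return n_pos / total if total else 0.5


def feature_correctness(system, gold):
    """Correctness in percent for one feature, given (n_pos, n_neg) pairs."""
    deviation = abs(degree_of_positive(*system) - degree_of_positive(*gold))
    return (1 - deviation) * 100


def product_correctness(features):
    """Average the per-feature correctness over a product's features."""
    scores = [feature_correctness(system, gold) for system, gold in features]
    return sum(scores) / len(scores)
```

For instance, if the system outputs 15/10 for a feature while the manual annotation gives 16/4, the degrees of positive sentiment are 0.6 and 0.8, the deviation is 0.2, and `feature_correctness((15, 10), (16, 4))` is about 80%.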