
Unsupervised Aspect Term Extraction with B-LSTM & CRF using Automatically Labelled Datasets

Athanasios Giannakopoulos, Claudiu Musat, Andreea Hossmann and Michael Baeriswyl
Artificial Intelligence and Machine Learning Group, Swisscom AG
firstName.lastName@swisscom.com

arXiv:1709.05094v1 [cs.CL] 15 Sep 2017

Abstract

Aspect Term Extraction (ATE) identifies opinionated aspect terms in texts and is one of the tasks in the SemEval Aspect Based Sentiment Analysis (ABSA) contest. The small amount of available datasets for supervised ATE and the costly human annotation for aspect term labelling give rise to the need for unsupervised ATE. In this paper, we introduce an architecture that achieves top-ranking performance for supervised ATE. Moreover, it can be used efficiently as feature extractor and classifier for unsupervised ATE. Our second contribution is a method to automatically construct datasets for ATE. We train a classifier on our automatically labelled datasets and evaluate it on the human annotated SemEval ABSA test sets. Compared to a strong rule-based baseline, we obtain a dramatically higher F-score and attain precision values above 80%. Our unsupervised method beats the supervised ABSA baseline from SemEval, while preserving high precision scores.

1 Introduction

For many years now, companies have been offering users the possibility of adding reviews in the form of sentences or small paragraphs. Reviews can be beneficial for both customers and companies. On the one hand, people can make better decisions by getting insights about available products and solutions. On the other hand, companies are interested in understanding how and what customers think about their products, which helps in employing marketing solutions and correction strategies. To this end, performing an automated analysis of user opinions becomes a crucial issue.

Performing sentiment analysis to detect the overall polarity of a sentence or paragraph comes with two major disadvantages. First, sentiment analysis on the sentence (or paragraph) level does not fulfill the purpose of getting more accurate and precise information: the polarity refers to a broader context, instead of pinpointing specific targets. Second, many sentences or paragraphs contain opposing polarities towards distinct targets, making it impossible to assign an accurate overall polarity.

The need for identifying aspect terms and their respective polarity gave rise to Aspect Based Sentiment Analysis (ABSA), where the task is first to extract aspects or features of an entity (i.e. Aspect Term Extraction or ATE, also known as Opinion Term Extraction, OTE) from a given text, and second to determine the sentiment polarity (SP), if any, towards each aspect of that entity. The importance of ABSA led to the creation of the ABSA task in the SemEval contest in 2014 (Pontiki et al., 2014); SemEval (Semantic Evaluation) is an ongoing series of evaluations of computational semantic analysis systems.

Supervised ATE using human annotated datasets leads to high performance for aspect term detection on unseen data; however, it has two major drawbacks. First, the size of the labelled datasets is quite small, reducing the performance of the classifiers. Second, human annotation is a very slow and costly procedure.

The drawbacks of supervised ATE can be tackled using unsupervised ATE. The size of the datasets can be significantly increased using targets from publicly available reviews (e.g. Amazon or Yelp). Reviews are opinion texts and contain plenty of opinionated aspect terms, which makes them perfect candidates for constructing new datasets for ATE. With respect to the second drawback, an automated data labelling process with high precision can replace the slow and error-prone human annotation procedure.
We innovate by performing ATE starting from opinion texts (e.g. reviews). This is a completely unsupervised task, since there are no labels that explicitly pinpoint that certain tokens of the text are aspect terms. Reviews may contain labels (e.g. the number of stars in a 1-5 star rating system) which are related to their overall polarity; however, such labels are not useful for ATE.

In this work, we present a classifier which can be used for feature extraction and aspect term detection for both unsupervised and supervised ATE. We validate its suitability for ATE by achieving top-ranking results for supervised ATE using the SemEval-2014 ABSA task datasets (the SemEval ABSA datasets contain human annotation for ATE for both the laptop and the restaurant domains only in 2014). Then, we use it for unsupervised ATE.

Moreover, we contribute by introducing a new, completely automated, unsupervised and domain-independent method for annotating raw opinion texts. Then, we use our classifier to perform unsupervised ATE by training it on the automatically labelled datasets obtained with our method. Against all expectations, our unsupervised method beats the supervised ABSA baseline from SemEval-2014, while achieving high precision scores. The latter is very important for unsupervised techniques, since we wish to extract non-noisy aspect terms, i.e. to minimize the number of false positives.

The rest of this paper is organized as follows. Section 2 presents the related work for ATE. Our approach for supervised and unsupervised ATE is described in Sections 3 and 4 respectively. Section 5 presents our experiments and results for both supervised and unsupervised ATE. Finally, Section 6 focuses on our conclusions and future work.

2 Related Work

Research in the area of both supervised and unsupervised ATE has flourished after the creation of the SemEval ABSA task in 2014. The winners of the SemEval-2014 ABSA contest (Toh and Wang, 2014) use supervised methods for ATE. They extract features, similar to those used in traditional Named Entity Recognition (NER) systems (Tkachenko and Simanovsky, 2012), using the provided training sets. Moreover, they exploit external sources, such as the WordNet lexicographer files (Miller, 1995) and word clusters, e.g. Brown clusters (Turian et al., 2010) or K-means clusters (https://en.wikipedia.org/wiki/K-means_clustering). Toh and Su (2015) suggest using gazetteers (Kazama and Torisawa, 2008) and word embeddings (Mikolov et al., 2013) for ATE. Toh and Su (2016) use the probability output of a Recurrent Neural Network (RNN) to further enrich the feature space.

Independently of the feature extraction techniques, supervised ATE is treated as a sequential labelling task. Top-ranking participants in the SemEval ABSA contest use Conditional Random Fields (CRF) or Support Vector Machines (SVM) as sequential labelling classifiers (Toh and Wang, 2014; Toh and Su, 2015; Chernyshevich, 2014; Brun et al., 2014).

There is also related work with respect to unsupervised ATE. Liu et al. (2015) exploit syntactic rules to automatically detect aspect terms. Garcia-Pablos et al. (2015) and Garcia-Pablos and Rigau (2014) use a graph representation to describe the interactions between aspect terms and opinion words in raw text. Graph nodes are ranked using PageRank and high-ranked nodes are used to create a set of aspect terms. Then, they use this set to annotate unseen data by simply performing exact or lemma matching.

Systems similar to (Hercig et al., 2016; Yin et al., 2016; Soujanya et al., 2016) perform semi-supervised ATE, since they use human annotated datasets for training but enrich their feature space with features extracted from large unlabelled corpora. Pavlopoulos and Androutsopoulos (2015) present a method for constructing new datasets for ATE; however, they use non-standard evaluation metrics. Finally, systems like (Garcia-Pablos et al., 2017) focus on classifying aspect terms into categories. We do not compare against such systems, since they do not perform the same task and are not equivalent to ours.

In all but one of the aforementioned cases, the evaluation of the model is performed using the F-score, as defined in (Pontiki et al., 2014); the exception is Pavlopoulos and Androutsopoulos (2015), who use a non-standard definition of precision and recall. In the case of unsupervised ATE, achieving higher precision is more important than higher recall, as highlighted in (Liu et al., 2015).
We perform both supervised and unsupervised ATE using a model that utilizes continuous word representations and performs feature extraction and sequential labelling simultaneously while training. In the case of supervised ATE, the training datasets are those of the SemEval ABSA task (human annotated). In the case of unsupervised ATE, we annotate raw opinion texts (e.g. reviews) with a completely automated and unsupervised process, which we introduce. To the best of our knowledge, we are the first to train a classifier using an automatically labelled dataset and perform evaluation on human annotated datasets.

3 Supervised Aspect Term Extraction

The ATE task can be modelled as a token-based classification task, where labels are assigned to the tokens of a sequence, depending on whether they are aspect terms or not. For supervised ATE, we apply a classification pipeline that consists of 3 steps: (i) data preprocessing, (ii) model training and (iii) model evaluation. The feature extraction is performed by a two-layer bidirectional long short-term memory (B-LSTM) network while the model is training, similar to the way a Convolutional Neural Network (CNN) extracts features while performing image classification. Therefore, we do not explicitly include this step in the aforementioned pipeline.
3.1 Data Preprocessing

We break down each sentence into tokens using the spaCy parser (https://spacy.io/docs/) and follow the traditional IOB format (short for Inside, Outside, Beginning) for sequential labelling. Tokens that represent the aspect terms of the sentence are labelled with B. In case an aspect term consists of multiple tokens, the first token receives the B label and the rest receive the I label. Tokens that are not aspect terms are labelled with O. Given the sentence "The internal speakers are amazing." with target "internal speakers", the labelling would be as follows: (The|O) (internal|B) (speakers|I) (are|O) (amazing|O) (.|O).
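As a small illustration (our own sketch, not part of the paper's preprocessing code), the following Python snippet produces exactly this labelling for a tokenized sentence and a known aspect term:

def iob_tags(tokens, aspect_tokens):
    """Assign IOB tags: B for the first token of an aspect term,
    I for its continuation tokens, O for everything else."""
    tags = ["O"] * len(tokens)
    n = len(aspect_tokens)
    for start in range(len(tokens) - n + 1):
        if tokens[start:start + n] == aspect_tokens:
            tags[start] = "B"
            for i in range(start + 1, start + n):
                tags[i] = "I"
    return tags

tokens = ["The", "internal", "speakers", "are", "amazing", "."]
print(list(zip(tokens, iob_tags(tokens, ["internal", "speakers"]))))
# [('The', 'O'), ('internal', 'B'), ('speakers', 'I'), ('are', 'O'), ('amazing', 'O'), ('.', 'O')]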
3.2 Classifier Architecture

We employ a two-layer B-LSTM to extract features for each token, which are used by a CRF for token-based classification. Features are created by exploiting the word morphology and the structure of the sentence. The architecture is depicted in Fig. 1 and is inspired by the NER system presented in (Yang et al., 2016). However, we employ LSTM cells and use word embeddings from fastText (https://github.com/facebookresearch/fastText).

First B-LSTM layer: Randomly initialized character embeddings of each token are given as input to the first B-LSTM layer, which aims at learning new word embeddings. The first and second directions (left → right and left ← right) of the first B-LSTM layer are responsible for learning word embeddings by exploiting the prefix and the suffix of each token respectively.

Second B-LSTM layer: For each token of a sentence, we create a feature vector by combining (i) the extracted word embeddings from the first B-LSTM layer and (ii) pre-trained word embeddings from fastText. These feature vectors are given as input to the second B-LSTM layer, which extracts a feature vector for each token by exploiting the structure of the sentence. Similar to the first B-LSTM layer, the first and second directions are responsible for extracting features using the previous and the next tokens of each word.

CRF layer: The final layer uses the extracted feature vectors in order to perform sequential labelling.
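The sketch below shows the shape of this layer stack in PyTorch under simplifying assumptions of ours: single-sentence batches, hypothetical vocabulary sizes and a plain linear emission layer standing in for the CRF. It is an illustration of the architecture described above, not the authors' implementation, which is based on NeuroNER (Dernoncourt et al., 2017).

import torch
import torch.nn as nn

class BLstmTagger(nn.Module):
    def __init__(self, n_chars, char_dim, char_hidden,
                 pretrained_emb, word_hidden, n_tags):
        super().__init__()
        # First B-LSTM: builds a word representation from its characters
        # (prefix via the forward direction, suffix via the backward one).
        self.char_emb = nn.Embedding(n_chars, char_dim)
        self.char_lstm = nn.LSTM(char_dim, char_hidden,
                                 bidirectional=True, batch_first=True)
        # Pre-trained word embeddings (e.g. fastText), kept frozen here.
        self.word_emb = nn.Embedding.from_pretrained(pretrained_emb, freeze=True)
        word_dim = pretrained_emb.size(1)
        # Second B-LSTM: contextualizes each token within the sentence.
        self.word_lstm = nn.LSTM(2 * char_hidden + word_dim, word_hidden,
                                 bidirectional=True, batch_first=True)
        # Per-token emission scores; a CRF layer would consume these
        # scores to perform the actual sequential labelling.
        self.emissions = nn.Linear(2 * word_hidden, n_tags)

    def forward(self, char_ids, word_ids):
        # char_ids: (n_tokens, max_chars), word_ids: (n_tokens,)
        _, (h, _) = self.char_lstm(self.char_emb(char_ids))
        char_repr = torch.cat([h[0], h[1]], dim=-1)      # (n_tokens, 2 * char_hidden)
        feats = torch.cat([char_repr, self.word_emb(word_ids)], dim=-1)
        out, _ = self.word_lstm(feats.unsqueeze(0))      # add a batch dimension
        return self.emissions(out.squeeze(0))            # (n_tokens, n_tags)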
Figure 1: Sequential labelling using B-LSTM & CRF classifier.

4 Unsupervised Aspect Term Extraction

The human annotation process, which is required to identify aspect terms in small sentences and construct datasets for supervised ATE, comes at a high cost, mainly for two reasons:

1. Human annotated datasets typically consist of a few thousand sentences extracted from large corpora of domain-specific reviews (the datasets of the SemEval ABSA task consist of approximately 3000 sentences for English). The small amount of data reduces the performance of classifiers.

2. Human annotation is very slow, costly and risky. Annotators may introduce noise in the datasets by labelling words incorrectly, either because they are sloppy workers or because they do not know exactly what aspect terms are. For example, given the sentence "Works well, and I am extremely happy to be back to an apple OS." (taken from the golden annotated dataset for laptop reviews of the SemEval-2014 ABSA task), human annotators may consider the word "works" as a target. However, aspect terms are nouns and noun phrases (Liu et al., 2015), therefore the verb "works" should not be considered as a target.

We employ unsupervised ATE in order to overcome both problems. We tackle the first problem by using large datasets of opinion texts (e.g. reviews). Such datasets are ideal for ATE since they contain a plethora of opinionated aspect terms. In order to tackle the second problem, we introduce and use an automated and unsupervised method for labelling the tokens of the aforementioned datasets using the IOB format. We consider only nouns and noun phrases as candidate aspect terms and focus on token labelling with high precision in order to reduce falsely annotated aspect terms. In that way, we minimize the cost, the time and the mistakes introduced by the human annotation process.

We use the publicly available datasets of Amazon and Yelp for laptop and restaurant reviews respectively, and perform some data cleaning, such as removing URLs from the reviews.

4.1 Automated Data Labelling

Using raw opinion texts (e.g. reviews) for ATE is a completely unsupervised task, since there are no labels to explicitly pinpoint that certain tokens of the text are aspect terms. Reviews frequently contain labels (e.g. the number of stars in a 1-5 star rating system) related to their overall polarity, but these are not useful for ATE.

Our goal is to label each token of the unlabelled opinion texts in an automated way using the IOB format with unsupervised methods. While labelling aspect terms, we focus on high precision, a property that guarantees that the resulting datasets will contain as few noisy aspect terms as possible. The importance of high precision is also highlighted in (Liu et al., 2015), where the authors construct syntactic rules primarily by focusing on this criterion.

Algorithm 1 describes the automated data labelling method. First, we create a list of quality phrases and prune it using a desired threshold value. Then, we iterate through all sentences and annotate tokens that obey certain syntactic rules as aspect terms. We repeat this procedure for multi-word aspect terms and finally label the tokens using the IOB format.

Algorithm 1: Automated Data Labelling
1: qual_phrases = run_autophrase(corpus)
2: candidates = prune(qual_phrases, q_th)
3: for sentence in corpus do
4:     labels = []
5:     for token in sentence do
6:         if token in candidates then
7:             l = get_label(token, rules, lexicon)
8:             labels.append(l)
9:     assign_iob_tags(sentence, labels)
4.1.1 Quality Phrase List

We start by building a sorted list of the form (quality phrase, q), where q ∈ [0, 1] represents the quality value of each phrase. The quality phrases, which we use as candidate aspect terms, are n-grams that appear in the raw review corpora and exceed a minimum support threshold (support is an indication of how frequently the n-gram appears in the dataset). The list of quality phrases is built by applying the AutoPhrase algorithm (Shang et al., 2017) on the review datasets for laptops and restaurants. The quality of each phrase is determined via a classification task with decision trees that takes into account a list of high quality phrases from Wikipedia. The values of the features (e.g. tf-idf) used in the decision trees to predict the quality of each phrase are more informative when the provided corpora are domain dependent. Therefore, we apply AutoPhrase on each dataset separately, rather than combining the two datasets.

The extracted quality phrases, together with a set of syntactic rules, contribute to the automated data labelling process, which is based on 3 pillars:

1. a sentiment lexicon
2. a pruned list of quality phrases
3. syntactic rules able to capture aspect terms

Existing ATE systems (Garcia-Pablos et al., 2015), although unsupervised, also exploit syntactic rules derived from supervised tools (e.g. parsers). Moreover, they require domain-dependent human input (e.g. seed words) to perform double-propagation. We avoid that by using a sentiment lexicon.

4.1.2 Sentiment Lexicon

In many cases, aspect terms have modifiers (e.g. "This is a great screen") or are objects of verbs (e.g. "I love the screen of this laptop") that express a sentiment. Therefore, we make use of a sentiment lexicon (we use the sentiment lexicon of Bing Liu: https://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html), which is necessary in order to perform a look-up on whether modifiers and verbs express a sentiment or not.

4.1.3 Pruned Quality Phrases

We prune our quality phrases since they contain both true and noisy aspect term candidates. More concretely, we filter the list of quality phrases in order to keep n-grams with a quality above a certain threshold.

We present an example to show the value of the pruning step. The list of quality phrases extracted using the Amazon review dataset on laptops contains the 1-gram "couch" and the 2-gram "touch pad" with quality 0.67 and 0.95 respectively. However, the presence of the word "couch" as an aspect term in laptop reviews is completely arbitrary. Therefore, we prune the list of quality phrases using an empirical quality threshold of q_th = 0.7 and q_th = 0.6 for the laptop and restaurant domain respectively. We set these thresholds manually after inspecting the lists of quality phrases and detecting the quality value under which a lot of domain-irrelevant candidate aspect terms appear.

While the pruning step removes irrelevant phrases, as shown above, it also means that n-grams such as "set up", which are true aspect term candidates, are removed from the list due to low quality (q = 0.32). However, reducing noisy aspect term candidates (e.g. "couch" with q = 0.67) is more important than keeping all aspect term candidates, since we wish to annotate aspect terms with high precision.

We can make the data labelling method completely automated by setting a fixed quality threshold q_th for pruning the list of quality phrases. We highlight that a fixed threshold of q_th = 0.7 leads to a good, though not optimal, trade-off between high precision values and a good F-score for ATE.

4.1.4 Syntactic Rules for ATE

The pruned quality phrases and the sentiment lexicon are combined with syntactic rules in order to extract aspect terms from sentences. Before applying any syntactic rule, we validate whether a token is a potential aspect term by checking that it (i) is not a stopword, (ii) is present in the pruned quality phrases and (iii) has a POS tag that is present in [NOUN, PRON, PROPN, ADJ, ADP, CONJ]. Table 1 tabulates all rules used for ATE and gives examples of reviews with the respective extracted aspect terms. For simplicity, we adopt a syntactic rule notation similar to the one used in (Liu et al., 2015). The functions used in Table 1 have the following interpretation:

• depends(d, t_i, t_j) is true if the syntactic dependency between the tokens t_i and t_j is d.
• opinion_word(t_i) is true if the token t_i is in the sentiment lexicon.
• mark_target(t_i) means that we mark the token t_i as an aspect term.
• is_aspect(t_i) is true if the token t_i is already marked as an aspect term.
Rule 1: depends(dobj, t_i, t_j) and opinion_word(t_j) then mark_target(t_i)
        Example: "I like the screen" -> screen
Rule 2: depends(nsubj, t_i, t_j) and depends(acomp, t_k, t_j) and opinion_word(t_k) then mark_target(t_i)
        Example: "The internal speakers are amazing" -> internal speakers
Rule 3: depends(nsubj, t_i, t_j) and depends(advmod, t_k, t_j) and opinion_word(t_k) then mark_target(t_i)
        Example: "The touchpad works perfectly" -> touchpad
Rule 4: depends(pobj or dobj, t_i, t_j) and depends(amod, t_k, t_i) and opinion_word(t_k) then mark_target(t_i)
        Example: "This laptop has great price" -> price
Rule 5: depends(cc or conj, t_i, t_j) and is_aspect(t_j) then mark_target(t_i)
        Example: "Screen and speakers are awful" -> screen, speakers
Rule 6: depends(compound, t_i, t_j) and is_aspect(t_j) then mark_target(t_i)
        Example: "The wifi card is not good" -> wifi card

Table 1: Syntactic rules for aspect term extraction.
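To make the rule notation concrete, here is a small illustrative sketch (ours, not the authors' code) of how the first rule of Table 1 could be checked with spaCy's dependency parse. The tiny opinion-word set stands in for the full sentiment lexicon, and the stopword and quality-phrase checks described above are assumed to happen elsewhere.

import spacy

nlp = spacy.load("en_core_web_sm")  # any English spaCy pipeline with a dependency parser
opinion_words = {"like", "love", "great", "amazing", "awful", "good"}  # toy lexicon

def rule_dobj(sentence):
    """Rule 1 of Table 1: depends(dobj, t_i, t_j) and opinion_word(t_j)
    implies mark_target(t_i)."""
    targets = []
    for token in nlp(sentence):
        # token is t_i; token.head is t_j, the governor of the dobj relation.
        if token.dep_ == "dobj" and token.head.lemma_.lower() in opinion_words:
            targets.append(token.text)
    return targets

print(rule_dobj("I like the screen"))  # ['screen']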

4.1.5 Language and Domain Adaptation

The automated data labelling method requires adaptation in order to be used for different languages. More concretely, we need to adapt (i) the syntactic rules of Table 1, (ii) the sentiment lexicon and (iii) the tools required by AutoPhrase (e.g. the part-of-speech tagger) to the target language.

We can use the automated data labelling method for ATE dataset construction in a completely domain-independent fashion. To do so, we only need to set the pruning threshold q_th of the quality phrase list to a fixed value (Section 4.1.3). Our experiments reveal that setting q_th = 0.7 results in a good trade-off between high precision and F-score, independently of the laptop or the restaurant domain.

4.2 Model Training and Evaluation

We train a B-LSTM & CRF classifier to perform unsupervised ATE for both domains using the automatically labelled datasets constructed in Section 4.1. The classifier is evaluated on the human annotated test datasets of the SemEval-2014 ABSA contest.

Figure 2: Results for supervised ATE using the B-LSTM & CRF architecture. We compare against the winners of the SemEval-2014 ABSA contest.
5 Experiments and Results

We perform experiments for supervised and unsupervised ATE in the laptop and the restaurant domain and evaluate our classifier using the CoNLL F-score (http://www.cnts.ua.ac.be/conll2003/). Compared to other supervised learning methods, we reach the performance of the SemEval-2014 ABSA winners in the restaurant domain. For laptops, our supervised system exceeds the best F-score of the SemEval-2014 ABSA contest by approximately 3%. With respect to unsupervised ATE, our technique achieves (i) very high precision and (ii) an F-score that exceeds the supervised baseline of the SemEval ABSA task.
5.1 Experiments for Supervised ATE

For supervised learning, we perform experiments using the human annotated training and test sets provided by the SemEval-2014 ABSA contest for the laptop and restaurant domain. Our classifier uses the B-LSTM & CRF architecture presented in Fig. 1 and its implementation is based on (Dernoncourt et al., 2017).

We use a random 80-20% split on the original training set of the SemEval-2014 ABSA contest in order to create a new training and validation set. We keep the test set for our final evaluation. For most of the parameters, we use the default values of (Dernoncourt et al., 2017). However, we use the Adam optimizer with learning rate α = 0.01 and a batch size of 64. Moreover, we use the pre-trained word embeddings of fastText.

We train the classifier using the reduced training set for a maximum of 100 epochs. After each epoch, we evaluate our model using the CoNLL F-score on the validation set. Moreover, we use early stopping with a patience of 20 epochs: the experiment terminates earlier if the CoNLL F-score on the validation set does not improve for 20 consecutive epochs.
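A minimal sketch of this early-stopping logic (generic Python, not the NeuroNER implementation) is:

def train_with_early_stopping(train_epoch, evaluate, max_epochs=100, patience=20):
    """train_epoch() trains one epoch; evaluate() returns the CoNLL F-score
    on the validation set. Stop when the score has not improved for
    `patience` consecutive epochs and keep the best epoch."""
    best_score, best_epoch, epochs_without_gain = float("-inf"), -1, 0
    for epoch in range(max_epochs):
        train_epoch()
        score = evaluate()
        if score > best_score:
            best_score, best_epoch, epochs_without_gain = score, epoch, 0
        else:
            epochs_without_gain += 1
            if epochs_without_gain >= patience:
                break  # no improvement for `patience` epochs
    return best_epoch, best_score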
At the end of each experiment, we choose the model of the epoch that gives the best performance on the validation set and make predictions on the test set. We repeat the aforementioned procedure for 50 experiments and present the experimental results for both domains in Fig. 2.

The F-score of the SemEval-2014 ABSA winners is 74.55 and 84.01 for the laptop and the restaurant domain respectively. The B-LSTM & CRF classifier achieves an F-score of 77.96 ± 0.38 for laptops and an F-score of 84.12 ± 0.2 for restaurants, with a confidence interval of 95%. With our performance, we would have surely won in the laptop domain and probably also in the restaurant domain.
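For reference, the CoNLL-style score used throughout counts an aspect term as correct only when the predicted span matches a gold span exactly. A minimal span-level scorer (our own sketch, not the official CoNLL or SemEval evaluation script) can be written as:

def spans(tags):
    """Extract (start, end) spans of aspect terms from a B/I/O tag sequence."""
    result, start = [], None
    for i, tag in enumerate(tags):
        if tag == "B":
            if start is not None:
                result.append((start, i))
            start = i
        elif tag == "O":
            if start is not None:
                result.append((start, i))
            start = None
    if start is not None:
        result.append((start, len(tags)))
    return result

def span_prf(gold_tags, pred_tags):
    """Exact-match precision, recall and F1 over aspect-term spans."""
    gold, pred = set(spans(gold_tags)), set(spans(pred_tags))
    tp = len(gold & pred)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

print(span_prf(["O", "B", "I", "O"], ["O", "B", "O", "O"]))  # (0.0, 0.0, 0.0)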
5.2 Experiments for Unsupervised ATE

We also perform experiments for ATE with unsupervised learning. For training, we use the automatically labelled datasets (hereafter denoted as ALD) obtained using the methodology described in Section 4.1 with q_th = 0.7 and q_th = 0.6 for the laptop and the restaurant domain respectively. For testing, we use the human labelled datasets (hereafter denoted as HLD) of the SemEval-2014 ABSA task.

Our main goal is to evaluate our unsupervised technique on human annotated datasets. To the best of our knowledge, the largest available human annotated datasets for ATE are provided by the SemEval ABSA task and contain laptop and restaurant reviews. Therefore, our analysis focuses only on these two domains.

We start by creating a rule-based baseline model to make predictions for the HLD simply by applying the techniques presented in Section 4.1. This baseline (presented in the following section) does not rely on any machine learning techniques for the annotation procedure. We aim at beating the rule-based baseline by using machine learning. To this end, we use the ALD to train our classifier. For unsupervised ATE, we run two types of experiments. The first one uses the traditional IOB labelling format and is stricter. The second one is more relaxed and uses only B and O labels (i.e. I labels are converted to B). The intuition is that aspect terms can be identified by separating B and I labels from O; therefore, I and B labels are treated equally against O.

                      Labels: IOB          Labels: OB
                      P        F1          P        F1
Laptops
  Rule-based          65.13    24.35       76.65    23.76
  SVM                 61.64    37.94       72.02    43.29
  B-LSTM & CRF        66.67    42.09       74.51    44.37
  SemEval baseline    -        35.64       -        -
Restaurants
  Rule-based          84.26    28.74       96.67    27.37
  SVM                 67.28    48.08       80.83    57.36
  B-LSTM & CRF        74.03    53.93       83.19    63.09
  SemEval baseline    -        47.15       -        -

Table 2: Experiments for unsupervised ATE. We compare the B-LSTM & CRF classifier against the rule-based baseline, an SVM classifier and the baseline of the SemEval-2014 ABSA contest.

Rule-based Baseline Model

The methodology described in Section 4.1 is used in order to make predictions on the HLD for laptops and restaurants, i.e. the rule-based baseline model does not use any machine learning algorithm. During the annotation process, a token of the HLD is labelled as a target if it (i) belongs to the pruned quality phrases list and (ii) satisfies at least one of the rules in Table 1. A comparison between the predicted and the golden labels of the HLD gives a quality estimation of the syntactic rules we create and acts as a baseline.

SVM

We train a linear SVM classifier (using the implementation of LIBLINEAR (Fan et al., 2008)) in order to create a second baseline model that uses machine learning. For the SVM, we use the baseline features presented in (Stratos and Collins, 2015) and build 1-0 feature vectors by exploiting the word morphology and the sentence structure (i.e. the adjacent words of each token). The training and the evaluation are done using the ALD and the HLD respectively.

In addition, we wish to show the trade-off between precision and recall for different values of q_th. We perform experiments for different values of q_th and validate that the higher q_th is, the higher the precision and the lower the recall. For example, an SVM classifier trained on an ALD with q_th = 0.7 achieves F1 = 39.63 and P = 71.54 (Table 2 shows results for q_th = 0.6 for restaurants). We choose to keep q_th = 0.6 for the restaurant domain because we are interested in a good combination of high precision and F-score.
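The paper's SVM baseline is built directly on LIBLINEAR; a comparable token classifier can be sketched with scikit-learn's LIBLINEAR-backed LinearSVC. The binary features below (word shape, suffix and adjacent words) are our own illustrative choices rather than exactly those of (Stratos and Collins, 2015), and the toy sentences stand in for the ALD.

from sklearn.feature_extraction import DictVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

def token_features(tokens, i):
    """Binary (1-0) features from word morphology and adjacent words."""
    word = tokens[i]
    return {
        "lower=" + word.lower(): 1,
        "is_title": int(word.istitle()),
        "suffix3=" + word.lower()[-3:]: 1,
        "prev=" + (tokens[i - 1].lower() if i > 0 else "<s>"): 1,
        "next=" + (tokens[i + 1].lower() if i < len(tokens) - 1 else "</s>"): 1,
    }

# Toy training data standing in for the automatically labelled dataset (ALD).
sentences = [(["I", "like", "the", "screen"], ["O", "O", "O", "B"]),
             (["The", "internal", "speakers", "are", "amazing"], ["O", "B", "I", "O", "O"])]
X = [token_features(toks, i) for toks, tags in sentences for i in range(len(toks))]
y = [tag for _, tags in sentences for tag in tags]

clf = make_pipeline(DictVectorizer(), LinearSVC())  # LinearSVC wraps LIBLINEAR
clf.fit(X, y)
print(clf.predict([token_features(["I", "love", "the", "touchpad"], 3)]))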
B-LSTM & CRF

We employ the B-LSTM & CRF classifier using the ALD as training set and the HLD as test set, i.e. the evaluation is performed on the human annotated datasets of the SemEval-2014 ABSA task. In addition, we use the ABSA training sets of SemEval-2014 as validation sets. The maximum number of epochs and the patience are set to 20 and 5 respectively. As stopping criterion, we simply choose the epoch that achieves the best F-score on the validation set. In all our experiments, we compare the performance of the B-LSTM & CRF classifier with the respective performance of the rule-based baseline and the SVM model. We do not report confidence intervals for the B-LSTM & CRF classifier because the training time increases dramatically in the case of unsupervised ATE due to the increased size of the dataset. Conducting one experiment usually takes more than 15 hours, which means that a round of at least 20 experiments, which would allow for defining confidence intervals, would be computationally intensive. For this reason, we leave the report of confidence intervals for unsupervised ATE for future work. However, we repeat up to 3 experiments for each case and verify that the CoNLL F-score and the precision are always higher compared to SVM. Results for the laptop domain can be visualized in Fig. 3. We do not present any figures for the restaurant domain since the learning curves are very similar to the ones of the laptop domain.

Figure 3: F-score (top) and precision (bottom) comparison between B-LSTM & CRF and SVM for unsupervised ATE in the laptop domain. B, I and O labels are used.

We draw several conclusions by observing the results tabulated in Table 2. First, the B-LSTM & CRF classifier achieves a higher F-score for both domains compared to the rule-based baseline model and the SVM classifier. The relative F-score improvement between the rule-based baseline and the B-LSTM & CRF classifier is 73% and 88% for the laptop and the restaurant domain respectively. At the same time, we preserve high precision and attain values above 80%. Finally, our unsupervised method beats the supervised baseline F-score from the SemEval ABSA task.
6 Conclusion and Future Work

We present a B-LSTM & CRF classifier which we use for feature extraction and aspect term detection for both supervised and unsupervised ATE. We validate this classifier by performing supervised ATE and achieving top-ranking performance on the human annotated datasets of the SemEval-2014 ABSA contest for the laptop and restaurant domain. Moreover, we introduce a new, automated, unsupervised and domain-independent method to label tokens of raw opinion texts as aspect terms with high precision. We use the automatically labelled datasets to train the B-LSTM & CRF classifier, which we evaluate on human annotated datasets. Against all odds, our unsupervised method beats the supervised ABSA baseline F-score from SemEval, while preserving high precision scores.

As future work, we plan to perform ATE for different domains (e.g. hotels) using our methods. Moreover, we plan to work towards adapting our techniques to multilingual datasets (e.g. French, Spanish, etc.). We would also investigate the idea of exploiting the available ratings (e.g. 1-5 stars) of the review datasets in order to construct new datasets for ATE. This would allow us to perform ATE with distant supervision.
References

Caroline Brun, Diana Nicoleta Popa, and Claude Roux. 2014. XRCE: Hybrid classification for aspect-based sentiment analysis. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014).

Maryna Chernyshevich. 2014. IHS R&D Belarus: Cross-domain extraction of product features using CRF. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014).

Franck Dernoncourt, Ji Young Lee, and Peter Szolovits. 2017. NeuroNER: an easy-to-use program for named-entity recognition based on neural networks.

Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, and Chih-Jen Lin. 2008. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research.

Aitor Garcia-Pablos, Montse Cuadros, and German Rigau. 2015. V3: Unsupervised aspect based sentiment analysis for SemEval-2015 task 12.

Aitor Garcia-Pablos, Montse Cuadros, and German Rigau. 2017. W2VLDA: Almost unsupervised system for aspect based sentiment analysis.

Aitor Garcia-Pablos and German Rigau. 2014. Unsupervised acquisition of domain aspect terms for aspect based opinion mining.

Tomáš Hercig, Tomáš Brychcín, Lukáš Svoboda, Michal Konkol, and Josef Steinberger. 2016. Unsupervised methods to improve aspect-based sentiment analysis in Czech. Computación y Sistemas.

Junichi Kazama and Kentaro Torisawa. 2008. Inducing gazetteers for named entity recognition by large-scale clustering of dependency relations. In Proceedings of ACL-08: HLT.

Qian Liu, Zhiqiang Gao, Bing Liu, and Yuanlin Zhang. 2015. Automated rule selection for opinion target extraction.

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Distributed representations of words and phrases and their compositionality. In NIPS Proceedings.

George A. Miller. 1995. WordNet: A lexical database for English. In Communications of the ACM.

John Pavlopoulos and Ion Androutsopoulos. 2015. Aspect term extraction for sentiment analysis: New datasets, new evaluation measures and an improved unsupervised method.

Maria Pontiki, Harris Papageorgiou, Dimitrios Galanis, Ion Androutsopoulos, John Pavlopoulos, and Suresh Manandhar. 2014. SemEval-2014 task 4: Aspect based sentiment analysis. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014).

Jingbo Shang, Jialu Liu, Meng Jiang, Xiang Ren, Clare R. Voss, and Jiawei Han. 2017. Automated phrase mining from massive text corpora. CoRR abs/1702.04457.

Poria Soujanya, Cambria Erik, and Gelbukh Alexander. 2016. Aspect extraction for opinion mining with a deep convolutional neural network. Knowledge-Based Systems.

Karl Stratos and Michael Collins. 2015. Simple semi-supervised POS tagging.

Maksim Tkachenko and Andrey Simanovsky. 2012. Named entity recognition: Exploring features.

Zhiqiang Toh and Jian Su. 2015. NLANG: Supervised machine learning system for aspect category classification and opinion target extraction. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015).

Zhiqiang Toh and Jian Su. 2016. NLANG at SemEval-2016 task 5: Improving aspect based sentiment analysis using neural network features. In Proceedings of SemEval-2016.

Zhiqiang Toh and Wenting Wang. 2014. DLIREC: Aspect term extraction and term polarity classification system. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014).

Joseph Turian, Lev Ratinov, and Yoshua Bengio. 2010. Word representations: A simple and general method for semi-supervised learning.

Zhilin Yang, Ruslan Salakhutdinov, and William W. Cohen. 2016. Multi-task cross-lingual sequence tagging from scratch.

Yichun Yin, Furu Wei, Li Dong, Kaimeng Xu, Ming Zhang, and Ming Zhou. 2016. Unsupervised word and dependency path embeddings for aspect term extraction.
