Sarcasm Detection - Miniybel - Tsegaw - July - 2019 - Final - Thesis
2020-03-16
Tsegaw, Miniybel
http://hdl.handle.net/123456789/10350
BAHIR DAR UNIVERSITY
BAHIR DAR INSTITUTE OF TECHNOLOGY
SCHOOL OF RESEARCH AND POSTGRADUATE STUDIES
FACULTY OF COMPUTING
MSc. Thesis
JULY, 2019
I, the undersigned, declare that this thesis comprises my own work. In compliance with internationally accepted practices, I have acknowledged and referenced all materials used in this work. I understand that non-adherence to the principles of academic honesty and integrity, or misrepresentation/fabrication of any idea/data/fact/source, will constitute sufficient ground for disciplinary action by the University and can also invoke penal action from the sources which have not been properly cited or acknowledged.
The thesis has been submitted for examination with my approval as university advisor.
Date: 7.22.2019
Firstly, I would like to praise and thank almighty God, creator of the entire universe, for helping me to realize this work. I was also lucky to have the support of many people; without them, the completion of this thesis would have been very difficult.
My special thanks go to my research advisor, Dr. Gebeyehu Belay. I thank you, my mentor, for your encouragement, guidance, understanding, and motivation throughout this thesis work. You showed me how to tackle problems by investing your invaluable time, and I will never forget the way you approached me and urged me to finish the thesis on time. You are truly a model advisor in guiding the organization of a thesis and the flow of ideas so as to better convince the reader.
Finally, I must express my very profound gratitude to my parents and to my classmates for providing me with unfailing support and continuous encouragement throughout my years of study and through the process of researching and writing this thesis. This accomplishment would not have been possible without them. Thank you.
NN Neural Network
DT Decision Tree
RF Random Forest
IR Information Retrieval
TP True Positive
FP False Positive
FN False Negative
Sentiment analysis is a technique to identify people's opinions, attitudes, sentiments, and emotions towards a specific target such as individuals, events, topics, products, organizations, or services. Sarcasm is a special kind of sentiment that comprises words which mean the opposite of what you really want to say (especially in order to insult someone, to show irritation, or to be funny). People often express it verbally through heavy tonal stress and certain gestural clues such as rolling of the eyes, cues which are obviously not available when sarcasm is expressed in text. Detecting sarcasm is thus a crucial step for sentiment analysis, considering the prevalence and challenges of sarcasm in sentiment-bearing text. Sarcasm detection is the task of predicting sarcasm in text. In this thesis, therefore, we developed a model to detect the presence of sarcasm in Amharic texts. We used primary data from Abebe Tolla's essay books "Mitsetoch", "Silaqoch", and "Shimutoch" and from his official Facebook blog. The rest of the data was collected using the FacePager API from other Facebook blogs and pages that write about sarcastic content, supported by related references such as magazines, newspapers, and Amharic literature. We used lexical (unigram), semantic, and emoticon (smiley faces, etc.) features to extract different feature sets as usable inputs for machine learning. Support Vector Machine (SVM), Neural Network (NN), and Random Forest classifiers trained on a simple lexical dictionary-based approach were used to classify sarcastic Amharic texts based on the features provided. Accuracies of 80.6%, 80.1%, and 79% were obtained on the total collected dataset with the Support Vector Machine, Neural Network, and Random Forest classifiers respectively. We found some strong features that characterize sarcastic texts. However, a combination of more subtle dictionary-based features proved more promising in identifying the various facets of sarcasm.
1.1 INTRODUCTION
The Free Dictionary defines sarcasm as a form of verbal irony that is intended to express contempt or ridicule. The figurative nature of sarcasm makes it an often-quoted challenge for sentiment analysis (Bing Liu, 2010).
Sarcasm in speech is multi-modal, involving tone, body language, and gestures along with the linguistic artifacts used in speech. Sarcasm in text, on the other hand, is more restricted with respect to such non-linguistic modalities. This makes recognizing textual sarcasm more challenging for both humans and machines. Sarcasm detection in text is a difficult problem and has only recently begun to be successfully examined as an automated natural language processing problem (Abhijit Mishra, 2017).
Sarcasm detection plays an indispensable role in applications like online review summarizers, dialog systems, recommendation systems, and sentiment analyzers, which makes it both challenging and interesting to solve with traditional Natural Language Processing (NLP) tools and techniques (JOSHI, 2017).
(Ellen Rilo, 2013) developed a method for detecting sarcasm that examines the contrast between a positive sentiment and a traditionally negative situation; the work used tweets tagged with "sarcasm" as a gold standard. The juxtaposition of positive sentiment words with negative situations incorporates the essential idea of world context into detecting sarcasm. The goal is to learn phrases that are implicitly linked with negative sentiment. The algorithm takes a positive sentiment word (e.g. "love") as a seed word and finds a negative situation that follows the word in a sarcastic tweet. Positive sentiment phrases are then learned by looking at adjacent negative situation phrases, and these phrases and situations are used to detect sarcasm in new tweets. A Support Vector Machine (SVM) using unigrams and bigrams of the learned phrases achieved an F-score of 51%.
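The contrast rule described above can be illustrated with a minimal sketch. The seed word and negative-situation lists below are invented examples, not the phrases actually learned by the bootstrapping algorithm:

```python
# Sketch of a positive-seed / negative-situation contrast rule:
# flag a sentence as sarcastic when a positive sentiment word is
# followed by a phrase from a (here hand-picked) negative-situation list.

POSITIVE_SEEDS = {"love", "enjoy", "adore"}
NEGATIVE_SITUATIONS = {"being ignored", "waiting in line", "getting stuck"}

def contrast_rule(sentence: str) -> bool:
    """Return True if a positive seed word precedes a negative situation."""
    text = sentence.lower()
    for pos in POSITIVE_SEEDS:
        idx = text.find(pos)
        if idx == -1:
            continue
        rest = text[idx + len(pos):]
        if any(neg in rest for neg in NEGATIVE_SITUATIONS):
            return True
    return False

print(contrast_rule("I love being ignored"))                      # True
print(contrast_rule("I love it when my son gives me a present"))  # False
```

A real system would learn both lists iteratively from a sarcasm-labeled corpus rather than hard-coding them.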
Sarcasm detection in texts can be modeled as a binary document classification task, and two main sources of features have been used. First, most previous work extracts rich discrete features from the text content itself (Davidov et al., 2010; Tsur et al., 2010; González-Ibáñez et al., 2011; Reyes et al., 2012; Reyes et al., 2013; Riloff et al., 2013; Ptáček et al., 2014), including lexical unigrams, bigrams, word sentiment, punctuation marks, emoticons, quotes, character n-grams, and pronunciations. Some of these works use more sophisticated features, including POS tags, dependency-based tree structures, Brown clusters, and sentiment indicators, which depend on external resources. Overall, n-grams have been among the most useful features.
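As a concrete illustration of the n-gram features mentioned above, the following sketch extracts a bag of unigram and bigram counts from a sentence; a real system would build these over the whole corpus vocabulary:

```python
# Minimal bag-of-n-grams feature extraction (unigrams and bigrams),
# the feature family the surveyed work found most useful.

from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_features(sentence, max_n=2):
    """Bag of 1..max_n-grams with counts, usable as classifier input."""
    tokens = sentence.lower().split()
    feats = Counter()
    for n in range(1, max_n + 1):
        feats.update(ngrams(tokens, n))
    return feats

print(ngram_features("I love being ignored"))
```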
The challenges of sarcasm and the benefit of sarcasm detection to sentiment analysis for Amharic text have led to an interest in sarcasm detection as a research problem. Sarcasm detection refers to computational approaches that predict whether a given text is sarcastic. Thus, the sentence 'I love being ignored' (መረሳት ያስዯስተኛሌ) should be predicted as sarcastic, while the sentence 'I love it when my son gives me a present' (ሌጄ ስጦታ ሲሰጠኝ ዯስ ይሇኛሌ) should be predicted as non-sarcastic. This problem is difficult because of the nuanced ways in which sarcasm may be expressed. Sarcasm detection from text has now extended to different data forms and techniques, and this synergy has resulted in interesting innovations for sarcasm detection in Amharic text.
Sarcasm is a sophisticated form of speech act, and its recognition is one of the difficult tasks in natural language processing (NLP). Sarcasm detection can benefit many NLP applications such as review summarization, dialogue systems, and review ranking systems. It is obvious that no simple rule or algorithm can capture sarcasm. This thesis investigates the possibility of reliably classifying sarcasm in text and identifies typical textual features from social media that are important for sarcasm in the process.
Sarcasm is a form of speech act in which the speakers convey their message in an implicit way.
The inherently ambiguous nature of sarcasm sometimes makes it hard even for humans to decide
whether an utterance is sarcastic or not. Unlike a simple negation, a sarcastic sentence conveys a
negative opinion using only positive or intensified positive words. The detection of sarcasm is therefore important for the development and refinement of sentiment analysis.
Sarcasm detection in writing is challenging in part due to the lack of intonation and facial expressions. The human comprehension system can often spot a sarcastic sentiment and reason about what makes it so. Recent advances in natural language sentence generation research have seen increasing interest in measuring negativity and positivity from the sentiment of words or phrases. However, the accuracy and robustness of results are often affected by untruthful sentiments that are sarcastic in nature, and this is often left untreated. Sarcasm detection is a very important process for filtering out noisy data (in this case, sarcastic sentences) from training data inputs, which can then be used for natural language sentence generation (Chun-Che Peng, 2015).
There is therefore a need for good sampling and classification techniques for these reviews and opinions. For this reason, much research on sarcasm detection has been done and is being undertaken for English and other languages (Ellen Rilo, 2013). In contrast, sarcasm detection for Amharic texts has never been studied, even though the amount of sarcastic text on the web is increasing (Abreham, 2014). Therefore, this study investigates and aims to develop a sarcasm detection model for Amharic texts.
1. Which features from the theories contribute most to sarcasm detection for Amharic text?
2. Does our modeling approach work well for Amharic texts?
3. What are the challenges of sarcasm detection for Amharic texts model developments?
The general and specific objectives of this study are given below:
General Objective: the general objective of the research is to design and develop a sarcasm
detection model for Amharic texts.
Specific Objectives: the specific objectives of the proposed research work are:
Sarcasm detection is a complex and recent research discipline that requires effective analysis and processing of documents. The system is designed to analyze an Amharic sarcasm corpus collected from Abebe Tolla's reviews and to classify each text as sarcasm or non-sarcasm.
The scope of this research is:
Focused on sarcasm detection (sarcasm vs. non-sarcasm) classification only.
Domain independence is one of the biggest problems in machine learning and classification (Abreham, 2014); the system is domain specific, working only on the Abebe Tolla book review domain.
Sentiment holder identification and the reasons behind positive and negative classifications are not covered in this research work.
Because of their complicated nature, Amharic proverbs ("ቅኔያዊ አነጋገር") are out of the scope of this research work.
An important part of text classification is obtaining good-quality data, which is difficult in the case of sarcastic text; it is often necessary to annotate data manually. Most of the dataset used for the experiments was manually collected from Abebe Tolla's "Shimutoch" and "Mitsetoch 1, 2" books and from his Facebook blogs. The rest of the dataset was collected from related references such as magazines, newspapers, and Amharic literature.
We used supervised machine learning approaches, which require primary sentences labeled as sarcastic or non-sarcastic so that our classifier can build a model. The supervised approach is one of the well-known techniques for sentiment classification (Belete, 2013). Five main processes (phases) are employed in applying sarcasm detection to Amharic texts.
1. Preprocessing: the text of a document has to be converted into data that an ML algorithm can analyze. The Amharic writing system has homophone characters, that is, characters with the same sound but different symbols; this inconsistency in writing words was handled by replacing characters of the same sound with a common symbol through normalization. The text is broken down into discrete units using tokenization, and then several operations are applied for the removal of stop words, removal of punctuation, and stemming, for effectiveness and efficiency improvements. This process was done using Python 3.6. We chose Python because its syntax is clear and readable, and the way its syntax is organized imposes some order on programmers, so experts and beginners alike can easily understand the code. Because block structure in Python is defined by indentation, the code is much less likely to have bugs caused by misplaced delimiters.
2. Lexicon construction: the second step creates lexical entries using unigram words. This lexicon construction is concerned with building a dictionary of sarcastic expressions and assigning each a prior polarity value. Since we applied supervised techniques, the training data set was annotated manually.
3. Weight assignment and propagation: in this step, every polarity word and modifier gets the initial weight defined in the sentiment lexicon. If a word is linked to a modifier, its polarity value is multiplied by a coefficient (Xiaoying, 2009) or some value is added to the initial value (Inkpen, 2006).
4. Feature selection (FS): feature selection is an important step in text classification; it constructs the vector space and improves the scalability, efficiency, and accuracy of a text classifier. The main idea of feature selection is to select a subset of features from the original data, keeping the words with the highest scores according to a predetermined measure of word importance. Before using ML methods in text categorization it is essential to choose the features best suited to the task; FS algorithms are a strategy to make text classifiers more efficient and accurate. They attempt to reduce the dimensionality considered in a task so as to improve performance on some dependent measures, since as the dimensionality of a domain expands, the number of features increases (Varela, 2012). The features considered for detecting sarcasm come from the Amharic texts examined at the beginning of this study and are based on the different aspects we identified.
5. Classification: the final step applies machine learning algorithms to classify Amharic sarcasm texts into prespecified categories (sarcasm and non-sarcasm) using the Natural Language Toolkit (NLTK).
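A minimal sketch of the preprocessing step (step 1 above) might look as follows. The homophone mapping and the stop-word list are small illustrative samples, not the full sets used in this work:

```python
# Illustrative Amharic preprocessing: normalize homophone characters to
# a single canonical symbol, strip punctuation, tokenize, and remove
# stop words. The maps below are tiny samples for demonstration only.

import re

# Map variant characters with the same sound to one canonical form.
HOMOPHONE_MAP = str.maketrans({
    "ሐ": "ሀ", "ኀ": "ሀ", "ሃ": "ሀ",   # ha-family
    "ሠ": "ሰ",                        # sa-family
    "ዐ": "አ",                        # a-family
})

STOP_WORDS = {"ነው", "እና"}  # sample Amharic stop words

def preprocess(text: str) -> list:
    """Normalize homophones, strip punctuation, tokenize, drop stop words."""
    text = text.translate(HOMOPHONE_MAP)
    # Remove Ethiopic and ASCII punctuation marks.
    text = re.sub(r"[።፣፤፥!?,.]", " ", text)
    return [tok for tok in text.split() if tok not in STOP_WORDS]

print(preprocess("ሐሳብ"))  # ['ሀሳብ']  (homophone normalized)
```

Stemming would follow as a further step on the resulting tokens.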
Three machine learning algorithms, Support Vector Machine, Neural Network, and Random Forest, were employed to classify the collected data as sarcastic or non-sarcastic. Testing with more than one classification algorithm provides comparative clues for determining the best-performing algorithm for Amharic sarcasm text in this domain, and these algorithms were chosen because many research works in sentiment analysis have achieved high performance using them.
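Steps 2 and 3 above (lexicon construction and weight assignment) can be sketched as follows; the lexicon entries, polarity values, and modifier coefficient are invented for illustration:

```python
# Illustrative sentiment lexicon with prior polarity values, and weight
# propagation where a modifier multiplies the polarity of the word that
# follows it. All entries and coefficients here are made up.

LEXICON = {"ደስ": 1.0, "ጥሩ": 1.0, "መጥፎ": -1.0}   # word -> prior polarity
MODIFIERS = {"በጣም": 2.0}                         # intensifier -> coefficient

def sentence_weight(tokens):
    """Sum polarity weights; a modifier scales the next polarity word."""
    total, coef = 0.0, 1.0
    for tok in tokens:
        if tok in MODIFIERS:
            coef = MODIFIERS[tok]   # remember the coefficient
            continue
        total += coef * LEXICON.get(tok, 0.0)
        coef = 1.0                  # coefficient applies to one word only
    return total

print(sentence_weight(["በጣም", "ጥሩ"]))  # 2.0
```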
Support Vector Machine (SVM): SVMs are a set of supervised learning methods for machine learning that can be used for classification, anomaly detection, and regression. One of the strong suits of SVMs is that they are very efficient when the data is high dimensional, and they remain effective even when the number of dimensions is greater than the number of samples. SVMs are especially good for text and hypertext categorization since their application decreases the need for labeled training instances (Witten, 2005). A disadvantage of SVMs is that they do not provide probability estimates directly; to obtain these, an expensive cross-validation calculation may be performed. When the number of samples is significantly less than the number of features, the method is likely to give poor results.
Random Forest (RF): a random forest is a meta-estimator that fits a number of decision tree classifiers on various sub-samples of the dataset and uses averaging to improve predictive accuracy and control over-fitting. The sub-sample size is always the same as the original input sample size, but the samples are drawn with replacement.
This research needs a programming language that is convenient for the particular machine learning methods used. Python is a versatile programming language that supports both object-oriented and functional programming. It provides a wide spectrum of libraries supporting many different programming tasks and is also open source. The libraries are easy to install and well documented. A further advantage of Python is that it is easy to read, because it uses indentation and new lines instead of brackets. The libraries described below are some of the most common ones used in this research; more information about Python and the libraries can be found on Python's own webpage.
NumPy gives support for high-level mathematical functions, large arrays, and matrices. NumPy is very useful for linear algebra and random number capabilities, and it also has utilities for integrating FORTRAN and C/C++ code.
Scikit-learn is a machine learning library that began as a Google Summer of Code project in 2007. The first public release was in 2010, and it has been updated continuously. It is built on the math packages NumPy and SciPy. Among many other features, it includes a variety of regression, classification, and clustering algorithms such as naive Bayes, random forests, support vector machines, and boosting algorithms (Pedregosa, 2011).
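The classification setup described in this chapter can be sketched with scikit-learn as follows, with TF-IDF unigram features feeding the three classifiers; the toy English sentences stand in for the labeled Amharic corpus:

```python
# Sketch: TF-IDF unigram features feeding an SVM, a random forest, and a
# small neural network (MLP). Toy data only; the real corpus is Amharic.

from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier

texts = ["I love being ignored", "great, another delay",
         "the weather is nice", "I enjoyed the book"]
labels = [1, 1, 0, 0]  # 1 = sarcastic, 0 = non-sarcastic

for clf in (LinearSVC(), RandomForestClassifier(n_estimators=50),
            MLPClassifier(hidden_layer_sizes=(16,), max_iter=500)):
    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 1)), clf)
    model.fit(texts, labels)
    print(type(clf).__name__, model.predict(["I love being ignored"]))
```

In practice the pipeline would be trained and evaluated on the full labeled dataset with a held-out test split.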
1.7.3 Evaluation
To assess the effectiveness of the proposed model, a testing corpus of sarcasm was prepared and classified into the correct classes, namely sarcasm and non-sarcasm. Evaluation has two primary functions: primarily, it helps to predict how well the final model will work; secondly, evaluation is an integral part of many learning methods and helps to explore the model that best represents the training data.
The different classification models developed in this research were evaluated using classifier accuracy on a test dataset. Similarly, the evaluation of opinion classification is usually measured by precision, recall, and F-measure. Precision is the fraction of retrieved instances that are relevant, while recall is the fraction of relevant instances that are retrieved. The F-measure treats recall and precision fairly by bringing them together into a single measure. Therefore, precision, recall, and F-measure were used, based on an understanding and measure of relevance.
Some interesting measures in classification problems are accuracy, precision, and recall. In statistical terms, recall is the ratio between the number of true positives and the sum of true positives and false negatives, while precision is the ratio between the number of true positives and the sum of true positives and false positives. Finally, the F1-score is an accuracy measure that can be interpreted as the harmonic mean of precision and recall: F1 = 2 * (precision * recall) / (precision + recall).
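The three measures can be computed directly from the confusion-matrix counts; the counts below are illustrative, not the thesis results:

```python
# Precision, recall, and F1 from true-positive (TP), false-positive (FP),
# and false-negative (FN) counts. Example counts are illustrative.

def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def f1_score(tp, fp, fn):
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)   # harmonic mean of precision and recall

# e.g. 80 sarcastic texts found correctly, 20 false alarms, 20 missed:
print(precision(80, 20))     # 0.8
print(recall(80, 20))        # 0.8
print(f1_score(80, 20, 20))  # ~0.8
```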
Detecting sarcasm is a very important task in natural language processing. In sentiment analysis and text summarization, taking sarcastic statements literally might result in a completely incorrect analysis of the data. Sarcasm detection can therefore help to improve NLP systems such as product review summarization (text summarization), brand monitoring, dialogue systems, and sentiment analysis. Sarcasm detection can also help in conflict resolution: many conflicts arise between writers and politicians/celebrities due to a misunderstanding of sarcasm in written text, which can be avoided through effective sarcasm detection.
In the current business and political situations, knowing what other people think is a determinant
factor in decision making. The results of this research can be used as an input to the development
of full-fledged sentiment analysis system for Amharic language or any other Ethiopian languages
that make use of the same Ethiopic alphabets such as Tigrinya, Guragignya, and others. Hence,
the Amharic sarcasm detection model can be used for different purposes. Some of them are:
Businesses and organizations (product review mining and service analysis, market intelligence) can use the system to reduce the money spent finding consumers' sentiments and opinions.
Individuals who are interested in others' opinions can use it when purchasing a product, using a service, or finding opinions on political topics.
Government intelligence can use the system for sarcasm detection (sentiment analysis) of people's views on a particular issue.
This thesis is organized into five chapters consisting of Introduction, Literature review, Sarcasm
detection techniques and algorithms, Experimentation and Evaluation metrics and Conclusion
and Recommendations.
The first chapter gives the general introduction of the thesis, containing an overview of the study, the statement of the problem, objectives, methodology, scope and limitations of the study, and significance of the study. The second chapter presents reviews of the literature regarding sarcasm detection models, together with their approaches and different machine learning techniques, as well as previous related works for both English and non-English documents in various domains. Chapter Three illustrates sarcasm detection techniques and algorithms, covering corpus preparation and preprocessing, system architecture, feature selection methods, classification techniques, and performance measurement. The fourth chapter presents the experiments and discusses the findings and how the experiments and methodologies were implemented. Finally, chapter five deals with the conclusion and the recommendations drawn from the findings of the study.
2.1. INTRODUCTION
(Camp., 2012) showed that there are four types of sarcasm.
There are different approaches to the problem of sarcasm detection. The most commonly applied
techniques for sarcasm detection are described as follows.
Machine learning treats sentiment classification simply as a special case of topic-based categorization (with the two topics or classes being positive and negative sentiment). Traditional topic-based categorization attempts to sort documents according to their subject matter (e.g. sports vs. politics). A system capable of acquiring and integrating knowledge automatically is referred to as machine learning. Systems that learn from analytical observation, training, experience, and other means result in systems that can exhibit self-improvement, effectiveness, and efficiency.
In supervised learning, the classes are predetermined and previously known. Let the domain of instances be X and the domain of labels be Y, and let P(x, y) be an unknown joint probability distribution on instances and labels X × Y. Given a training sample {(xi, yi)}, supervised learning trains a function f: X → Y in some function family F, with the goal that f(x) predicts the true label y on future data x (Goldberg, 2009).
According to (O. Chapelle, 2006), the goal of supervised learning is to learn a mapping from x to y, given a training set made of pairs (xi, yi); here, the yi ∈ Y are called the labels or targets of the examples xi. In supervised learning, a certain part of the data is labeled with known classifications. The machine learner's task is to search for patterns and construct mathematical models; these models are then evaluated on the basis of their predictive capacity in relation to measures of variance in the data itself.
Supervised learning is the current dominant technique for addressing sentiment analysis. Supervised text classification techniques include Hidden Markov Models (HMM), Decision Trees, Maximum Entropy Models (ME), Support Vector Machines (SVM), and Random Forest (RF). These are all variants of the supervised learning approach, which typically features a system that reads a large annotated corpus, memorizes lists of entities, and creates disambiguation rules based on discriminative features (D. Nadeau, 2007).
Unsupervised learning algorithms work on a training sample of n instances {x1, ..., xn} for which no labels are provided.
According to (Alpaydin, 2004) the goal of unsupervised learning is to group data into clusters. In
fact, the basic task of unsupervised learning is to develop classification labels automatically.
Unsupervised algorithms seek out similarity between pieces of data in order to determine
whether they can be characterized as forming a group. These groups are termed clusters, and
there is a whole family of clustering machine learning techniques.
Sarcasm is a form of figurative language where the literal meaning of words does not hold, and instead the opposite interpretation is intended (H Paul Grice, 1975). Sarcasm is closely related to irony; in fact, it is a form of irony. (Gibbs., 1994) states that 'verbal irony is recognized by literary scholars as a technique of using incongruity to suggest a distinction between reality and expectation'. They define two types of irony: verbal and situational. Verbal irony is irony that is expressed in words; for example, the sentence 'Your paper on grammar correction contains several grammatical errors.' is ironic. Situational irony, on the other hand, arises out of a situation: for example, a scientist who discovers the cure for a disease but herself succumbs to the disease before being able to apply the cure is a situational irony.
(Raymond W Gibbs., 1994) referred to sarcastic language as 'irony that is especially bitter and caustic'. There are two components of this definition: (a) presence of irony, and (b) being bitter; both together are identifying features of sarcasm. For example, consider 'I could not make it big in Hollywood because my writing was not bad enough'. This example from (Aditya Joshi V. T., 2016) is sarcastic because: (a) it contains an ironic statement that implies a writer in Hollywood would need to be bad at writing, and (b) the appraisal in the statement is in fact bitter/contemptuous towards the entity 'Hollywood'.
As seen in the extract above, Abebe told his readers about dictators who had striven to build their image in front of the public while killing many people in hidden places. On the other hand, Abebe expressed his advice to the government by saying:
"ህዝቡ ይወዯኛሌ" ብል መዘናጋት ሇጋዲፊም አሌበጀም፡፡ ሇማንኛውም ተዘጋጅቶ መጠበቅ ነው፡፡ (ምፀቶች ገፅ 110)
"Being complacent in the belief that 'the people love me' did not help even Gaddafi. In any case, one should stand prepared. (MITSETOCH p. 110)"
As is known, the Arab revolution removed Gaddafi from power. While he was preaching about the people's passion for him, they removed and killed him. The sarcasm warns the government not to be deceived by the people's love.
In general, sarcasm is a verbal irony that has the intention of mocking or ridiculing an entity. However, the context required for the sarcasm to be understood forms a crucial component. Compare the sarcastic example 'I love being ignored' with another, 'I love solving math problems all day'. The former is likely to be sarcastic for all speakers. The latter is likely to be sarcastic for most speakers; however, for authors who really do enjoy math, the statement is not sarcastic. The sarcasm in the latter may be conveyed through an author's context or paralinguistic cues (as in the case of illocutionary sarcasm). Thus, sarcasm understanding and automatic sarcasm detection are contingent on what information (or context) is known.
Sarcasm is related to other forms of incongruity or figurative language. Sarcasm has an element
of ridicule that irony does not (Katz., 1998).
Deception also appears to be closely related to sarcasm. If a person says 'I love this soup', they could be speaking the truth (literal proposition), they could be lying (deception), or they could be sarcastic (sarcasm). The difference between a literal proposition and deception lies in the intention of the speaker, while the difference between sarcasm and deception lies in the knowledge shared between speaker and listener. If the speaker saw a fly floating in the soup, the statement above is likely to have a sarcastic intention. Whether or not the listener understands the sarcasm depends on whether the listener saw the fly in the soup and whether the listener believes that the presence of a fly in a soup makes it bad (Gibbs., 1994).
Sarcasm is also a form of aggressive humor; the peculiarity that distinguishes sarcasm from another form of incongruent expression, humor, is the element of mockery or ridicule (Stefan Stieger, 2011).
(Gibbs., 1994) distinguished between metaphor and sarcasm in terms of the plausibility of the statement: they state that a metaphor is never literally plausible. For example, A says to B, 'You are an elephant' to imply that B has a good memory; this is metaphorical because a human being cannot literally be an elephant. However, sarcasm, as in the case of 'You have a very good memory', may be plausible for people with a good memory, but sarcastic if said to a forgetful person. These characteristics relate sarcasm to linguistic expressions like humor and metaphor. It is also these characteristics, such as incongruity, shared knowledge, plausibility, and ridicule, that form the basis of this work on sarcasm detection for Amharic texts.
In the lexicon-based technique, the definition of sentiment is based on the analysis of individual words and/or phrases; emotional dictionaries are often used: emotional lexical items from the dictionary are searched for in the text, their sentiment weights are calculated, and some aggregated weight function is applied (Etstratios K., 2013). When using the lexical approach there is no need for labeled data or a learning procedure, and the decisions taken by the classifier can be easily explained. However, it usually requires powerful linguistic resources (e.g., an emotional dictionary), which are not always available; in addition, it is difficult to take context into account. Dictionaries for lexicon-based approaches can be created manually or automatically, using seed words to expand the list of words. Much of the lexicon-based research has focused on using adjectives as indicators of the semantic orientation of text (Etstratios K., 2013).
In this section, we describe the approaches used for sarcasm detection. In general, approaches to sarcasm detection can be classified into rule-based, statistical, and deep learning-based approaches.
Rule-based approaches attempt to identify sarcasm through specific evidence, captured in the form of rules that rely on indicators of sarcasm. (Hao, 2010) identified sarcasm in similes using Google searches to determine how likely a simile is. They present a 9-step approach in which, at each step/rule, a simile is validated as non-sarcastic using the number of search results. To demonstrate the strength of their rules, they present an error analysis corresponding to each rule.
1. A simile is classified as non-ironic if there is lexical/ morphological similarity
between: i) the vehicle and the ground (e.g., as manly as a man); ii) between the
The research work (Greenwood., 2014.) proposed that hashtag sentiment is a key indicator of sarcasm. Hashtags are often used by tweet authors to highlight sarcasm, and hence, if the sentiment expressed by a hashtag does not agree with the rest of the tweet, the tweet is predicted as sarcastic. They use a hashtag tokenizer to split hashtags made of concatenated words.
(Santosh Kumar Bharti, 2015) presented two rule-based classifiers. The first uses a parse-based
lexicon generation algorithm that creates parse trees of sentences and identifies situation phrases
that bear sentiment; if a negative phrase occurs in a positive sentence, then the sentence is
predicted as sarcastic. The second algorithm aims to capture hyperbolic sarcasm, i.e., the use of
interjections (such as 'wow') and intensifiers (such as 'absolutely') that occur together.
Similarly, (Ellen Riloff, 2013) presented rule-based classifiers that look for a positive verb and
a negative situation phrase in a sentence; the set of negative situation phrases are extracted using
In this section, we review the set of features that have been reported for statistical sarcasm
detection. Most approaches use bag-of-words features; in addition to these, several other feature
sets have been reported. (Oren Tsur, 2010) designed pattern-based features that indicate the
presence of discriminative patterns (such as 'as fast as a snail') extracted from a large
sarcasm-labeled corpus. To prevent overfitting of patterns, these pattern-based features take
real values based on three situations: exact match, partial overlap, and no match.
(González-Ibánez, 2011) used sentiment lexicon-based features and pragmatic features such as
emoticons and user mentions. Similarly, (Delia Irazú Hernández Farías, 2016) used features
derived from multiple affective lexicons such as AFINN, SentiWordNet and General Inquirer. In
addition, they also use features based on semantic similarity, emoticons, counterfactuality, etc.
3.1 INTRODUCTION
In this chapter, the design and implementation of the proposed sarcasm detection model for
opinionated Amharic texts are described in detail. The proposed model has the following
components: pre-processing, sentiment word detection for Amharic sarcasm texts, weight
manipulation, polarity classification and polarity strength (post-polarity classification analysis).
Each component is composed of sub components which are the building blocks of the system.
Pre-processing is responsible for normalization of texts and words segmentation. In the
sentiment words detection for Amharic sarcasm texts component, all possible sentiment words
and contextual valence shifter terms are checked for existence in the Amharic word lexicon. The
weight manipulation component contains two subsystems: weight assignment and polarity
propagation. After the weight manipulation is completed, the next step is the polarity
classification of the texts.
The strength of the polarity (how positive or negative the text is) is rated in the post-
classification analysis step. The sentiment word detection for Amharic sarcasm texts and the
weight manipulation activities are fully dependent on the Amharic lexicon, which contains
opinion terms, punctuation marks and emojis tagged with readily interpretable values. The
procedures for building the sentiment lexicon, the types of lexicon, and the guidelines and
principles followed during the sentiment lexicon building process are also described in this
chapter. In addition, tools used for implementing the system and the proposed algorithms are also presented.
The general architecture of the sarcasm detection model for Amharic texts is given in
Figure 3-1. The architecture has six major components: document collection, document
preprocessing, Amharic word lexicon corpus preparation, sentiment word detection with weight
assignment, feature selection, and classification.
Supervised machine learning for sarcasm classification tasks requires an annotated corpus to
train and test a classifier. For the Amharic language there is no standardized corpus for sentiment
analysis. So the construction of a labeled corpus is a very important step, because it allows
for more experiments, especially with supervised classification.
In total, we gathered 800 comments and book reviews from facebook pages and Abebe Tolla's
books. The data set consists of 800 Amharic sentences and facebook comment text reviews, of
which 400 are labeled Sarcasm and 400 are labeled Non-Sarcasm, so that there is a balanced
class distribution. For the purpose of the experiment, 20% of the sarcasm text reviews is
randomly selected for testing (an 80/20 split). Most of the data is collected from Abebe Tolla's
books, and the rest of the dataset is collected from Facebook pages, magazines, newspapers
and Amharic literature.
The second phase of the sarcasm detection model for Amharic text is the preprocessing
component. The pre-processing activity is important to improve the accuracy, efficiency, and
scalability of the classification process. Preprocessing involves normalization and
tokenization. The input to this process is text data, but not every word in the text is meaningful
for categorization. For this reason, the data must be processed and represented in a concise and
identifiable format or structure. Non-standard words such as numbers, abbreviations and dates
are removed from the dataset. In this research, the different characteristics or features of the
3.2.2.1 Tokenization
Tokenization refers to the process of splitting the text into a set of tokens (usually words). This
process detects the boundaries of a written text. The Amharic language uses a number of
punctuation marks which demarcate words in a stream of characters which include ‘huletneTb’
(:), „aratneTb’ (::), „deribsereze’ (፤), „netelaserez‟ (፣), exclamation mark „!‟ and question mark
‟?‟ Punctuation marks one of the most relevance in sarcasm detection task and has to be used
mostly in Amharic texts. Amharic text tokenization process can be done by using the algorithm
shown in algorithm 3-1.
1. While the Amharic text input exists
Read a character
If the character is a white space or an Amharic punctuation mark
Append the current token to the token list and reset it
Else
Assign token = token + character
2. End while
3. Pass the tokens to the next processing step
4. Close files
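Algorithm 3-1 can be sketched in Python as follows; the punctuation set is taken from the marks listed above and is illustrative rather than exhaustive:

```python
# Minimal sketch of Algorithm 3-1: accumulate characters into a token
# until a white space or an Amharic punctuation mark is reached.
AMHARIC_PUNCT = {'፡', '።', '፤', '፣', '!', '?'}

def tokenize(text):
    tokens, token = [], ''
    for ch in text:
        if ch.isspace() or ch in AMHARIC_PUNCT:
            if token:                 # append the finished token
                tokens.append(token)
                token = ''
        else:
            token += ch
    if token:                         # flush the last token
        tokens.append(token)
    return tokens
```

For example, `tokenize('ጥሩ ነው።')` yields `['ጥሩ', 'ነው']`, splitting on both the space and the arat neTib.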
3.2.2.2 Normalization
The Amharic writing system has homophone characters, i.e., characters with the same sound but
different symbols. For example, the characters ስ and ሥ are used interchangeably, as in ስራ and
ሥራ, both meaning "work". These different symbols must be considered similar because they have
no effect on meaning. Such inconsistency in writing words is handled by replacing characters of
the same sound with a common symbol.
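The character replacement described above can be sketched as a simple mapping; the pairs shown here are a partial, illustrative set, not the full homophone table:

```python
# Partial, illustrative mapping of homophone characters to a canonical
# symbol; a complete system would cover all variant character families.
HOMOPHONES = {'ሥ': 'ስ', 'ፀ': 'ጸ', 'ዐ': 'አ'}

def normalize(text):
    # Replace each homophone character with its canonical form.
    return ''.join(HOMOPHONES.get(ch, ch) for ch in text)
```

For example, `normalize('ሥራ')` returns `'ስራ'`, so both spellings of "work" map to a single form.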
This activity is responsible for detecting sarcastic sentences using polarity terms and contextual
valence shifter terms. After the data is preprocessed, every valid term in the text is checked for
whether it is a sentiment word. This is done by a simple detection mechanism in which the
whole lexicon is scanned for every term. If the term exists in the dictionary, then the term is a
polarity word (positive or negative) or a contextual valence shifter (negation or intensifier).
Polarity words are terms that can express opinions towards an object, such as 'ጥሩ' (good), which
expresses a positive opinion, and 'መጥፎ' (bad), which expresses a negative opinion towards an
object. These terms are tagged in the lexicon with computer-interpretable values such as '2'
The Amharic word lexicon is a general-purpose lexicon intended for opinion mining in any
domain, because the opinion terms in this lexicon are not restricted to a specific domain; rather,
it contains opinion terms from the Amharic language at large. The valid terms in the collected
data are first preprocessed and saved into the dictionary. Then, if at least a single term is found
in the Amharic word lexicon, the process continues to the next step (weight assignment and
polarity propagation); otherwise the general lexicon is scanned for further search. If a term
taken from the collected data is not found in the Amharic word lexicon, it is considered a
non-sentiment word and is discarded, as such terms are not important for the detection of
sarcasm words.
In this phase, the main activities are weight assignment and polarity propagation. Using the
Senti-Strength tool, the polarity of each word is generated; the value lies in the range
[-5, 5]. If the value is positive, the word is taken to have positive polarity; similarly, if it is
negative, the word is taken to have negative polarity. Using these values, two features are
generated, namely the total counts of words with positive and negative polarity. All possible
positive sentiment terms are tagged in the Amharic word lexicon with '+' and given a
default value of +2 at run time. All negative sentiment terms are tagged with '-' and given a
default value of -2. Before the final average polarity weight is calculated, polarity propagation
is done to modify the initial values of the sentiment terms following Amharic grammatical
rules. These rules are taken from (Gebremeskel, 2010) and used in this thesis to complete the
overall polarity feature set.
Rule 3: if a positive sentiment term is followed by an understatement term, the initial value of
that term is decreased from +2 to +1. For example, in the phrase 'ጥሩ ቢሆንም' the polarity
weight of the sentiment term 'ጥሩ' (good) is decreased from the initial value +2 to +1 due to the
understatement term 'ቢሆንም'.
Rule 4: if a negative sentiment term is preceded by an overstatement term, the initial value
of the term is decreased by 1, from -2 to -3. For example, in the sentence 'በጣም መጥፎ ነዉ', due to
the overstatement 'በጣም' (very), the initial weight of the sentiment word 'መጥፎ' (bad) is
decreased from -2 to -3.
Rule 5: if a negative sentiment term is followed by an understatement term, the initial weight of
that term is increased by 1. For example, in the phrase 'መጥፎ ቢሆንም' the initial weight of the
sentiment term is increased from -2 to -1 due to the understatement term.
Rule 6: if a sentiment term is not linked to any contextual valence shifting term, the initially
assigned weight is kept for further processing.
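Rules 3 to 6 can be sketched as a single function. Rules 1 and 2 are not shown in this excerpt, so only the listed cases are covered here, and the shifter labels are hypothetical names of ours:

```python
# Sketch of polarity propagation for Rules 3-6. `weight` is the initial
# tagged value (+2 or -2); `shifter` is the contextual valence shifter
# linked to the term, or None when there is no shifter (Rule 6).
def propagate(weight, shifter):
    if shifter == 'understatement' and weight > 0:
        return weight - 1          # Rule 3: +2 -> +1 (e.g. 'ጥሩ ቢሆንም')
    if shifter == 'overstatement' and weight < 0:
        return weight - 1          # Rule 4: -2 -> -3 (e.g. 'በጣም መጥፎ')
    if shifter == 'understatement' and weight < 0:
        return weight + 1          # Rule 5: -2 -> -1 (e.g. 'መጥፎ ቢሆንም')
    return weight                  # Rule 6: no shifter, keep the weight
```

For instance, `propagate(2, 'understatement')` returns `1` and `propagate(-2, 'overstatement')` returns `-3`, matching the worked examples above.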
In any Machine Learning task features are of central importance. The quality of the classification
depends on the features selected. Carefully designed and chosen features play a big role in
improving the results both qualitatively and quantitatively.
Supervised machine learning techniques use a labeled training corpus to learn a certain
classification function from examples of its inputs and outputs (Schrauwen, 2010). The output
of this function is either a continuous value ('regression') or a category or label of the input
object ('classification'). A classifier is called supervised if it is built from training corpora
containing the correct label for each input.
The process is based on learning a model given a set of correctly classified data. The aim of
supervised learning is to train a model to recognize discriminant attributes (in statistical literature
supervised learning is sometimes known as discriminant learning) in the data (Michie, 1994). Let
us say that we want to build a model that can separate news articles about soccer from religion.
With a given set of labeled data, the model can be trained to learn, for example, that some words
(attributes) are used solely, or more frequently, in one of the classes. The model might learn
that articles containing words such as "referee", "player" or "goaltender" are more likely
to be about the sport. Conversely, words such as "god", "church" and "Islam" are more
likely to occur in an article about religion. New, unseen news articles can then be predicted to be
about soccer or religion depending on the frequencies of their words.
Supervised methods cannot always be used, because labeled corpora are not always available.
Unsupervised and weakly-supervised methods are another option for machine learning that does
not require pre-tagged data. Unsupervised methods involve learning patterns in the input when
no specific output values are supplied (Norvig, 2003); the learner only receives an
unlabelled set of examples. Unsupervised methods can also be used to label a corpus that can
later be used for supervised learning. Examples of unsupervised learning methods are (k-means)
clustering or cluster analysis and the expectation-maximization algorithm, an algorithm for
finding the maximum likelihood.
Semi-supervised learning is based on the fact that labelled data can be hard to obtain and
unlabelled data is cheap. The idea is to combine a small set of labelled data and expand it using
unlabelled data with the help of unsupervised learning. The result is then a big set of labelled
data, perhaps containing some noise that can be used for supervised classification. (Lin C. et. al.,
2011)
In the remainder of this section, we present three classifiers that have been used during this
research work, every single one is used for supervised learning.
In this sub section, the sarcasm detection for Amharic texts lexicon building issues, the tools
used for implementing the system, the procedures to integrate the different components, the
proposed algorithm, the input review, output result and other related issues are described.
Several approaches use lexical sentiment as a feature to the sarcasm classifier. It must, however,
be noted that these approaches require „surface polarity‟: the apparent polarity of a sentence.
(Santosh Kumar Bharti K. S., 2015) described a rule-based approach that predicts a sentence as
sarcastic if a negative phrase occurs in a positive sentence.
Sarcasm consists of: (i) the use of irony, and (ii) the presence of ridicule. Based on the theories
described here, understanding sarcasm (by humans or through programs) can be divided into the
following components:
A. Identification of shared knowledge: the sentence „I love being ignored‟ cannot be
understood as sarcastic without the shared knowledge that people do not like being
ignored. This specifically holds true in case of specific context. For example, the
sentence „I love solving math problems all weekend‟ may be perceived as non-
sarcastic by a listener who loves math or by a listener who knows that the speaker
loves math. A listener, in these situations, would either look for a dropped negation or
an echoic reminder, as given by the theories above.
B. Identification of what constitutes ridicule: the ridicule may be conveyed through
different reactions such as laughter, change of topic, etc. (JOSHI, 2017)
In order to achieve our objective, we used different environments and tools. The Python
programming language is used to develop the model. Python is an interpreted, object-oriented,
high-level programming language with dynamic semantics. Its high-level built-in data structures,
combined with dynamic typing and dynamic binding, make it very attractive for Rapid
Application Development, as well as for use as a scripting or glue language to connect existing
components together (python).
The python programming language is a dynamically typed, object oriented, interpreted language
and it is great for natural language processing (NLP) because it is simple, easy to debug
(exceptions and interpreted language), easy to structure (modules and object oriented) and
powerful for string manipulation.
We used Python version 3.6.6 because it allows encodings other than ASCII in Python source
files. As a result, Amharic characters are directly interpreted by Python 3.6.6 and above
without the need for transliteration or feeding the Unicode representation of the characters.
All source code and rules of the model are written in a Python 3.6.6-compatible format,
because Python 3 is not backward compatible with Python 2.
Dictionary representation
The dictionary is a useful built-in data type in Python. Regular Python dictionaries iterate over
key: value pairs in an arbitrary order. Dictionaries are sometimes found in other languages as
"associative memories" or "associative arrays". Unlike sequences, which are indexed by a range
of numbers, dictionaries are indexed by keys, which can be of any immutable type; strings and
numbers can always be keys. Tuples can be used as keys if they contain only strings, numbers, or
tuples; if a tuple contains any mutable object, directly or indirectly, it cannot be used as a key.
Dictionaries in Python are an unordered set of key: value pairs, with the requirement that the
keys are unique (within one dictionary). A pair of braces creates an empty dictionary: {}. Placing a
comma-separated list of key: value pairs within the braces adds initial key: value pairs to the
The Amharic word lexicon (dictionary) can be placed within the source code or imported as a
text file at run time. For each key, its corresponding value is returned for further processing;
that is, for each Amharic sentiment term in the input review (the key), the whole dictionary is
scanned for its corresponding value. A sample of the input text data is given in Figure 3-2.
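As an illustration, a fragment of such a lexicon dictionary might look like the following; the entries and weights are examples in the spirit of the tagging described above, not the actual lexicon file:

```python
# Illustrative fragment of the Amharic sentiment lexicon as a Python
# dict: each sentiment term (key) maps to its default weight (value).
lexicon = {'ጥሩ': 2, 'መጥፎ': -2}

def lookup(term):
    # Return the tagged weight, or None for non-sentiment words.
    return lexicon.get(term)
```

A lookup of `'ጥሩ'` returns `2`, while a term absent from the lexicon returns `None` and is treated as a non-sentiment word.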
In this paper, we presented a novel supervised learning algorithm for sarcasm identification. The
algorithm has two modules: (i) Feature-extraction, and (ii) classification.
Lexical Features
N-grams are a commonly used feature set for NLP related tasks in Machine Learning. Certain
words or phrases like "Yeah right!" may be prevalent in sarcastic tweets. Presence of such words
can be a strong indicator for sarcasm. We use unigrams in order to extract the lexical information
contained in the text.
Using the training corpus, a dictionary is created in which each unique word is mapped onto a
particular ID. These ID numbers are used as the feature numbers, and the value corresponding to
each feature number is the frequency of occurrence of that particular word in the text for which
we are generating the feature values. The dictionary is large owing to the vocabulary of the
corpus, while a given text contains only a few words from this large vocabulary. Hence, the
feature vector would naturally contain many 0's corresponding to words that do not appear in the
sentence but are present in the dictionary. The IDs with value (frequency) 0 can be discarded,
since we are looking for the presence of words prevalent in sarcastic texts, which can be a
potentially important indicator, while the absence of words conveys no information here.
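The ID-and-frequency scheme described above might be sketched as follows (the function names are ours):

```python
from collections import Counter

# Map each unique training word to an integer ID.
def build_vocab(tokenized_corpus):
    vocab = {}
    for tokens in tokenized_corpus:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab

# Represent a text as {word ID: frequency}, keeping only nonzero
# frequencies, as described above; out-of-vocabulary words are skipped.
def unigram_features(tokens, vocab):
    counts = Counter(t for t in tokens if t in vocab)
    return {vocab[t]: c for t, c in counts.items()}
```

Because only words present in the text get entries, the resulting sparse dict avoids storing the many zero frequencies a dense vector would contain.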
Punctuation Features
These features determine when the sentiment of the sentence differs from its emoticons or
smileys. For example: "እድሜ ጠገቡ ትምህርት ቤት 'ለልማት' ፈረሰ! ☹". In this example, the sentiment of
the sentence is supposed to be positive, but the use of the frowning-face emoticon makes it sarcastic.
Number of emoticons
Emoticons are commonly used across social media platforms to express sentiments. As a feature,
they can be captured using UTF-8 encoding. Using the 'codecs' module in Python, files
containing emoticons in UTF-8 format can be opened and read. The emoticons present in them
can be captured using regular expressions.
In the Amharic language there are popular slang expressions such as 'ኪኪኪኪ', 'ሃሃሃ' and 'ቂቂቂ'
which are fairly widely used. Various variants of these have also been accounted for. The
frequency of occurrence of these expressions is used as a feature. Since sarcasm is intended to
have an element of humor, a higher occurrence of these is potentially indicative of sarcasm.
The exclamation mark is often used to lay extra emphasis on an underlying emotion such as
surprise, shock or dismay. Even in sarcastic texts, such use of emphatic punctuation is prevalent,
especially "!", "?" and "…". The count of such marks is hence used as a
feature.
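The laughter-expression and punctuation counts can be sketched with regular expressions; the pattern lists below are illustrative, not the full set of variants accounted for:

```python
import re

# Count laughter expressions (e.g. 'ኪኪኪ', 'ሃሃሃ', 'ቂቂቂ' and longer
# variants) and emphatic marks ('!', '?', '...') as features.
LAUGHTER = re.compile(r'ኪኪ+|ሃሃ+|ቂቂ+')
MARKS = re.compile(r'\.\.\.|[!?]')

def punctuation_features(text):
    return {
        'laughter': len(LAUGHTER.findall(text)),
        'marks': len(MARKS.findall(text)),
    }
```

For example, a comment containing one laughter run, one "!" and one "..." yields `{'laughter': 1, 'marks': 2}`.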
Semantic feature
These features determine whether there exist words which contradict or are nearly opposite to
each other. The linguistic theory of context incongruity suggests that a common form of
sarcasm expression consists of a positive sentiment contrasted with a negative situation.
For example: "ድንቁርና በረከት ነው...!" In this example two nearly opposite words are used, which
First, the model takes tokenized and normalized Amharic text terms and checks whether they bear
Amharic sentiment words in the Amharic word lexicon corpus. This is done by checking the
existence of the terms in the dictionary of Amharic sentiment terms and by checking the
existence of graphical expressions in emoticon lookup tables. Next, the sentiment terms are
assigned an initial polarity weight, and polarity propagation is done if the sentiment terms are
linked to contextual valence shifter terms. Using the Senti-Strength tool, the polarity of each
word is generated; the value lies in the range [-5, 5]. If the value is positive, the word is taken
to have positive polarity; similarly, if it is negative, the word is taken to have negative
polarity. Using these values, two features are generated, namely the total counts of words with
positive and negative polarity.
Lexical Polarity
This is the overall polarity of the entire sentence. Owing to the theory of lexical incongruity, it
can be observed that a text which has an overall strong positive polarity is more likely to be
sarcastic compared to a text with overall negative polarity. This is because in general sarcasm
tends to be caustic.
In order to perform the classification based on the features mentioned above, we explore a set of
standard classifiers typically used in text classification research. We used a support vector
machine (SVM) with a linear kernel in the implementation provided by LIBSVM. We also used a
Neural Network (NN) classifier and a Random Forest classifier (RFC) as comparisons to each
other. For error estimation, it is common to divide the entire data set into a train set and a test
set. The model that obtains the best result on the test data is then selected based on how it performs.
Finally, the text is assigned to the predefined categories Sarcasm and Non-Sarcasm based on the
total weight obtained from the previous step. The high-level view of the proposed algorithm,
showing how the sentiment terms, punctuation marks and laughter expressions (emojis) are
detected and how the sentiment polarity value is propagated, is given in Algorithm 3-3.
4. Else the text is assigned to an unclassified class because there are no sentiment terms Ts in
the given text
Algorithm 3-3: Algorithm for sarcasm detection using text polarity propagation
Figure 3-4 Combine data from Sarcasm and Nonsarcasm files to give it as input to the NN and RF classifiers
4.1 INTRODUCTION
Three supervised techniques are used for conducting the experiments: The Support Vector
Machine (SVM), Random Forest (RF) and Neural Network (NN) classifiers. We tested each
technique individually and evaluated its performance. The procedure, as is standard in
supervised machine learning tasks, is first to train a classifier on pre-classified training data and
then to evaluate the performance of the classifier on an unlabeled set of test data. We chose to
work with the Natural Language Toolkit (NLTK). This package is equipped with several
classifiers (i.e. SVM, RF, and NN). All programming has been done in the Python programming
language and executed in the Windows 10 Python interactive shell.
In this research, three experiments were conducted using unigram words and most informative
words with the three learning algorithms: SVM, Decision Tree and Neural Network classifiers.
All the results are presented in the subsequent section. All work was done using NLTK
classification packages and Python programming.
To evaluate the Sarcasm detection model for Amharic texts, we used procedures and setups that
include data collection, methods and manual classifications. These are described in the
subsequent sections.
As presented in the previous chapters, all experiments use Abebe Tolla's books and some
sarcasm-related facebook pages as the review domain. The main reason we used these as a
domain is the lack of readily available reviews written electronically in the Amharic language,
such as on the web, in blogs and in online forums, in other domains. As a
In addition, Amharic entertainment viewers can write comments freely as compared to other
domains such as politics, products, etc. Hence, most of the sarcasm data for Amharic texts
used for conducting the experiments is collected manually. After collecting the data from
these sources, the sarcasm texts are encoded into a computer and categorized into two
labeled classes: Sarcasm and Non-Sarcasm. As a result, a total of 800 data items are
collected from all the sources described above. After the data is collected, preprocessing tasks
are applied to construct the final data set (the data used as input for the modeling tool) from
the initial raw data. Data preparation tasks are usually performed multiple times depending on
the quality and size of the initial data set. Tasks including cleaning, normalization and
tokenization of the data were performed to come up with the final dataset suitable for the
selected algorithms.
This activity is concerned with labeling the data for experimental purposes. All 800 data items
were manually categorized by an independent individual from the data source. If a given item is
not related to the target topic, it is assigned to the Non-Sarcasm category. As a result,
400 of the items are labeled Sarcasm and the remaining 400 are labeled Non-Sarcasm.
The manually classified reviews helped us in cross-checking the results obtained from the
proposed sarcasm detection model for Amharic texts.
Table 4-1 Number of Data classification Manually from Different data sources
This activity describes the evaluation parameters of the designed model and its results.
Evaluation of the system is made with evaluation parameters that compare the numbers of data
items categorized correctly and incorrectly. The comparison is done between the data categorized
by the proposed model and the manually labeled (categorized) data. Then the precision, recall
and F-measure are computed as follows.
Precision (P) is the number of true positives divided by the total number of elements labeled as
belonging to the class. A high precision means that the majority of items labeled as, for instance,
'positive' indeed belong to the class 'positive'. It is defined as:
P = TP / (TP + FP) (4-1)
True positives are positive items that we correctly identified as positive (for the positive
class), and negative items that we correctly identified as negative (for the negative class).
False positives (or Type I errors) are negative comments that we incorrectly identified as
positive (for the positive class) and positive comments that we incorrectly classified as
negative (for the negative class).
Recall (R) is the number of true positives divided by the total number of items that actually
belong to the class. A high recall means that the majority of the 'positive' items were labeled as
belonging to the class 'positive'. It is defined as:
R = TP / (TP + FN) (4-2)
True negatives are irrelevant items that we correctly identified as irrelevant (negative
comments not classified as positive for the positive class, and vice versa).
F-measure is a measure that combines Recall and Precision into a single measure of
performance; it is the product of Precision and Recall divided by their average:
F = 2PR / (P + R) (4-3)
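These three measures can be computed directly from raw counts of true positives (tp), false positives (fp) and false negatives (fn):

```python
# Precision, recall and F-measure from raw counts, following
# equations (4-1) to (4-3).
def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def f_measure(p, r):
    return 2 * p * r / (p + r)
```

For example, with 80 true positives and 20 false positives, precision is 0.8; when precision and recall are equal, the F-measure equals that same value.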
In order to perform machine learning, it is necessary to extract clues from the text that may lead
to correct classification. In this research we use simple, superficial unigram feature vectors:
all unigram words and the most informative words in the corpus are used as feature sets. In the
first stage, all bag-of-words features of the corpus are used to perform experiments. The same
experiment is performed on a different feature subset in the second stage, which comprises the
most informative features of the corpus: the distribution of each word over the different output
classes is calculated, and the words with the lowest entropy (or highest information gain) are
considered the most relevant features for the classifiers. For this, the frequency distribution of
each feature (in this case words, punctuation and emojis) over the output classes is computed.
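The entropy-based selection of the most informative words might be sketched as follows; the function name and data layout are ours:

```python
import math
from collections import Counter, defaultdict

# For each word, compute the entropy of its frequency distribution over
# the output classes; low-entropy words are the most class-specific and
# hence the most informative features.
def word_entropies(labeled_docs):
    # labeled_docs: iterable of (tokens, class_label) pairs
    dist = defaultdict(Counter)
    for tokens, label in labeled_docs:
        for tok in set(tokens):
            dist[tok][label] += 1
    entropies = {}
    for tok, counts in dist.items():
        total = sum(counts.values())
        entropies[tok] = -sum((c / total) * math.log2(c / total)
                              for c in counts.values())
    return entropies
```

A word occurring only in one class has entropy 0 (maximally informative), while a word split evenly between two classes has entropy 1 and would be ranked last.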
Classification works by learning from labeled feature sets, or training data. Text feature
extraction is the process of transforming what is essentially a list of words into a feature set
usable by a classifier. The NLTK classifiers expect dict-style feature sets, so we must transform
our text into a dict. The bag-of-words model is the simplest method: it constructs a dict from
the given words, where each word becomes a key with the value True. In our study, all unigrams
in a file are transferred to a dict and considered as features. In order to examine the
applicability of machine learning algorithms to the sarcasm detection model for Amharic text
classification, SVM, Random Forest and Neural Network algorithms are compared on the same
dataset and feature categories. Most research in the area of sentiment analysis and sarcasm
detection for text classification uses 30% of the data for testing and the remaining data for
training. Following this methodology, we also use 30% of the total data for testing and the rest
for training in our work.
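The dict-style feature set and the 70/30 split described above can be sketched as follows; the helper names and the fixed shuffle seed are ours:

```python
import random

# NLTK-style bag-of-words featureset: every unigram becomes a key with
# the value True.
def bag_of_words(tokens):
    return {tok: True for tok in tokens}

# Shuffle the labeled data and hold out 30% for testing.
def train_test_split(labeled_data, test_ratio=0.3, seed=42):
    data = list(labeled_data)
    random.Random(seed).shuffle(data)
    cut = int(len(data) * test_ratio)
    return data[cut:], data[:cut]    # (train set, test set)
```

On the 800-item dataset described above, this split would yield 560 training and 240 test items.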
The support vector machine (SVM) is a supervised learning method that generates input-output
mapping functions from a set of labeled training data. The mapping function can be either a
classification function, i.e., the category of the input data, or a regression function. For
classification, nonlinear kernel functions are often used to transform input data into a high-
dimensional feature space in which the input data become more separable compared to the
original input space. Maximum-margin hyperplanes are then created. The model thus produced
depends on only a subset of the training data near the class boundaries. Similarly, the model
produced by Support Vector Regression ignores any training data that is sufficiently close to the
model prediction. SVMs are also said to belong to the "kernel methods" (Lipo Wang ed., 2005).
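As an illustration (not the thesis implementation), a linear-kernel SVM can be fit with scikit-learn, whose SVC class wraps LIBSVM; the feature vectors and labels below are toy placeholders:

```python
from sklearn.svm import SVC

# Toy, linearly separable data: a higher first feature means Sarcasm.
X = [[2, 0], [3, 1], [0, 2], [1, 3]]
y = ['Sarcasm', 'Sarcasm', 'Non-Sarcasm', 'Non-Sarcasm']

clf = SVC(kernel='linear')    # maximum-margin linear classifier
clf.fit(X, y)
prediction = clf.predict([[3, 0]])[0]
```

The fitted model depends only on the support vectors near the class boundary, as described above; here the point [3, 0] falls clearly on the Sarcasm side.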
In the case of the Non-Sarcasm class, a text given a Non-Sarcasm classification is 88% likely to
be correct. This good precision leads to 12% false positives for the negative category. The
f-measure of the Random Forest classifier with all unigram features for the class Non-Sarcasm
is 76%.
A random forest is a meta-estimator that fits a number of decision tree classifiers on various
sub-samples of the dataset and uses averaging to improve predictive accuracy and control
over-fitting. The sub-sample size is always the same as the original input sample size, but the
samples are drawn with replacement.
During training, the Random Forest classifier creates a tree where the child nodes are also
instances of the classifier. The train() class method builds this tree from the ground up,
starting with the leaf nodes. It then refines itself to minimize the number of decisions needed
to arrive at a label by putting the most informative features at the top.
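As an illustration, the ensemble described above can be instantiated with scikit-learn; the data here is a toy placeholder, not the thesis training setup:

```python
from sklearn.ensemble import RandomForestClassifier

# Toy data where the label simply equals the first feature.
X = [[0, 1], [0, 0], [1, 1], [1, 0], [0, 1], [1, 0]]
y = [0, 0, 1, 1, 0, 1]

# 10 trees, each fit on a bootstrap sample drawn with replacement;
# predictions are aggregated across the trees.
rf = RandomForestClassifier(n_estimators=10, random_state=0)
rf.fit(X, y)
```

Each tree sees a bootstrap sample the same size as the original data, and averaging the trees' votes controls the over-fitting a single deep tree would exhibit.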
Table 4-3 presents the Decision Tree experimentation accuracy, precision and recall. The
feature set used in this experiment is all unigram words (all bag-of-words).
As shown in Table 4-3 above, we achieved an accuracy of 80.0% using a decision tree
classifier. A file classified as Sarcasm is 85% likely to be correct; this high precision leads
to only 15% false positives for the Sarcasm class. The f-measure of the RF classifier with all
unigram features for the class Sarcasm is 81%.
In the case of the Non-Sarcasm class, a text given a Non-Sarcasm classification is 81% likely to
be correct. This good precision leads to 19% false positives for the negative category. The
f-measure of the Random Forest classifier with all unigram features for the class Non-Sarcasm
is 80%.
Classification is one of the most active research and application areas of neural networks, and neural networks are considered robust classifiers. The field of neural networks has arisen from diverse sources, ranging from mankind's fascination with understanding and emulating the human brain to the broader goal of copying human abilities such as classification, one of the most frequently encountered decision-making tasks in human activity. Classification is essential for separating large datasets into classes for purposes such as rule generation, decision making, pattern recognition, dimensionality reduction and data mining. Neural networks have emerged as an important tool for classification.
The neural network is given the target outputs onto which it should map its inputs, i.e. it is trained on paired input-output data. The error arising from the discrepancy between the network output and the target is used to optimize the network parameters. Once the network has been trained, it is used to produce outputs for unseen data. In this research, we used a multilayer feed-forward neural network (FFNN).
FFNNs are a kind of multilayer neural network that allows signals to travel one way only, from input to output. First, the network is trained on a set of paired data to determine the input-output mapping. The weights of the connections between neurons are then fixed, and the network is used to determine the classifications of a new set of data. During classification, the signal at the input units propagates all the way through the net to determine the activation values at all the output units. Each input unit has an activation value that represents some feature external to the net. Every input unit sends its activation value to each of the hidden units to which it is connected. Each of these hidden units calculates its own activation value, and these signals are then passed on to the output units. The activation value for each receiving unit is calculated according to a simple activation function: the function sums together the contributions of all sending units, where the contribution of a unit is defined as the weight of the connection between the sending and receiving units multiplied by the sending unit's activation value. This sum is usually then further modified, for example by adjusting the activation sum to a value between 0 and 1 and/or by setting the activation value to zero unless a threshold level for that sum is reached.
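The propagation rule just described, a weighted sum passed through a squashing activation, can be written out as a minimal Python sketch; the layer sizes and weights below are arbitrary illustrative values:

```python
import math

def sigmoid(s):
    # Squash the activation sum to a value between 0 and 1.
    return 1.0 / (1.0 + math.exp(-s))

def layer_forward(inputs, weights, activation=sigmoid):
    """Each receiving unit sums (connection weight * sending activation)
    over all sending units, then applies the activation function."""
    return [activation(sum(w * x for w, x in zip(ws, inputs)))
            for ws in weights]

def feed_forward(inputs, hidden_weights, output_weights):
    hidden = layer_forward(inputs, hidden_weights)
    return layer_forward(hidden, output_weights)

# Toy network: 3 inputs, 2 hidden units, 1 output unit.
hw = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]   # one weight row per hidden unit
ow = [[1.0, -1.0]]                           # one weight row per output unit
out = feed_forward([1.0, 0.5, -1.0], hw, ow)
```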
The third classification algorithm we experimented with is a feed-forward artificial neural network classifier. We used the same train and test sets from the corpus that we constructed before. Feeding the unigram features to a neural network using Keras, the deep learning library for Theano and TensorFlow, was not practical due to the high dimensionality of the feature vectors involved. We used a single hidden layer with 5 neurons. Because of the nature of the data used, we chose this network over alternatives such as CNNs and RNNs.
In the case of the Non-Sarcasm class, a text given a Non-Sarcasm label is 77% likely to be correct. This good precision leaves a 23% false-positive rate for the negative category. The f-measure of the neural network classifier with all unigram features for the Non-Sarcasm class is 79%.
The RNN uses an architecture that is not dissimilar to the feed-forward NN. The difference is that the RNN introduces the concept of memory, which exists in the form of a different type of link: unlike in a feed-forward NN, the outputs of some layers are fed back into the inputs of a previous layer. This addition allows the analysis of sequential data, which is something the feed-forward NN is incapable of. Also, the feed-forward NN is limited to fixed-length input, whereas the RNN has no such restriction. The inclusion of links between layers in the reverse direction allows for feedback loops, which help the network learn concepts based on context. RNNs have been applied successfully to many types of tasks, such as image classification, automatic language translation, and natural language processing tasks such as sentiment analysis and text classification. (EXEXXACT, 2019)
The defining feature of the CNN is that it performs the convolution operation in certain layers; hence the name Convolutional Neural Network. The architecture varies slightly from the feed-forward NN. In a CNN, the first layer is always a convolutional layer. These layers are defined using three spatial dimensions: length, width, and depth. They are not fully connected, meaning that the neurons of one layer do not connect to each and every neuron in the following layer. The output of the final convolutional layer is the input to the first fully connected layer. The most common applications of CNNs are in the general field of computer vision, for example medical image analysis, image recognition, face detection and recognition systems, and full-motion video analysis.
Recurrent and convolutional neural networks are commonplace in the field of deep learning. The RNN and CNN architectures have advantages and disadvantages that depend on the type of data being modeled. When choosing one framework over the other, or alternatively creating a hybrid approach, the type of data and the job at hand are the most important points to consider. (EXEXXACT, 2019)
[Figure: comparison of Accuracy, Precision, Recall and F-Measure (%) for the SVM, ANN and RF classifiers]
Judging by the accuracy, precision, recall and F-values, we observed that the SVM classifier (with a bag of words) performs better than the other classifiers. Feeding these unigram features to a neural network using Keras, the deep learning library for Theano and TensorFlow, was not practical due to the high dimensionality of the feature vectors involved. This was, however, practical with the LibSVM library, as described in LibSVM: A Library for Support Vector Machines. The Random Forest was trained on these unigram features using the sklearn library for Python. Further insights from the feature importance index (based on the decrease in Gini impurity) reveal that features pertaining to punctuation (e.g. the number of '!'), laughter expressions (e.g. 'ኪኪኪኪ', 'ሃሃሃ') and emoticons are important in judging sarcastic content in Amharic sentiment texts.
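The Gini-impurity decrease behind the feature importance index can be sketched as follows; the labels and the split are a toy illustration, whereas sklearn averages this quantity over all splits and trees:

```python
def gini(labels):
    """Gini impurity of a set of class labels (0 = pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def gini_decrease(parent, left, right):
    """Impurity decrease achieved by splitting `parent` into two children;
    a random forest ranks a feature by the decrease its splits achieve."""
    n = len(parent)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted

# A split on "contains '!'" that separates the toy labels perfectly:
parent = ['sarcasm', 'sarcasm', 'non', 'non']
decrease = gini_decrease(parent, ['sarcasm', 'sarcasm'], ['non', 'non'])
```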
5.1 CONCLUSION
In this thesis, we have built a model for detecting sarcasm from Amharic sentiment text. The model is able to process raw text as input and output whether the text is sarcastic or not. Final model accuracies of 80.6%, 80.1% and 79% were obtained on the total collected datasets with the Support Vector Machine, Neural Network and Random Forest classifiers respectively. We found some strong features that characterize sarcastic texts. From this, we can draw the conclusion that the model worked for these specific datasets and could identify the majority of the sarcastic texts.
In this research we conducted three different experiments to arrive at a better performance. The results of the three learning algorithms were presented both with all unigram words as features and with only the informative features. Information gain was used as a feature selection technique to choose the most informative words. As we can observe from the experiments, unigram words have a great impact on classifier accuracy. The SVM algorithm with unigram words outperforms the other algorithms with 80.6% accuracy.
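Information gain for a single unigram can be computed as the entropy reduction of the class labels once the unigram's presence is known, as in this stdlib sketch; the toy labels are illustrative only:

```python
import math

def entropy(labels):
    """Shannon entropy of a set of class labels, in bits."""
    if not labels:
        return 0.0
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * math.log2(p) for p in probs)

def information_gain(labels, feature_present):
    """Entropy reduction from knowing whether a unigram occurs in a text."""
    with_f = [y for y, f in zip(labels, feature_present) if f]
    without = [y for y, f in zip(labels, feature_present) if not f]
    n = len(labels)
    cond = ((len(with_f) / n) * entropy(with_f)
            + (len(without) / n) * entropy(without))
    return entropy(labels) - cond

labels = [1, 1, 0, 0]                  # 1 = sarcasm, 0 = non-sarcasm
present = [True, True, False, False]   # unigram occurs only in sarcastic texts
gain = information_gain(labels, present)   # a perfectly informative unigram
```

Ranking unigrams by this score and keeping the top-scoring ones is the feature selection step used in the experiments.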
This research work has gone through the techniques of sentiment mining for sarcasm detection in Amharic texts. To classify a given sarcasm text into predefined classes, the text passes through pre-processing, detection of sentiment words, weight assignment and polarity classification. Pre-processing involves normalization and tokenization. The detection of sentiment words is the process of detecting polarity words and contextual valence shifters based on the sentiment lexicon. Weight assignment and polarity propagation are responsible for assigning an initial weight to detected sentiment terms and propagating the polarity value of sentiment terms that are linked to contextual valence shifters. Polarity classification is concerned with categorizing a given sarcasm text into predefined categories based on the weights obtained from the weight assignment and polarity propagation process.
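The pipeline described above can be sketched end-to-end in Python; the miniature English lexicon and the single valence shifter below are placeholders for illustration, not our actual Amharic resources:

```python
# Hypothetical miniature lexicon (word -> initial weight) and shifters.
LEXICON = {'good': 1.0, 'great': 2.0, 'bad': -1.0, 'terrible': -2.0}
SHIFTERS = {'not'}   # a contextual valence shifter flips the next term's polarity

def tokenize(text):
    # Pre-processing: normalization and tokenization (simplified).
    return text.lower().split()

def classify_polarity(text):
    """Detect sentiment words, assign initial weights, propagate shifter
    polarity, and classify by the aggregate score."""
    tokens = tokenize(text)
    score = 0.0
    for i, tok in enumerate(tokens):
        if tok in LEXICON:
            w = LEXICON[tok]
            if i > 0 and tokens[i - 1] in SHIFTERS:
                w = -w   # polarity propagation through the valence shifter
            score += w
    return 'positive' if score > 0 else 'negative' if score < 0 else 'neutral'

label = classify_polarity('not good at all, really terrible')
```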
The results of the lexicon-based sentiment mining model for Amharic sarcasm detection using the processes explained above are encouraging. However, further work can be done to improve the proposed model's results.
Looking at the experimental results, it is clear that the bag of words is the most discriminative feature for finding irony in the datasets that have been used. It is therefore important for future work to test other features together with the bag of words and vocabulary, to see whether they can give the model higher accuracy.
Research on sentiment analysis in other languages uses linguistic resources such as thesauri, lexicons, WordNet, spell checkers, part-of-speech taggers, machine-readable dictionaries and machine translation software. It would be a good idea to include these in future work to facilitate sentiment-analysis research for Amharic text.
The performance of the classifier needs to be improved to design a more efficient, applicable system. We considered only unigram words as features; it would be interesting to investigate other feature types, such as bigram words.
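Extending the feature extractor from unigrams to bigrams is straightforward, as this minimal sketch shows:

```python
def unigrams(tokens):
    # The feature set used in this thesis: individual words.
    return list(tokens)

def bigrams(tokens):
    # Adjacent word pairs: candidates for a richer feature set.
    return [' '.join(pair) for pair in zip(tokens, tokens[1:])]

tokens = 'this is really great'.split()
features = unigrams(tokens) + bigrams(tokens)
```

Bigrams such as 'really great' capture short contexts that unigrams miss, at the cost of an even larger feature space.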
For the Amharic language, there is no standardized corpus for sarcasm detection. The construction of a labeled corpus is therefore very important, because it would allow more experiments, especially with supervised classification. This is a good candidate for future work.
In all supervised approaches, reasonably high accuracy can be obtained only subject to the source of the data: the test data must be similar to the training data. This dependency on annotated training data is one major shortcoming of all supervised methods. Unsupervised approaches are recommended as future work.
Building a sarcasm detection model using other machine learning techniques can also be another research direction, concerned with the identification and extraction of sarcastic comments and determining the sarcasm in the given datasets. We further recommend that more research be done in this area; there is still a long way to go before a proper sarcasm detector can be used in general situations.
Future research should also focus on approaches that analyze the vocabulary used in Amharic sarcasm texts in a deeper fashion. Our impression is that many sarcastic and ironic Amharic texts use words and phrases that are non-typical for the specific domain or product class. We propose that future research analyze this specific vocabulary and develop semantic similarity measures, which we assume to be more promising than approaches that take only lexical features into account.
Most work has been performed on text sets from one source, such as Facebook, books, or reviews. Some of the features proposed in this paper or in previous publications are probably transferable between text sources. However, this still needs to be proven, and further development might be necessary to actually provide automated domain adaptation for the area of irony and sarcasm detection.
We approached the problem mainly from the data-driven perspective (annotation, feature engineering, error analysis). There are also possible extensions to the lexical/morphological features, either in the direction of semi-supervised learning, adding for example features based on latent semantics, topic models, or the graphical models popular in the sentiment analysis field, or in the direction of deeper linguistic processing in terms of, e.g., syntax/dependency parsing. These deserve further investigation and are planned as future work.
Finally, it should be noted that the corpus is actually a mixture of ironic and sarcastic Amharic texts. Irony and sarcasm are not fully interchangeable and can be assumed to have different properties. Further investigation and analysis of the characteristics that can be transferred are necessary.
2. Abhijit Mishra, D. K. (2017). Harnessing Cognitive Features for Sarcasm Detection. Bombay ,
India: Indian Institute of Technology.
3. Abreham, G. (2014). OPINION MINING FROM AMHARIC ENTERTAINMENT TEXTS. Addis Ababa:
Addis Ababa University.
4. Aditya Joshi, V. T. (2016). Are Word Embedding-based Features Useful for Sarcasm Detection? EMNLP 2016.
5. Afework, Y. (2007). Automatic Amharic text categorization. Addis Ababa: Addis Ababa
university,computer science.
6. Alexander O’Neill. (2009). Sentiment Mining for Natural Language Documents. Canberra:
Australian National University.
8. Amir Silvio, B. C. (2016.). Modelling Context with User Embeddings for Sarcasm Detection in
Social Media. CoNLL 2016, 167.
9. Arti B., e. a. (2017). Opinion mining and Analysis: A survey. International journal on Natural
language computing.
10. B. Pang. (2008). Sentiment classification using machine learning techniques. in Proceedings of
the Conference on Empirical Methods in Natural Language Processing.
12. Belete, M. (2013). Sentiment Analysis for Amharic opinionated text. Addis Ababa, Ethiopia: Addis Ababa University.
13. Bender, L. a. (1976.). The Ethiopian Writing System. London: Oxford University Press.
14. Bing Liu. (2010). Sentiment analysis and subjectivity. In Bing Liu, Handbook of natural language
processing 2 (pp. 627–666). Chicago: Morgan & Claypool.
15. Camp., E. (2012). Sarcasm, Pretense, and the Semantics/Pragmatics Distinction. 587–634.
16. Chernet, Y. A. (2014). Political Satire in Abebe Tola’s “Yabe Tokichaw Shimutochi” and “Yabe
Tokichaw Mitsetochi” Essays. International Journal of Literature and Arts , 240-251.
17. Chun-Che Peng, M. L. (2015). Detecting Sarcasm in Text: An Obvious Solution to a Trivial
Problem. Writeup for Stanford CS 229 Machine Learning Final Project., 1.
20. Delia Irazú Hernández Farías, V. P. (2016). Irony Detection in Twitter: The Role of Affective Content. ACM Trans. Internet Technol., 24 pages.
21. Ellen Rilo, A. Q. (2013). Sarcasm as Contrast between a Positive Sentiment and Negative
Situation. In EMNLP, 704–714.
22. Etstratios K., C. B. (2013). Ontology-based sentiment analysis of Twitter posts. Expert system
with applications, 4065-4074.
23. Filatova., E. (2012). Irony and Sarcasm: Corpus Generation and Analysis Using Crowdsourcing. In
LREC, 392-398.
24. Fred L. Drake, J. (2003). Python Tutorial Release 2.3.3. Python Software Foundation.
25. Gebremeskel, S. (2010). SENTIMENT MINING MODEL FOR OPINIONATED AMHARIC TEXTS. Addis
Ababa: MSC Thesis.
26. Gibbs., R. W. (1994). The poetics of mind: Figurative thought, language, and understanding.
London: Cambridge University press.
28. González-Ibáñez, R. S. (2011). Identifying sarcasm in Twitter: a closer look. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics (Short Papers, Volume 2). Association for Computational Linguistics.
29. Greenwood., D. M. (2014.). Who cares about sarcastic tweets? investigating the impact of
sarcasm on sentiment analysis. In Proceedings of LREC.
30. Grice, H. P. (1975). Logic and conversation. Syntax and Semantics 3, 41–58.
31. Hao, T. V. (2010). Detecting Ironic Intent in Creative Comparisons. In European Conference on,
765–770.
32. Inkpen, A. K. (2006). Sentiment Classification of Movie and Product Reviews Using Contextual Valence Shifters. Computational Intelligence, Volume 22.
33. JOSHI, A. B. (2017). Automatic Sarcasm Detection: A Survey. ACM Comput. Surv. 0, 0, Article
1000 ( 2017), 22 pages.
35. Katz., C. J. (1998). The differential role of ridicule in sarcasm and irony. In Metaphor and symbol
(pp. 1–15).
38. Liebrecht, C. K. (2013). The perfect solution for detecting sarcasm in tweets #not. In Proceedings of the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media (pp. 29–37).
39. Lin, C. et al. (2011). Sentence subjectivity detection with weakly-supervised learning. Proceedings of the 5th International Joint Conference on Natural Language Processing (pp. 1153–1161). London, UK: University of Exeter.
40. Lipo Wang ed. (2005). Support Vector Machines: Theory and Applications. Berlin: Springer.
41. Lunando, E. a. (2013). Indonesian social media sentiment analysis with sarcasm detection. In 2013 International Conference on Advanced Computer Science and Information Systems (ICACSIS).
42. Michie, D. S. (1994). Machine Learning, Neural and Statistical Classification. Broke Books.
45. Oren Tsur, D. D. (2010). Semi-Supervised Recognition of Sarcastic Sentences in Online Product
Reviews. ICWSM-A Great Catchy Name.
46. Pedregosa, F. V. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning
Research, 2825–2830.
47. Peng Liu, W. C. (2014). Sarcasm Detection in Social Media Based on Imbalanced Classification. In
Web-Age Information Management, 459-471.
48. Pexman, S. L. (2003). Context incongruity and irony processing. Discourse Processes 35, 241–
279.
49. Ptáček, T. H. (2014). Sarcasm detection on Czech and English Twitter. In The 25th International Conference on Computational Linguistics.
52. Raymond W Gibbs., J. G. (1994). poetics of mind: Figurative thought, language, and
understanding. London, Cambridge : Cambridge University Press.
53. Reyes, A. R. (2013). A multidimensional approach for detecting irony in twitter. Language
Resources and Evaluation, 239–268.
54. Rosso, A. R. (2014). On the difficulty of automatically detecting irony: beyond a simple case of
negation. Knowledge and Information Systems, 595-614.
55. Santosh Kumar Bharti, K. S. (2015). Parsing-based Sarcasm Sentiment Recognition in Twitter Data. In Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (pp. 1373–1380). ACM.
57. Schrauwen, S. (2010). Machine learning approach to sentiment analysis using the Dutch Netlog corpus. Computational Linguistics and Psycholinguistics Technical Report Series (CLiPS).
58. Sebsibe et al. (2004). Unit selection for Amharic using FESTVOX. 5th ISCA Speech Synthesis Workshop. Language Technology Research Center.
59. Solomon Teferra Abate, W. M. (2005). An Amharic speech corpus for large vocabulary
continuous speech recognition. Ninth European onference on speech communication and
technology. France: isca-speech.org.
60. Soujanya Poria, E. C. (2016). A deeper look into sarcastic Tweets using deep convolutional neural
networks. arXiv preprint arXiv:1610.08815.
61. Stefan Stieger, A. K. (2011). Humor styles and their relationship to explicit and implicit self-
esteem. In Personality and Individual Differences 50 (pp. 747-750).
63. Tadesse, B. (1994). The Ethiopian Writing System. Paper presented at the 12th International
Conference of Ethiopian Studies. Michigan : Michigan State University.
64. Varela, P. d. (2012). Sentiment Analysis. Indian journal of computer science and Engineering.
65. Veale., A. G. (2016). Fracking Sarcasm using Neural Network. WASSA NAACL (2016).
66. Walk, S. L. (2013). Really? well. apparently bootstrapping improves the performance of sarcasm
and nastiness classifiers for online dialogue. In Proceedings of the Workshop on Language
Analysis in Social, (pp. 30–40).
68. Woldekirkos, M. (1934). አማርኛ ሰዋሰው [Amharic Grammar]. Addis Ababa: Berhanena Selam Printing Press.
69. Xiaoying. (2009.). Categorizing Terms’ Subjectivity and Polarity Manually for Opinion Mining in
Chinese. IEEE.