Hate Is in the Air! But Where? Introducing an Algorithm to Detect Hate Speech in Digital Microenvironments
Abstract
With the objective of facilitating and reducing analysis tasks undertaken by law enforcement agencies and service providers, and using a sample of digital messages (i.e., tweets) sent via Twitter following the June 2017 London Bridge terror attack (N = 200,880), the present study introduces a new algorithm designed to detect hate speech messages in cyberspace. Unlike traditional designs based on semantic and syntactic approaches, the algorithm implemented here feeds solely on metadata, achieving a high level of precision. Through the application of the machine learning classification technique Random Forests, our analysis indicates that metadata associated with the interaction and structure of tweets are especially relevant for identifying the content they contain, whereas metadata of Twitter accounts are less useful in the classification process. Collectively, the findings of the current study demonstrate how digital microenvironment patterns defined by metadata can be used to create a computer algorithm capable of detecting online hate speech. The application of the algorithm and the direction of future research in this area are discussed.
Keywords: Metadata, Cyber place, Hate speech, Twitter, Random Forests
on an adaptation of the principles of the Criminology of Place to cyberspace (Miró-Llinares and Johnson 2018). The present paper addresses the potentially massive dissemination of radicalised content via Twitter through the introduction of an algorithm for the automatic detection of such content, thereby contributing to the mitigation of its impact. This research demonstrates how patterns of hate speech can be detected in metadata,³ basing the analysis on the relation between crime and place (Eck and Weisburd 1995; Sherman et al. 1989). Cyberspace, however, is not contained in a single "place" with homogeneous characteristics; rather, events occur in different cyber places inside it and at different times (Miró-Llinares and Johnson 2018). The identification of these spatiotemporal patterns may help us to improve algorithms based solely on content analysis. This method adds quantitative efficiency by automating part of the analytic process, thereby reducing the complexity of the content analysis needed to identify hate speech messages. Furthermore, it adds qualitative efficiency by increasing the ability of private entities or public authorities to limit their attention to content that is actually related to high-risk activities, that is, the dissemination of hatred or radical content in cyberspace.

³ The information that defines single data items (e.g., the number of times a tweet has been retweeted, or the number of followers an account has).

In the following section, recent literature is reviewed to summarise the existing approaches to hate speech detection in cyberspace. Then, a comprehensive explanation of the concept of "cyber place", based on the idea of convergence, is provided to present the theoretical framework on which the algorithm is built. Afterwards, an empirical study is reported to show the performance of the proposed system on a sample of tweets. The results are then interpreted and discussed in terms of efficiency and innovation, concluding with a summary of the relevant contributions and developments this work provides.

Related work
There has been a normalisation of extreme situations in an environment visited daily by millions of users to obtain the latest news and to socialise, one that is also used for propaganda purposes and the recruitment of radicalised subjects (Berger and Morgan 2015). This situation has led European authorities, already focused on social control (McGuire 2017), to increase social media surveillance and, especially, to create and use digital tools that employ complex algorithms to detect propaganda and extremist and hate speech content (Awan and Blakemore 2016), as well as to identify individuals in the process of radicalising (Edwards 2017).

Such tools for the early detection of radical content are based on the identification of patterns, but to achieve this aim they utilise a variety of content analysis techniques, including the following: (1) manual collection (Gerstenfeld et al. 2003), and sampling methods and crowdsourcing (Chatzakou et al. 2017; Magdy et al. 2015); (2) systematic keyword searches (Décary-Hétu and Morselli 2011); (3) data mining for sentiment analysis (Cheong and Lee 2011); (4) natural language processing (Nobata et al. 2016); and (5) different machine learning procedures (Ashcroft et al. 2015; Burnap and Williams 2015; Malmasi and Zampieri 2017; Sharma et al. 2018), including logistic regression models (Davidson et al. 2017) and neural networks (Djuric et al. 2015; Dos Santos and Gatti 2014). Although some of these tools employ metadata analysis in combination with semantic or syntactic methods (Schmidt and Wiegand 2017; Waseem and Hovy 2016), all of them focus the core of the analysis on the content of the message, meaning the words themselves or the relations among them, which implies a major drawback when analysing communicative environments as dynamic as social networks (Serra et al. 2017). To overcome these difficulties in analysing online hate speech, in this paper we focus instead on analysing the metadata features extracted from Twitter digital microenvironments that are relevant to hate speech dissemination.

Traditional microenvironments, digital microenvironments, and hate speech
Twitter, like other social networks, is not a concrete physical location, but it can be accessed from many places, and criminal microenvironments are usually thought of as the locations, places, or spaces where crimes occur. Traditionally, the analysis of these micro places has served the purpose of understanding how convergence allowed a criminal event to take place. Social networks are not places in the traditional geographic sense, but they are places in a relational sense, since they are environments "that are visited" in which people converge with other people and with content in different ways, depending on the characteristics of the particular digital environment or network. The combination of the people (i.e., accounts) who say things (i.e., tweets) to other people (i.e., other accounts) defines unique digital microenvironments in cyberspace. Indeed, it is in this sense of "place" that some cybercrimes occur in certain digital places more often than in others (Miró-Llinares and Johnson 2018), which implies that the basic premises of environmental criminology in general, and crime patterns in particular, may be true for certain cybercrimes.
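As a concrete illustration of the metadata just defined (footnote 3), the sketch below gathers the account-level fields that reappear later in Table 7 from a tweet object as returned by version 1.1 of the Twitter API, the version this paper's variable tables cite. The helper function is ours and purely illustrative; only the field names are genuine v1.1 attributes.

```python
def account_metadata(tweet: dict) -> dict:
    """Collects account-level metadata from a Twitter API v1.1 tweet object.

    Nothing here depends on what the message says, only on the
    characteristics of the microplace (account) that published it.
    """
    user = tweet['user']  # the embedded account object
    return {
        'verified': user['verified'],                   # blue badge
        'description': bool(user['description']),      # profile biography present
        'geo_enabled': user['geo_enabled'],            # tweets can be geolocated
        'listed_count': user['listed_count'],          # lists including the account
        'statuses_count': user['statuses_count'],      # tweets published
        'favourites_count': user['favourites_count'],  # tweets liked
        'followers_count': user['followers_count'],
        'friends_count': user['friends_count'],        # accounts followed
        'created_at': user['created_at'],              # account age (Day_count) derives from this
    }
```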
In particular, this approach refers to the idea that crime distribution is not random but is based on patterns determined by the different environmental elements of the places where victims and offenders converge and by the relevance of such places to the routine activities developed in the activity spaces (Brantingham and Brantingham 1981). This is similarly valid for hate speech and for similar behaviours such as the dissemination of terrorist propaganda and radicalisation messages. It is true that in these types of crimes the relevant convergence occurs not between offender and victim but between the sender and the receiver of the message. However, the convergence remains necessary: it needs a place where the hate message is reflected and where another person (or others, as the quantity of receivers is irrelevant) perceives it, such that hate speech or radicalisation on the internet will occur in some places more frequently than in others at both macro and micro levels, given certain environmental parameters.

From a macro perspective, that is, in comparison with other "places" or social networks, Twitter is an environment of massive, interactive, and immediate communication of content. Although it allows streaming communication (through Periscope) and direct messages to concrete users out of sight of the rest of the network, Twitter works essentially as a public square in which stored-and-forward communication is used to express content that can be observed and shared by a large number of people (Marwick and Boyd 2011). If we add that political or ideological communication has become increasingly frequent on Twitter (Bode and Dalrymple 2016), it seems understandable that this social network is commonly used to disseminate hate speech (Schmidt and Wiegand 2017) and that it has become perhaps the favourite social network of extremist and terrorist groups for propaganda and the promotion of radicalisation to a wider audience (Berger and Morgan 2015; Veilleux-Lepage 2014; Weimann 2014).

In addition, Twitter's structural configuration, in particular the restriction on the length of messages (first 140 characters, now 280), limits the possibilities for interaction among users and makes both hate speech, which will not be the same as the content expressed in a different forum or on Facebook (Awan 2016), and the activities of radicals and terrorists based on such speech less focused on recruitment and more aimed at normalising and magnifying terrorist activity for soft sympathisers (Veilleux-Lepage 2014), as well as at disseminating propaganda by redirecting users to other places in cyberspace (Weimann 2014). Furthermore, Twitter allows anonymity, although it is not the most common way of interacting (see Peddinti et al. 2014). Finally, despite its constant technical modifications, Twitter has not shown much efficiency with regard to withdrawing offensive, hate-related, or radical content (Weimann 2014), either because of the technical ease involved in creating accounts and the immediate publication of tweets or because of its rather vague free speech policy, which makes requests for removal different in each country (Hsia 2017).

However, Twitter is not a homogeneous place where everything occurs in the same way everywhere inside it. It is well known, for example, that the temporal distribution of messages does not occur randomly (Miró-Llinares and Rodríguez-Sala 2016); that some profiles have more followers than others and that not all of them publish the same number of tweets (Lara-Cabrera et al. 2017); and that there are very different degrees of identity expression on this social network (Peddinti et al. 2014). This indicates that a microanalysis of the configural elements of digital microplaces may be helpful to detect the environmental patterns that determine the occurrence of an event. In addition, it seems similarly obvious that the micro units essential for such an analysis are accounts and tweets.

A tweet is the essential microplace because it is where a message is expressed and shown and where other users can interact with it, while an account is the microplace from which the publication or the viewing of such messages is made available. Like every microplace, a Twitter account has certain characteristics that differentiate it from the rest. For instance, if an account's registration information coincides with the identity of a public personality, Twitter will verify the user account with a blue badge. At the same time, a user can include a brief personal biography in their profile and even activate an option to geolocate tweets, in such a way that when publishing a message the geographic location where the tweet was written can be attached. Furthermore, users can include other accounts in thematic groups called "lists", which are useful for seeing only those messages published by selected accounts in chronological order. The number of lists in which an account is included is reflected in its profile, together with other parameters such as the number of tweets published, the number of tweets liked, and the number of followers, as well as the number of users that the account follows.

Similarly, a variety of elements configure and define a message transmitted by tweet. Tweets have a structural limitation in relation to the extension of their content that permits only a maximum number of characters, whether alphanumeric or in the shape of small icons known as emojis. The combination of these characters with a variety of other elements will define the content of the microplace and its scope. Such elements include mentions, which act as a specific personal notification when they include the @
symbol before the name of the user; Uniform Resource Locators (URLs), which allow the inclusion of a hyperlink to additional content, whether an image, a video, a GIF, or a link to an external site; and hashtags, which are situational elements that serve to thematically tag the content of a tweet to connect messages and create communicative trends. Indeed, the result of combining all these elements conditions the ways and the frequency with which people interact with a tweet, whether just by seeing it or by engaging with the message and promoting its dissemination through a retweet, a feature that allows the dissemination of messages to the followers of an account.

In any case, the relevance of the microplaces where more or less hatred can be found lies in the premise that motivates the present work: that hate speech, similar to other crimes in physical spaces and in cyberspace (Miró-Llinares and Johnson 2018), will also be distributed in certain patterns conditioned by the characteristics of the digital microenvironments where it occurs. Thus, with regard to the special nature of hate speech in the sense of its dissemination via Twitter, and taking into consideration the different structural characteristics of the microplaces that integrate it, there exists an opportunity to detect environmental patterns related to hate speech that could help to detect its early appearance in order to prevent, control, or mitigate its impact.

The present study
The present study introduces and evaluates a new algorithm designed to detect hate speech through the identification of patterns found in the situational metadata of digital messages. Existing research has discovered various types of patterns on Twitter: linguistic and temporal (Williams and Burnap 2015), sociodemographic and temporal (Marcum et al. 2012), spatiotemporal and socioeconomic (Li et al. 2013), and sociodemographic (Sloan et al. 2015), among others. In addition, patterns related to metadata have been found on other social networks: for example, those linked to certain content for the detection of cyberbullying on Instagram (Hosseinmardi et al. 2015), or the tagging of YouTube videos to identify deviant content (Agarwal et al. 2017). What has not yet been analysed, however, is whether such patterns are related to the environmental characteristics of social media accounts and digital messages in relation to their configuration as microplaces.

To achieve the study's aim, we required a large sample of digital messages from Twitter upon which data mining techniques could be applied. This would enable us to determine whether characteristics of this social network's microplaces are decisive with regard to determining the types of messages that will be published from or inside them. With the aim of finding a more efficient tweet classification criterion, two classification trees were implemented: one with account metadata as inputs and another with the tweet microplace's metadata. A detailed description of the sampling strategy, the variables analysed, and the analytic technique follows.

Sample and procedure
The data collection was performed through the Application Programming Interface (API) of Twitter, which allows users with developer permissions to access data for reading, writing, or monitoring in real time. Researchers who work with data from Twitter are already familiar with the constant changes experienced by its API, which may compromise the process of data gathering. To address this problem and to overcome the possible changes caused by the application, an algorithm for data gathering was developed (see Additional file 1: Appendix A) that is equipped with sufficient rigidity due to an exception management system: programming techniques that enable researchers to control the appearance of anomalies during the execution of a script. Additionally, a system was implemented that provides immediate alerts if the server experiences any problems, the connection is interrupted, or the API loses or receives new permissions. Through this system, it is possible to quickly resolve any adjustment problems regarding the requests sent to the server via the code and the responses from the API when new updates modify the composition of the dataset.

Once API access is obtained, and after establishing convenient authentication parameters, information about a concrete event can be collected for subsequent analysis by using certain keywords or hashtags as search criteria. In this case, the terrorist attack perpetrated on London Bridge on 3 June 2017 was selected. Once the data collection process has begun, the API can store up to 1% of the tweets published on Twitter based on pre-set search criteria. Thus, three filtering hashtags were selected to provide balanced sampling (see Miró-Llinares 2016): #LondonBridge, which refers neutrally to the event; #PrayForLondon, for solidarity content; and #StopIslam, which is a representative hashtag for radical expressions, Islamophobia in this case. The first two hashtags were trending topics at some point during the event, while the last one was also a trending topic during previous attacks, allowing us to make comparisons with other samples collected earlier. Through this procedure, over 3 days, a sample of more than 200,000 tweets was obtained (N = 200,880) that refer directly or indirectly to the selected event.
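The collector itself is given as pseudocode in Additional file 1: Appendix A. Purely as an illustrative sketch of the design described above — a streaming filter on the three hashtags wrapped in an exception management layer that raises alerts and reconnects — a minimal version could look as follows, assuming the tweepy 3.x client (the paper does not name a client library, and the credentials are placeholders):

```python
import time
import tweepy  # assumed client; any library able to call Twitter's Streaming API would do

TRACK = ['#LondonBridge', '#PrayForLondon', '#StopIslam']  # the three filtering hashtags

class CollectorListener(tweepy.StreamListener):
    """Appends each raw tweet JSON to disk and raises immediate alerts on anomalies."""

    def __init__(self, out_path):
        super().__init__()
        self.out = open(out_path, 'a', encoding='utf-8')

    def on_data(self, raw):
        self.out.write(raw.strip() + '\n')   # keep the full object so all metadata survive
        return True

    def on_error(self, status_code):
        print('Stream error:', status_code)  # immediate alert (e.g., 420 = rate limited)
        return False                         # disconnect; the loop below reconnects

auth = tweepy.OAuthHandler('CONSUMER_KEY', 'CONSUMER_SECRET')  # placeholder credentials
auth.set_access_token('ACCESS_TOKEN', 'ACCESS_SECRET')

# Exception management: keep collecting across disconnections and API changes.
while True:
    try:
        stream = tweepy.Stream(auth, CollectorListener('london_bridge.jsonl'))
        stream.filter(track=TRACK)  # the Streaming API returns at most ~1% of all tweets
    except Exception as exc:
        print('Collector exception, reconnecting in 60 s:', exc)
        time.sleep(60)
```

The alert-and-reconnect loop stands in for the exception management system described above; the authors' actual implementation may differ in its details.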
Table 2 Tweet variables related to the interaction and the structure of messages. Source: https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/tweet-object. Accessed 13 July 2018

  Variable          Type     Description
  Interaction
    Mention_count*  Numeric  Number of mentions included in the text of the tweet
    Hashtag_count*  Numeric  Number of hashtags included in the text of the tweet
    Url*            Boolean  When true, indicates that the tweet includes a URL
    Retweet_count   Numeric  Number of times this tweet has been retweeted
    Minute_count    Numeric  Number of minutes elapsed between the event and the issuing of the tweet
  Structure
    Text_count*     Numeric  Number of characters in the message, excluding URL, emoji, and retweet structure characters (i.e., 'RT @username')
    Emoji*          Boolean  Indicates whether the text of the tweet includes an emoji

  * New variables
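The variables marked as new in Table 2 are not delivered directly by the API but are derived from the raw tweet object; the authors' own pre-processing pseudocode is in Additional file 1: Appendix B. The Python below is only an illustrative equivalent — in particular, the emoji pattern is a crude simplification of whatever the authors actually used:

```python
import re

# Rough emoji range; a simplification for illustration purposes only.
EMOJI = re.compile(r'[\U0001F300-\U0001FAFF\u2600-\u27BF]')

def tweet_variables(tweet: dict) -> dict:
    """Derives the Table 2 interaction and structure variables from a raw tweet."""
    body = re.sub(r'^RT @\w+: ', '', tweet['text'])  # drop retweet structure characters
    body = re.sub(r'https?://\S+', '', body)         # drop URLs before counting
    has_emoji = bool(EMOJI.search(body))
    body = EMOJI.sub('', body)                       # drop emoji before counting

    return {
        'Mention_count': len(tweet['entities']['user_mentions']),
        'Hashtag_count': len(tweet['entities']['hashtags']),
        'Url': bool(tweet['entities']['urls']),
        'Retweet_count': tweet['retweet_count'],
        'Text_count': len(body.strip()),
        'Emoji': has_emoji,
    }
```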
To assign the tweets to a class, human judges were used rather than automated semantic or syntactic analysis techniques, because these have shown weaknesses when dealing with specific messages such as humour or irony (Farías et al. 2016; Reyes et al. 2013). Plenty of investigations have addressed the problem of hate speech detection in social networks with such methodologies (e.g., Burnap and Williams 2015, on Twitter; Mariconti et al. 2018, on YouTube). Although there exists a profound dogmatic discussion in that regard, in the present study a broad concept of hate speech was used to classify messages, comprising all the expressions considered violent or hateful communication in the taxonomy elaborated by Miró-Llinares (2016). According to this classification, for a tweet to be considered hate speech, its content must fall within one of the following categories: (1) direct incitement to, or threat of, violence; (2) glorification of physical violence; (3) an attack on honour and human dignity; (4) incitement to discrimination or hate; or (5) an offence to collective sensitivity. This classification task was therefore based on the subjective interpretation of a text, with the limitations derived from this method. To alleviate the effect of the judges' subjective analysis of the messages (n = 100), the Kappa coefficient (Cohen 1960), which measures the degree of agreement, was applied to ensure accordance in the assessments and thus the reliability of the classification of the tweets. As can be observed in Table 3, and according to the criteria established by Landis and Koch (1977), "almost perfect" (p. 165) agreement was obtained among the three pairs of judges (0.81–0.89).

Table 3 Results of the application of the Kappa coefficient to the three pairs of judges

  Group           Value of κ
  Judges A and B  0.81
  Judges A and C  0.89
  Judges B and C  0.88
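The values in Table 3 are ordinary Cohen's κ computations. As a hedged illustration — the judges' actual label vectors are not published — the same statistic can be obtained with scikit-learn:

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical labels for the same set of messages: 1 = hate speech, 0 = neutral.
judge_a = [1, 0, 0, 1, 0, 1, 0, 0]
judge_b = [1, 0, 0, 1, 1, 1, 0, 0]

# Agreement corrected for chance; 0.81-1.00 is "almost perfect"
# according to Landis and Koch (1977).
print(cohen_kappa_score(judge_a, judge_b))
```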
Although previous studies that used the same classification methodology removed all retweets from the sample in order to filter original messages from their redundant replicas (Esteve et al. 2018; Miró-Llinares 2016; Miró-Llinares and Rodríguez-Sala 2016), this procedure was not adequate in this study because the data collection method through the API did not guarantee that every retweet corresponded to an original tweet captured in the sample. Thus, only duplicated tweets were removed, which left 35,433 unique cases to be classified. After the judges classified these messages, the duplicates were folded back into the dataset to calculate the hate speech prevalence in our sample: a total of 9488 (4.7%) out of 200,880 tweets.

Analytical strategy
Regarding the characteristics of the sample, confirming the relevance of places in cyberspace requires the application of data mining techniques. Therefore, making use of the Random Forests classification technique (Breiman 2001), an algorithm was implemented to create a number of classifiers for tweets that divide the sample based on the filters generated by each of the variables included in the model (i.e., nodes). These classifiers grow from a randomised dataset extracted from the main sample to train the model and fit its parameters. 70% of the sample comprises the training set and the remaining 30% constitutes the test set, and this division was repeated 10 times to promote randomisation. The training set was then balanced in favour of the minority class (i.e., hate speech tweets), while the remaining data were included in the unbalanced test set (Table 4).

This training and testing process allows control over anomalous or less consistent nodes and, hence, the growth of a non-overfitted, pruned tree. To define the most appropriate parameters for our algorithm, a series of computational experiments was carried out, and the parameters were adjusted to reduce the forest's sensitivity to their values (Tuffery 2011).

When going through each node, the model asks each classifier whether the sample fulfils the condition established on it, thereby filtering the main sample and creating two subsamples: one that fulfils the condition and one that does not. The model then selects the best filtering among all trees and averages their individual estimations to produce the final output.
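As a rough sketch of the procedure just described — a 70/30 division repeated with different seeds, a training set balanced by undersampling the majority class as in Table 4, and the forest parameters reported under Table 5 — one of the 10 repetitions could be coded with scikit-learn as follows (file and column names are hypothetical):

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

df = pd.read_csv('tweets_metadata.csv')  # hypothetical file: one row of metadata per tweet
hate, neutral = df[df['hate'] == 1], df[df['hate'] == 0]

# 70% of the minority class goes to training; varying random_state across
# runs reproduces the 10 repetitions used to promote randomisation.
hate_train = hate.sample(frac=0.70, random_state=0)
hate_test = hate.drop(hate_train.index)

# Balance the training set in favour of the minority class: as many neutral
# tweets as hate speech tweets; everything left over forms the unbalanced
# test set, matching the composition shown in Table 4.
neutral_train = neutral.sample(n=len(hate_train), random_state=0)
neutral_test = neutral.drop(neutral_train.index)

train = pd.concat([hate_train, neutral_train])
test = pd.concat([hate_test, neutral_test])

# Forest parameters as reported under Table 5.
model = RandomForestClassifier(n_estimators=1000, max_depth=10, random_state=0)
model.fit(train.drop(columns='hate'), train['hate'])
```

Balancing only the training set keeps the classifier from defaulting to the majority class while leaving the test set representative of the real class imbalance.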
Table 4 Training set and test set composition

  Class        Training set  Test set
  Neutral      6638          184,754
  Hate speech  6638          2850
  Total        13,276        187,604

Table 5 Algorithm maximum precision and validation scores according to account and tweet models

  Model            Precision  Recall  F1-score  Fivefold
  Account
    Neutral        0.99       0.65    0.79
    Hate speech    0.03       0.62    0.05
    Average/total  0.98       0.65    0.78      0.63
  Tweet
    Neutral        1.00       0.87    0.93
    Hate speech    0.09       0.86    0.17
    Average/total  0.98       0.87    0.92      0.86

  Parameters: number of estimators = 1000; maximum depth = 10

By creating several decision trees that learn from a predetermined training set, the Random Forest produces robust predictions. When the condition that defines a node reaches maximum classifying efficiency, the model has reached a leaf node, and it assigns the corresponding subsample to a single class: hate speech or neutral content. This technique intends to demonstrate that the selected cyber place variables can be used to properly classify part of the sample, thereby contributing to the automation of the process. Additionally, to avoid positively biased results, the scores were validated through fivefold cross-validation (see the rightmost column of Table 5). Overall, the ability to avoid false positives (i.e., Precision) is consistently higher when tweet variables are included in the algorithm.
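Continuing the sketch above, the figures reported in Tables 5, 6, and 7 correspond to standard scikit-learn outputs (again an illustration, not the authors' exact script):

```python
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import cross_val_score

X_test, y_test = test.drop(columns='hate'), test['hate']
y_pred = model.predict(X_test)

# Per-class precision, recall, and F1-score, as reported in Table 5.
print(classification_report(y_test, y_pred, target_names=['neutral', 'hate speech']))

# Real classes in rows, predicted classes in columns, as in Table 6.
print(confusion_matrix(y_test, y_pred))

# Fivefold cross-validation on the training set (cf. the Fivefold column of
# Table 5; the paper does not state which score that column averages).
print(cross_val_score(model, train.drop(columns='hate'), train['hate'], cv=5).mean())

# Impurity-based variable importances, the quantity tabulated in Table 7.
print(dict(zip(X_test.columns, model.feature_importances_)))
```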
Table 6 Confusion matrixes according to account and tweet models

  Model    Real class    Predicted neutral  Predicted hate speech
  Account  Neutral       120,511            64,243
           Hate speech   1078               1772
  Tweet    Neutral       160,676            24,078
           Hate speech   397                2453

Table 7 Importance of the variables included in both models

  Variable              Importance
  Account
    Anonymity
      Verified          0.00
      Description       0.02
      Geoenabled        0.05
    Visibility
      Day_count         0.16
      Listed_count      0.12
      Statuses_count    0.17
      Followers_count   0.14
      Friends_count     0.16
      Favourites_count  0.17
  Tweet
    Interaction
      Mention_count     0.02
      Hashtag_count     0.08
      Url               0.05
      Retweet_count     0.41
      Minute_count      0.08
    Structure
      Text_count        0.34
      Emoji             0.02

Regarding the accuracy of the model, the variables that contribute most to the classification in the tweet model are the number of retweets a message receives (importance = 0.41) and the length of the text associated with the structure of the message (importance = 0.34).

To further understand which specific conditions a message must meet to be classified as neutral or hate speech by the algorithm, one of the decision trees produced with the Random Forests has been randomly selected and transformed into a flow chart (Fig. 2). As can be observed, the metadata patterns described by hate speech messages differ from those depicted by neutral communication. This flowchart shows some contents that describe clear patterns and can be classified using only one to three variables: retweet count, text count, and minute count. Even though temporal stamps appear to have a low influence on the decision process (Table 7), they are crucial for defining the content of the messages.

In summary, and as shown in the previous graph for the analysed sample, it is possible to define the environmental conditions that Twitter microplaces should have in order to differentiate with certainty the type of event occurring in them. These figures allow us to interpret the environmental patterns that arise from the sequential combination of account and tweet metadata associated with concrete messages. For example, if a message in our sample received between 6907 and 8138 retweets, was published more than 262 min after the attack, and had a text length of more than 107 characters (140 characters was the maximum allowed at the time of sampling), it was classified as a hate speech message; otherwise, it was classified as neutral (see Fig. 2).

[Fig. 2 Flowchart for a Random Forest classification tree according to the variables of the tweet (depth = 5)]
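As a reading aid, that example path can be written out as a small function. It mirrors a single branch of a single tree from Fig. 2, whereas the deployed model averages the votes of 1000 such trees, so the function is illustrative only:

```python
def classify_example(retweet_count: int, minute_count: int, text_count: int) -> str:
    """One branch of the Fig. 2 tree, written out for readability."""
    if 6907 < retweet_count <= 8138:  # retweet > 6907 but not > 8138
        if minute_count > 262:        # issued more than 262 min after the attack
            if text_count > 107:      # longer than 107 characters (max was 140)
                return 'hate speech'
    return 'neutral'

print(classify_example(retweet_count=7500, minute_count=300, text_count=120))  # hate speech
```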
Discussion
Based on the results of the present study, we can deduce that (1) digital microenvironment metadata can be used to detect hate speech patterns in cyberspace, similar to the way spatiotemporal crime patterns can be found in the physical environment, and that (2) hate speech messages on Twitter describe environmental patterns different from those of neutral messages. This result derives from the fact that hate speech messages are communicated via tweets, or through accounts, with specific environmental characteristics reflected in the concrete metadata associated with the message. In other words, tweets and accounts containing hate speech have different characteristics from tweets and accounts containing neutral messages, which is a logical consequence of the different ways of communication currently available and of messages being expressed differently by taking advantage of the different possibilities of the digital environment.

The performance of the models reported in this paper demonstrates that not all account variables related to the anonymity and visibility of users are relevant criteria for distinguishing whether or not the content of a tweet is hate speech. This is perhaps due to the ease with which such features can be faked as identifier elements, and they are therefore not relevant for differentiating between messages. More specifically, anonymity-related variables have proven to be almost irrelevant for classification purposes, probably conditioned by their dichotomous categorisation, as the information gain is biased towards variables with a large number of values (Quinlan 1986). Additionally, it does not seem entirely correct to use variables that describe a place where a crime will not occur just to determine the optimal environmental characteristics. As a matter of fact, the account is the microplace from which hate speech is published, but it is not where it manifests. In other words, in the present analysis, we are using the characteristics of houses to define the context of a crime that occurs on that street.
For this reason, we argue that the results are far from expected. We also believe that account metadata are not useful for classifying tweets because such data are associated with the dichotomised result of a particular tweet, and in this way we might be incorrectly attributing radical characteristics to a not-so-radical place, such as an account that might have published just one hateful message. It seems reasonable to conclude that the intention of a user who posts a single hate speech message cannot be considered the same as that of a radical user who systematically disseminates hatred.

Conversely, in line with the work of Ferrara et al. (2016), the most important element for classifying the contents of a tweet is the number of retweets it receives, as these are closely related to the interaction generated and to the visibility of a message. According to theory, hate speech users seek a greater dissemination of their ideas and might therefore include certain elements, such as URLs and hashtags, that have been found to make messages more appealing for retweeting (Suh et al. 2010). On the other hand, and in the same way that the architectural design of a physical space can condition the occurrence of criminal events in certain places [for a review of Crime Prevention Through Environmental Design (CPTED), see Cozens et al. (2005)], the present study shows that the architecture of a tweet, especially the length of its text, is an essential element in determining the nature of the message. In line with previous research, tweet time stamps have shown that hate speech messages also cluster in time (Miró-Llinares and Rodríguez-Sala 2016), suggesting that certain cues activate radical responses in individuals more than others do. However, this analytical approach seems insufficient to explain why this is the case. In addition, the results confirm that tweet metadata have proved especially relevant for automatically identifying the specific microplaces where a criminal event will not occur (i.e., neutral tweets). There is no doubt that these results are consistent in environmental terms, and we suggest that future investigations examine in more detail, for example, the role played by the anonymity variables of accounts, or the structural elements of a tweet regarding the dissemination of content.

Although the present study represents an initial stage of the investigation, it demonstrates the unquestionable capacity of the social sciences to make important contributions to the fight against cyberterrorism (Maimon and Testa 2017), and, since the main goal is to automate the process of classifying messages regardless of platform, it offers relevant information in terms of ways to potentially improve the search algorithms for different
content, as it demonstrates that to detect this type of communication we must focus not only on the content of a message but also on the environment in which it is expressed. In this sense, recent studies applying different lexical approaches for classifying tweets, such as Support Vector Machines (SVM), Logistic Regression, or Random Forests, have obtained performances similar or inferior to those of the algorithm presented in this study, which is fed solely with metadata. Thus, while our Random Forest tweet model achieves an F1-score of 0.92,⁴ these previous attempts obtained F-measures of 0.77 (Burnap and Williams 2015), 0.90 (Davidson et al. 2017), and 0.76 (Sharma et al. 2018), respectively.

⁴ Similar F1-scores were obtained in different samples that were not included in this paper but used the same methodology.

We further argue that the use of metadata to classify messages can help to overcome limitations that arise from the application of approaches such as Bag of Words to samples comprising texts in different languages. In this sense, we believe that a combination of lexical and metadata approaches would enhance the ability of state-of-the-art approaches to detect radical communication in social networks. From a methodological point of view, it can also be argued that metadata yield benefits both in the extraction of variables, since they can be obtained through the API, and in their simpler computation process compared to text-based variables.

It should be noted that the contribution of the present work is cross-cutting, as it goes beyond the frontiers of Twitter: all social networks host information of major importance in the metadata of their microplaces. However, this raises interesting questions regarding who has access to such metadata and whether the metadata should be made available to any user through open access systems or their access should be somehow limited. In any case, it seems that the current trend for many social networks is restrictive. Indeed, this has been the case for Facebook and Instagram, from which the extraction of information is becoming increasingly difficult. Until now, Twitter has continued to function with an open philosophy that allows researchers to collect a wide range of data.

Conclusion
Showing that environmental criminology can also be applied to cyberspace settings, this paper has introduced a brand-new theoretical framework to underpin online hate speech detection algorithms. Crime Pattern Theory principles and cyber place conceptualisations based on digital spaces of convergence (Miró-Llinares and Johnson 2018) have been adapted to identify the most relevant characteristics associated with hate speech dissemination on Twitter. This important contribution provides an analytical background that opens the way to studying different forms of cybercrime that rely on cyber place metadata.

Two relevant cyber places for hate speech dissemination have been identified on Twitter: accounts and tweets. Drawing on the Random Forests technique, tweet metadata proved to be more efficient than account metadata in the classification of hate speech content. This suggests that not all variables should be taken into account when building predictive models; rather, models should be restricted to those variables that are supported by valid theoretical schemes for solving particular problems. In this case, and given the nature of hate speech, it is crucial to consider the variables essential for content propagation in social networks when building predictive models. And even if this is not a methodology comparison paper, the precision scores obtained show that this approach is, at least, on a par with other methods based on semantic approaches.

Although studying the entire population of digital messages on any platform is an unrealistic task, a sample of over 200,000 tweets gives us the ability to answer our research question, despite our inability to generalise the current findings to all Twitter events. This further leads to the fundamental question of whether hate speech has been properly measured, that is, whether hate speech content has been properly distinguished from what is not. Regardless of the appropriateness of the taxonomy used to identify hate speech or whether the judges properly classified the sample, it is certain that the chosen method differentiates between events, as has been shown in the aforementioned studies.

As an axiological analysis, the sample may not accurately reflect the prevalence of hate speech on Twitter, but it is true that any pragmatic analysis will never lead two researchers to draw identical conclusions, given the nature of language and the circumstances of communication. In this sense, this study aimed to achieve the greatest possible accuracy between judges to enable the analysis to interpret each criterion based on an acceptable level of agreement. Further research should be conducted to escalate the application of the idea behind the methodology proposed in the present study. Finally, despite demonstrating the utility of metadata in terms of precision for classification purposes, future research should aim to (1) compare computational times when using metadata versus text variables to determine which technique is more efficient, (2) test the ability of metadata models to overcome language limitations by comparing their performance in samples of different languages, and (3) merge the application of metadata and lexico-syntactical approaches to reduce the number of false negatives and positives, and subsequently obtain even higher precision with hate speech detection algorithms in cyberspace.
Additional files

Additional file 1. Appendix A: Pseudocode for obtaining the sample. Appendix B: Pseudocode for pre-processing the variables.

Additional file 2. Anonymized data set used for the present study. The anonymized variables are in red.

Abbreviations
API: Application Programming Interface; CPTED: Crime Prevention Through Environmental Design; JSON: JavaScript Object Notation; SVM: Support Vector Machines; URL: Uniform Resource Locator.

Authors' contributions
The theoretical framework and research question were initially stated by FM, while AM further developed this background. ME then obtained and preprocessed the sample required for the analysis. Variables were selected according to the approach of FM and AM, and the machine learning techniques were conducted by ME. Finally, FM and AM jointly interpreted the results and elaborated the discussion section and conclusions. All authors read and approved the final manuscript.

Author details
1 CRÍMINA Research Center for the Study and Prevention of Crime, Miguel Hernández University of Elche, Avda. de la Universidad, s/n, Hélike building, 03201 Elche (Alicante), Spain. 2 Center of Operations Research, Miguel Hernández University of Elche, Elche, Spain.

Acknowledgements
We thank Dr. Timothy C. Hart from the University of Tampa for his thoughts on digital microenvironments and his valuable comments that helped to improve the manuscript.

Competing interests
The authors declare that they have no competing interests.

Availability of data and materials
The dataset analysed in the present study is attached as Additional file 2 in .xlsx format. Please note that the original dataset was retrieved in .json format, but it is too large to attach (13 GB) and difficult to read. In addition, some variables have been removed to protect the privacy of Twitter users.

Funding
This project has received funding from the European Union's Horizon 2020 research and innovation programme under Grant Agreement No. 740773. This research has been funded by the Spanish Ministry of Education, Culture and Sports under FPU Grant Reference FPU16/01671.

Received: 4 May 2018   Accepted: 31 October 2018

References
Agarwal, N., Gupta, R., Singh, S. K., & Saxena, V. (2017). Metadata based multi-labelling of YouTube videos. In 7th International Conference on Cloud Computing, Data Science & Engineering-Confluence (pp. 586–590). New York: IEEE.
Ashcroft, M., Fisher, A., Kaati, L., Omer, E., & Prucha, N. (2015). Detecting jihadist messages on Twitter. In European Intelligence and Security Informatics Conference (EISIC) (pp. 161–164). New York: IEEE.
Awan, I. (2016). Islamophobia on social media: A qualitative analysis of the Facebook's Walls of Hate. International Journal of Cyber Criminology, 10(1), 1–20.
Awan, I., & Blakemore, B. (2016). Policing cyber hate, cyber threats and cyber terrorism. Abingdon: Routledge.
Berger, J. M., & Morgan, J. (2015). The ISIS twitter census: Defining and describing the population of ISIS supporters on Twitter. The Brookings Project on US Relations with the Islamic World, 3(20), 1–68.
Bode, L., & Dalrymple, K. E. (2016). Politics in 140 characters or less: Campaign communication, network interaction, and political participation on Twitter. Journal of Political Marketing, 15(4), 311–332.
Brantingham, P. L., & Brantingham, P. J. (1981). Notes on the geometry of crime. In P. J. Brantingham & P. L. Brantingham (Eds.), Environmental criminology (pp. 27–54). Beverly Hills, CA: Sage.
Breiman, L. (2001). Random Forests. Machine Learning, 45, 5–32.
Brenner, S. W. (2017). Nanocrime 2.0. In M. R. McGuire & T. J. Holt (Eds.), The Routledge handbook of technology, crime and justice (pp. 611–642). New York City, NY: Routledge.
Burnap, P., & Williams, M. L. (2015). Cyber hate speech on twitter: An application of machine classification and statistical modeling for policy and decision making. Policy & Internet, 7(2), 223–242.
Chatzakou, D., Kourtellis, N., Blackburn, J., De Cristofaro, E., Stringhini, G., & Vakali, A. (2017). Mean birds: Detecting aggression and bullying on twitter. In Proceedings of the Web Science Conference (pp. 13–22). New York: ACM.
Cheong, M., & Lee, V. C. (2011). A microblogging-based approach to terrorism informatics: Exploration and chronicling civilian sentiment and response to terrorism events via Twitter. Information Systems Frontiers, 13(1), 45–59.
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1), 37–46.
Cozens, P. M., Saville, G., & Hillier, D. (2005). Crime prevention through environmental design (CPTED): A review and modern bibliography. Property Management, 23(5), 328–356.
Davidson, T., Warmsley, D., Macy, M., & Weber, I. (2017). Automated hate speech detection and the problem of offensive language. arXiv preprint arXiv:1703.04009.
Décary-Hétu, D., & Morselli, C. (2011). Gang presence in social network sites. International Journal of Cyber Criminology, 5(2), 876–890.
Djuric, N., Zhou, J., Morris, R., Grbovic, M., Radosavljevic, V., & Bhamidipati, N. (2015). Hate speech detection with comment embeddings. In Proceedings of the 24th International Conference on World Wide Web (pp. 29–30). New York: ACM.
Dos Santos, C., & Gatti, M. (2014). Deep convolutional neural networks for sentiment analysis of short texts. In Proceedings of the 25th International Conference on Computational Linguistics: Technical Papers (pp. 69–78). COLING.
Eck, J. E., & Weisburd, D. (1995). Crime places in crime theory. In J. E. Eck & D. Weisburd (Eds.), Crime and place (pp. 1–33). Monsey, NY: Criminal Justice Press.
Edwards, A. (2017). Big data, predictive machines and security: The minority report. In M. R. McGuire & T. J. Holt (Eds.), The Routledge handbook of technology, crime and justice (pp. 451–461). New York, NY: Routledge.
Esteve, M., Miró-Llinares, F., & Rabasa, A. (2018). Classification of tweets with a mixed method based on pragmatic content and meta-information. International Journal of Design & Nature and Ecodynamics, 13(1), 60–70.
Farías, D. I. H., Patti, V., & Rosso, P. (2016). Irony detection in Twitter: The role of affective content. ACM Transactions on Internet Technology, 16(3), 19.
Ferrara, E., Wang, W. Q., Varol, O., Flammini, A., & Galstyan, A. (2016). Predicting online extremism, content adopters, and interaction reciprocity. In International Conference on Social Informatics (pp. 22–39). Berlin: Springer International Publishing.
Gerstenfeld, P. B., Grant, D. R., & Chiang, C. P. (2003). Hate online: A content analysis of extremist internet sites. Analyses of Social Issues and Public Policy, 3(1), 29–44.
Hernández, J., Ramírez, M. J., & Ferri, C. (2004). Introducción a la minería de datos. Madrid: Pearson.
Hosseinmardi, H., Mattson, S. A., Rafiq, R. I., Han, R., Lv, Q., & Mishra, S. (2015). Detection of cyberbullying incidents on the Instagram social network. arXiv preprint arXiv:1503.03909.
Hsia, J. (2017). Twitter trouble: The communications decency act in inaction. Columbia Business Law Review, 2017, 399–452.
Kuhn, M., & Johnson, K. (2013). Applied predictive modeling. New York, NY: Springer.
Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33(1), 159–174.
Lara-Cabrera, R., Gonzalez-Pardo, A., Barhamgi, M., & Camacho, D. (2017). Extracting radicalisation behavioural patterns from social network data. In 28th International Workshop on Database and Expert Systems Applications (DEXA) (pp. 6–10). New York: IEEE.
Levin, B. (2002). Cyberhate: A legal and historical analysis of extremists' use of computer networks in America. American Behavioral Scientist, 45(6), 958–988.
Li, L., Goodchild, M. F., & Xu, B. (2013). Spatial, temporal, and socioeconomic patterns in the use of Twitter and Flickr. Cartography and Geographic Information Science, 40(2), 61–77.
Magdy, W., Darwish, K., & Abokhodair, N. (2015). Quantifying public response towards Islam on Twitter after Paris attacks. arXiv preprint arXiv:1512.04570.
Maimon, D., & Testa, A. (2017). On the relevance of cyber criminological research in the design of policies and sophisticated security solutions against cyberterrorism events. In G. LaFree & J. D. Freilich (Eds.), The handbook of the criminology of terrorism (pp. 553–567). West Sussex, UK: Wiley & Sons.
Malmasi, S., & Zampieri, M. (2017). Detecting hate speech in social media. arXiv preprint arXiv:1712.06427.
Marcum, C. D., Higgins, G. E., Freiburger, T. L., & Ricketts, M. L. (2012). Battle of the sexes: An examination of male and female cyber bullying. International Journal of Cyber Criminology, 6(1), 904–911.
Mariconti, E., Suarez-Tangil, G., Blackburn, J., De Cristofaro, E., Kourtellis, N., Leontiadis, I., Serrano, J. L., & Stringhini, G. (2018). "You know what to do": Proactive detection of YouTube videos targeted by coordinated hate attacks. arXiv preprint arXiv:1805.08168.
Marwick, A. E., & Boyd, D. (2011). I tweet honestly, I tweet passionately: Twitter users, context collapse, and the imagined audience. New Media & Society, 13(1), 114–133.
McGuire, M. R. (2017). Technology crime and technology control: Contexts and history. In M. R. McGuire & T. J. Holt (Eds.), The Routledge handbook of technology, crime and justice (pp. 35–60). New York City, NY: Routledge.
Miró-Llinares, F. (2016). Taxonomía de la comunicación violenta y el discurso del odio en Internet. IDP. Revista de Internet, Derecho y Política, 22, 82–107.
Miró-Llinares, F., & Johnson, S. D. (2018). Cybercrime and place: Applying environmental criminology to crimes in cyberspace. In G. J. N. Bruinsma & S. D. Johnson (Eds.), The Oxford handbook of environmental criminology (pp. 883–906). Oxford: Oxford University Press.
Miró-Llinares, F., & Rodríguez-Sala, J. J. (2016). Cyber hate speech on Twitter: Analyzing disruptive events from social media to build a violent communication and hate speech taxonomy. International Journal of Design & Nature and Ecodynamics, 11(3), 406–415.
Nobata, C., Tetreault, J., Thomas, A., Mehdad, Y., & Chang, Y. (2016). Abusive language detection in online user content. In Proceedings of the 25th International Conference on World Wide Web (pp. 145–153). IW3C2.
Peddinti, S. T., Ross, K. W., & Cappos, J. (2014). On the internet, nobody knows you're a dog: A Twitter case study of anonymity in social networks. In Proceedings of the Second Association for Computing Machinery Conference on Online Social Networks (COSN) (pp. 83–94). New York: ACM.
Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1(1), 81–106.
Reyes, A., Rosso, P., & Veale, T. (2013). A multidimensional approach for detecting irony in Twitter. Language Resources and Evaluation, 47(1), 239–268.
Schmidt, A., & Wiegand, M. (2017). A survey on hate speech detection using natural language processing. In Proceedings of the Fifth International Workshop on Natural Language Processing for Social Media (pp. 1–10). ACL.
Serra, J., Leontiadis, I., Spathis, D., Blackburn, J., Stringhini, G., & Vakali, A. (2017). Class-based prediction errors to detect hate speech with out-of-vocabulary words. In Proceedings of the First Workshop on Abusive Language Online (ALW1) (pp. 36–40). ACL.
Sharma, S., Agrawal, S., & Shrivastava, M. (2018). Degree based classification of harmful speech using Twitter data. arXiv preprint arXiv:1806.04197.
Sherman, L. W., Gartin, P. R., & Buerger, M. E. (1989). Hot spots of predatory crime: Routine activities and the criminology of place. Criminology, 27(1), 27–56.
Sloan, L., Morgan, J., Burnap, P., & Williams, M. (2015). Who tweets? Deriving the demographic characteristics of age, occupation and social class from Twitter user meta-data. PLoS ONE, 10(3), 1–17.
Suh, B., Hong, L., Pirolli, P., & Chi, E. H. (2010). Want to be retweeted? Large scale analytics on factors impacting retweet in Twitter network. In 2010 IEEE Second International Conference on Social Computing (pp. 177–184). IEEE.
Tsesis, A. (2001). Hate in cyberspace: Regulating hate speech on the Internet. San Diego Law Review, 38, 817.
Tuffery, S. (2011). Data mining and statistics for decision making. Wiley Series in Computational Statistics.
Veilleux-Lepage, Y. (2014). Retweeting the Caliphate: The role of soft-sympathizers in the Islamic State's social media strategy. In 6th International Terrorism and Transnational Crime Conference.
Waseem, Z., & Hovy, D. (2016). Hateful symbols or hateful people? Predictive features for hate speech detection on Twitter. In Proceedings of NAACL-HLT (pp. 88–93). ACL.
Weimann, G. (2014). New terrorism and new media. Washington, DC: Commons Lab of the Woodrow Wilson International Center for Scholars.
Williams, M. L., & Burnap, P. (2015). Cyberhate on social media in the aftermath of Woolwich: A case study in computational criminology and big data. British Journal of Criminology, 56(2), 211–238.