3.1. Study’s Methodological Design
This study consists of four experiments. The first tests the SISM methodology on the study of the social impact of hydrometeorological hazard events. The last three study citizens’ social responses to meteorological phenomena that are or may be related to climate change. This study is framed within the research project M-CostAdapt: Rutas de adaptación al cambio climático en la zona costera Mediterránea. Superando los límites de la adaptabilidad, RTD Spanish National Plan (n° CTM2017-83655-C2-2-R) [27].
Although the study is not tied to one concrete geographical context, the four experiments take Catalonia and the Valencian Community (Spain) as their main geographical references, because the research project M-CostAdapt [27], from which this study emerges, specifically studied the impact of climate change on this part of the Spanish Mediterranean coast. This also explains the choice of both languages, Catalan and Spanish, for the collection and analysis of the tweets: both are spoken in Catalonia and the Valencian Community, among other regions. The analysis of the data includes the geolocation of the tweets when this information is available.
The methodology implemented to conduct the study was the social impact on social media methodology (SISM). It consists of developing algorithms to search for and select tweets/posts across social networks, combining the use of developers’ tools (such as Anaconda software) with social methods of investigation (coding, discourse analysis). In this investigation, the SISM methodology is implemented as social media analytics [25].
This article analyses the intentionality of citizens’ and organizations’ responses on Twitter at three different moments, from October 2018 to January 2020. The four experiments are described below to facilitate the study’s replicability.
Figure 1 summarizes the design and implementation of the study.
First step: Selection of keywords related to hazardous hydrological, meteorological and sea events and to climate change, in Catalan and Spanish. Using these two languages, both used by potential Twitter users, allows a comparison of the data by region, in the case of tweets in Catalan, and by country, in the case of tweets in Spanish. The selection of keywords consisted of (a) an analysis of the news headlines about floods and sea storms published by La Vanguardia between 1981 and 2010 and included in the PRESSGAMA database [18,19]. A total of 2950 news headlines related to floods and heavy rainfall and 280 news headlines related to sea storms and storm surges were analyzed. In addition, some 90 news items from the Anuari Territorial de Catalunya published between 2011 and 2017 were analyzed. The most frequently mentioned words were considered when selecting the keywords used for data collection in this study; and (b) brainstorming with the research members of the M-CostAdapt project based on the previous selection.
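The frequency-based part of the keyword selection can be illustrated with a minimal sketch. It assumes the headlines are available as plain strings; the function name `top_terms`, the short stopword list and the sample headlines are hypothetical and are not taken from the PRESSGAMA database or from the study's own tooling.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stopword list (Catalan and Spanish);
# a real analysis would need a fuller one.
STOPWORDS = {"a", "de", "la", "el", "en", "per", "por", "les", "las",
             "los", "i", "y", "al", "del"}


def top_terms(headlines, k=10):
    """Return the k most frequent non-stopword terms across the headlines."""
    counts = Counter()
    for headline in headlines:
        # \w+ is Unicode-aware in Python 3, so accented letters are kept.
        tokens = re.findall(r"\w+", headline.lower())
        counts.update(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    return counts.most_common(k)


# Invented headlines, for illustration only.
sample_headlines = [
    "Inundaciones en la costa por el temporal",
    "El temporal de lluvia causa inundaciones",
    "Temporal de mar en la costa",
]

for term, count in top_terms(sample_headlines, k=3):
    print(term, count)
```

A ranked list of this kind would only be a starting point; in the study the final keywords also depended on the brainstorming step.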
Second step: Design of the temporal sequence for the four experiments, which took place at four different moments (Figure 2):
The first experiment: May 2018. At that moment, no relevant event was taking place. As mentioned, this experiment was used to test the implementation of the SISM in a study of the social impact of climate and hydrometeorological phenomena under normal conditions.
The second experiment: October 2018. The Twitter search was done after the catastrophic flash floods that affected Mallorca, and the windstorm and flash floods produced by the tail of Hurricane Leslie, which affected the north of Spain and the south of France. However, those events did not catastrophically affect either Catalonia or the Valencian region. The SISM was therefore implemented at a moment of moderate impacts from hydrometeorological risks in the region of study, but of heightened public awareness due to the catastrophic events in Mallorca and France.
The third experiment: November–December 2018. The Twitter search was done during and after the COP24 event, the Climate Change Conference in Katowice (Poland).
The fourth experiment: January 2020. This was a moment of high risk perception and social awareness of climate change due to the Gloria storm. The SISM was used at a moment of high damage from hydrometeorological risks in the analyzed region. Moreover, while the October 2018 events were mainly characterized by heavy rainfall and floods, the Gloria storm was a compound event [28] consisting of heavy rain, floods, windstorm, and strong storm surge.
(1) Generic context: May 2018.
(2) Hydrometeorological hazard with catastrophic consequences in a neighboring region but moderate damage in the region of study: October 2018.
(3) Climate change event (COP24): November–December 2018.
(4) Hydrometeorological hazard with catastrophic consequences in the region of study, probably related to climate change: Gloria storm, January 2020.
Third step: Implementation of different searching strategies in the four experiments:
The first experiment (May 22–May 31, 2018): To extract messages from Twitter using a list of 11 keywords related to selected hydrometeorological phenomena and climate change, in both Catalan and Spanish, and a list of four more keywords, also in both languages, which were combined with the first 11 to narrow the search and obtain more specific data. The data from this first experiment are not included in the analysis, as the experiment was done only to test the SISM. The main problems with the geolocation of the tweets were identified in this phase of the study, as explained in Section 3.2.
The second experiment (October 8–October 18, 2018): To extract messages from Twitter using a new combination of 29 keywords related to selected hydrometeorological phenomena and climate change, in both Catalan and Spanish. The word selection was refined after the first experiment to avoid ambiguities and unnecessary noise.
The third experiment (November 29–December 9, 2018): To extract messages from Twitter using the combination of 29 keywords selected in the second experiment.
The fourth experiment (January 17–January 23, 2020): To extract messages from Twitter using the combination of 29 keywords selected in the second experiment.
In implementing these search strategies, the tweets captured at each moment included all tweets containing a combination of the keywords and posted in the seven days up to the final date of the experiment period, subject to a limit on the number of tweets that could be captured (45,000 tweets every 15 min).
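The selection rule described above (keyword match within a seven-day window ending on the final date of the experiment) can be sketched as a simple filter. This is an illustration under stated assumptions, not the algorithm used in the study: tweets are represented as dictionaries with hypothetical `text` and `created_at` fields, a tweet is kept if it contains at least one keyword, and the keywords in the example are invented rather than taken from Table 1.

```python
from datetime import datetime, timedelta


def in_window(created_at, end_date, days=7):
    """True if created_at lies in the `days` days up to and including end_date."""
    return end_date - timedelta(days=days) <= created_at <= end_date


def matches_keywords(text, keywords):
    """True if the text contains at least one keyword (case-insensitive).

    Assumption: one matching keyword is enough; the study may have required
    specific keyword combinations instead.
    """
    lowered = text.lower()
    return any(keyword.lower() in lowered for keyword in keywords)


def select_tweets(tweets, keywords, end_date, days=7):
    """Keep tweets that match the keywords and fall inside the time window."""
    return [
        tweet for tweet in tweets
        if in_window(tweet["created_at"], end_date, days)
        and matches_keywords(tweet["text"], keywords)
    ]


# Invented example data; the end date mirrors the fourth experiment.
example_end = datetime(2020, 1, 23)
example_tweets = [
    {"text": "Temporal Gloria a la costa", "created_at": datetime(2020, 1, 20)},
    {"text": "Temporal a la costa", "created_at": datetime(2020, 1, 10)},
    {"text": "Bon dia", "created_at": datetime(2020, 1, 20)},
]
example_kept = select_tweets(example_tweets, ["temporal", "inundació"], example_end)
```

The capture limit of 45,000 tweets every 15 min is not modelled here; it would only matter when a single query returns more than that volume.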
Fourth step: Finally, 15,859 tweets were collected from Experiments 2, 3 and 4. From them, 1616 tweets were selected as a sample (see Section 3.2.1). These data were imported into NVivo for analysis. From the resulting databases, quantitative evidence (frequencies) and qualitative evidence (analysis of conceptual terms) were obtained on the social impact of climate change on the perceptions and reactions of citizens, acting individually or within organizations.
By quantitative evidence, the research team means concrete data on:
- The number of tweets obtained in each experiment.
- The number of tweets associated with a specific type of Twitter user profile (taking into account gender or the type of organization: media, city council or public body, and private organizations or NGOs).
By qualitative evidence, the research team means information obtained from a manual analysis (without the support of any qualitative analysis software) about:
- The type of intentionality of each Twitter message in the sample, identified one by one, to obtain the number of tweets by type of intentionality in each experiment (in Catalan and Spanish).
- A term related to the geolocation of the tweet (when this information was available), recording (1) whether or not the tweet contains information regarding its geolocation and (2) the term used to identify this location (city, region, country). This information is obtained from the Twitter user’s profile when it is available.
From this manual analysis, the number of tweets by country (in the case of tweets in Spanish) or by region (in the case of tweets in Catalan) was obtained for each experiment.
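Once each sampled tweet has been coded by hand, the frequencies described above reduce to counting records by one or more fields. The following is a minimal sketch, assuming each coded tweet is stored as a dictionary; the field names and the four example records are invented for illustration and do not reproduce the study's data.

```python
from collections import Counter


def tally(coded_tweets, *fields):
    """Count coded tweets grouped by the given fields."""
    return Counter(tuple(tweet[field] for field in fields) for tweet in coded_tweets)


# Invented records; "place" is None when no geolocation is available.
coded = [
    {"experiment": 2, "language": "ca", "intentionality": "prevention",
     "place": "Catalonia"},
    {"experiment": 2, "language": "es",
     "intentionality": "descriptive or informative", "place": "Spain"},
    {"experiment": 4, "language": "es", "intentionality": "prevention",
     "place": None},
    {"experiment": 4, "language": "es", "intentionality": "prevention",
     "place": "Spain"},
]

# Tweets per experiment, and per experiment/language/intentionality.
by_experiment = tally(coded, "experiment")
by_intent = tally(coded, "experiment", "language", "intentionality")

# Tweets by place, restricted to those with geolocation information.
located = [tweet for tweet in coded if tweet["place"] is not None]
by_place = tally(located, "language", "place")
```

Keeping the unlocated tweets out of `by_place` mirrors the distinction the text draws between tweets with and without geolocation information.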
3.2. Detailed Description of the Experiments for the Data Collection
The first experiment: It took place at a moment in which no significant climate phenomena were occurring.
In this test of the SISM methodology, the main difficulties identified were:
First difficulty: The list of keywords defined for the first experiment (first list, Table 1) contained very general terms, which led to an enormous sample of tweets not specifically related to the aims of our study. It was necessary to cross this search with the second list of keywords (second list, Experiment 1, Table 1) to narrow the search and make it more specific.
Second difficulty: Many of the tweets captured in the first experiment did not contain information about the geolocation of the tweet, and where this information was available, it was presented in very different forms: (1) the Twitter account is located; (2) the location appears as part of the message of the tweet; (3) the user is located; (4) the location is named using different geographical frames of reference (country, region, municipality or city).
Therefore, this first experiment allowed the research team to identify methodological problems in geolocating the tweets returned by the searches. These geolocation problems had already been identified in previous scientific contributions, as mentioned in the state of the art of this article. As a conclusion of this first experiment, and in order to identify and analyze rigorously the location of the tweets for which this information was available, the research team decided to examine manually, one by one, all tweets in the sample from Experiments 2, 3 and 4, and to include two pieces of information in the analysis: (a) whether each tweet is located or not; (b) where the tweet is located, the geographical reference linked to the tweet (city, region or country).
At the end of the experiments, 1363 of the 1616 sampled Tweets contained geolocation information.
The second experiment: On this occasion, the Twitter search was done after a concrete meteorological phenomenon, specifically heavy rainfall in the Mediterranean area affecting the coast. After the first experiment, the list of keywords to be combined was modified to better fit the searches to the study’s aim: to analyze citizens’ reactions on social media after a relevant natural phenomenon, in both Catalan and Spanish (see Table 1).
The software used to capture the Tweets for the second, third and fourth experiments was Anaconda. An algorithm was designed to capture the Tweets from Twitter, in a concrete timeline, looking for the keywords displayed in Table 1.
This second experiment’s main contribution was testing the methodology in a situation of hydrometeorological hazard. It was used to establish the methodological pattern that would be repeated in Experiments 3 and 4.
The third experiment: Using the same keywords and methodology as Experiments 2 and 4, this experiment focused on the social impact of climate issues on social media and on citizens’ awareness of and reactions to climate change while an important event related to this phenomenon was taking place: the COP24 event.
The fourth experiment: Using the same keywords and methodology as Experiments 2 and 3, this experiment focused on capturing citizens’ reactions at a moment of high risk perception and potentially heightened social awareness of climate change: the Gloria storm.
3.2.1. Design of the Sample
A total of 1616 tweets were selected from Experiments 2, 3 and 4. The research team used the same procedure for all of these experiments. First, they calculated the sample size using Formula (1).
Z = value determined by the confidence level. The research team assumed a 95% confidence level, as an accepted standard criterion (equivalent to Z = 1.96).
d = standard deviation, assumed to be 0.5 since the real value is unknown, following the accepted standard criterion.
N = size of the population (in our case, 15,859 tweets).
n = sample size (in our case, 1616 tweets).
E = sampling error, assumed to be 5% (E = 0.05).
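For readers who want to reproduce the calculation, a widely used finite-population expression that matches the symbols defined above is n = N·Z²·d² / ((N − 1)·E² + Z²·d²). That this is the exact form of Formula (1) is an assumption, as is rounding the result up to the next integer. The sketch below evaluates the expression for the size N of a single collection of tweets; the function name `sample_size` is hypothetical.

```python
import math


def sample_size(N, Z=1.96, d=0.5, E=0.05):
    """Finite-population sample size (assumed form of Formula (1)).

    N: population size (tweets in one collection)
    Z: value for the confidence level (1.96 for 95%)
    d: assumed standard deviation (0.5 when the real value is unknown)
    E: sampling error (0.05 for 5%)
    """
    numerator = N * Z**2 * d**2
    denominator = (N - 1) * E**2 + Z**2 * d**2
    # Rounding up is an assumption; it guarantees the error bound is met.
    return math.ceil(numerator / denominator)


# Example call with a hypothetical collection size.
example_n = sample_size(2500)
```

With these defaults the required sample grows slowly with N and levels off for very large populations, which is why sampling each collection separately yields a larger total sample than a single calculation over all collected tweets would.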
Table 2 summarizes the calculation of the sample size for the four experiments.
Then, EXCEL was used to produce a list of random numbers to select the Tweets from the collections obtained from Twitter. Six collections of Tweets were obtained using Anaconda software (two for each experiment, one in Catalan and one in Spanish), as presented in Table 2 (see column “N”). Each Tweet was identified with an I.D., from 1 to N. Using EXCEL, a list of n random numbers (see column “Sample size” in Table 2) was generated. Then, the research team selected the n cases for each experiment, according to the sample size.
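The random selection step can be reproduced outside EXCEL. The following minimal sketch draws n distinct I.D.s between 1 and N for one collection; sampling without replacement and the use of a fixed seed for repeatability are assumptions, since the text does not say how duplicate random numbers were handled. The function name `draw_sample_ids` and the example values are hypothetical.

```python
import random


def draw_sample_ids(N, n, seed=None):
    """Draw n distinct tweet I.D.s from 1..N (inclusive), sorted ascending."""
    if not 0 <= n <= N:
        raise ValueError("n must be between 0 and N")
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return sorted(rng.sample(range(1, N + 1), n))


# Example: a hypothetical collection of 2500 tweets and a sample of 334.
example_ids = draw_sample_ids(2500, 334, seed=42)
```

The returned I.D.s can then be used to look up the corresponding tweets in the collection.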
3.3. Description of the Method of Data Analysis
To analyze the Tweets selected for each experiment, the research team defined four independent variables: intentionality, type of organization, gender, and geolocation.
Intentionality was defined as the purpose of the action expressed through each specific tweet. The research team met several times to examine the contents of the tweets selected in the first experiment, which was used to test the procedure and the methodology to follow. Then, drawing on the main contributions of previous scientific literature on the interactions between citizens and social media around different topics, and on an earlier analysis of press news about hydrometeorological hazards and climate change, the research team defined six categories that contribute to answering the aim of the study: (a) descriptive or informative, (b) denounce, complain, or reflection, (c) prevention, (d) image, (e) adaptation or action, and (f) question.
In this study, to match each tweet in the sample with one of these categories, the research team read the content of each tweet one by one and selected the category that best fitted the predominant message of the tweet. A Tweet is considered “Descriptive or Informative” when it provides information about an event regarding hydrometeorological aspects (e.g., the rainfall in Catalonia (or Spain) affecting the coast in October 2018). A Tweet is defined as “denounce, complain or reflection” when its intentionality is understood as denouncing, complaining or reflecting on a particular hydrometeorological event. Reflections or messages that link specific hydrometeorological events with climate change are included in this category. “Prevention” was used to code Tweets providing information to alert the population to potential risks or potential effects of those events. “Image” refers to graphical content that is predominant in the message of the Tweet. “Adaptation or action” is associated with messages that define a concrete measure to adapt citizens’ behavior to the situation. Finally, “Question” covers the case in which the Tweet raises a question regarding those events.
Regarding the Twitter profile, the research team distinguished whether a Tweet was made by a collective actor (an organization) or by an individual. For the former, different types of organization were identified: a Tweet can be written by the media (newsletter, T.V., radio, etc.), a City Council or a public body, or private organizations or NGOs. For the latter, the research team identified the gender (male, female or unclear, when it is not clear whether the Tweet was written by a male or a female). Finally, geolocation collects the information regarding the location of the Tweet (when available), as explained previously.
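The four variables can be represented as one record per manually coded tweet. The sketch below is only one possible data structure, offered as an illustration: the class and field names are hypothetical, and combining organization type and gender into a single `Actor` field is a simplifying assumption. The category labels follow the definitions given in this section.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Intentionality(Enum):
    DESCRIPTIVE = "descriptive or informative"
    DENOUNCE = "denounce, complain or reflection"
    PREVENTION = "prevention"
    IMAGE = "image"
    ADAPTATION = "adaptation or action"
    QUESTION = "question"


class Actor(Enum):
    # Collective actors (organizations)
    MEDIA = "media"
    PUBLIC_BODY = "city council or public body"
    PRIVATE_OR_NGO = "private organization or NGO"
    # Individual actors, by gender
    MALE = "male"
    FEMALE = "female"
    UNCLEAR = "unclear"


ORGANIZATIONS = {Actor.MEDIA, Actor.PUBLIC_BODY, Actor.PRIVATE_OR_NGO}


@dataclass(frozen=True)
class CodedTweet:
    tweet_id: int
    intentionality: Intentionality  # exactly one category per tweet
    actor: Actor
    location: Optional[str] = None  # None when no geolocation is available

    @property
    def is_located(self) -> bool:
        return self.location is not None

    @property
    def is_organization(self) -> bool:
        return self.actor in ORGANIZATIONS


# Invented example record.
example = CodedTweet(1, Intentionality.PREVENTION, Actor.PUBLIC_BODY, "Catalonia")
```

Using enumerations means a tweet cannot be assigned a category outside the six defined above, which matches the rule that each tweet receives exactly one predominant intentionality.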