Earthquake Shakes Twitter Users: Analyzing Tweets for Real-Time Event Detection
Outline
- Introduction
- Event Detection
- Model
- Experiments and Evaluation
- Application
- Conclusions
Introduction
What's happening?
Twitter
- is one of the most popular microblogging services
- has received much attention recently
Microblogging
- is a form of blogging that allows users to send brief text updates
- is a form of micromedia that allows users to send photographs or audio clips
Real-time nature
- Twitter users write tweets several times in a single day.
- The large number of tweets results in many reports related to events, including disastrous events such as storms, fires, traffic jams, riots, heavy rainfalls, and earthquakes.
- We can know how other users are doing in real time.
- We can know what is happening around other users in real time.
Our motivation
Adam Ostrow, Editor in Chief at Mashable, wrote in his blog about the possibility of detecting earthquakes from tweets:
Japan Earthquake Shakes Twitter Users ... And Beyonce: "Earthquakes are one thing you can bet on being covered on Twitter first, because, quite frankly, if the ground is shaking, you're going to tweet about it before it even registers with the USGS [United States Geological Survey] and long before it gets reported by the media. That seems to be the case again today, as the third earthquake in a week has hit Japan and its surrounding islands, about an hour ago. The first user we can find that tweeted about it was Ricardo Duran of Scottsdale, AZ, who, judging from his Twitter feed, has been traveling the world, arriving in Japan yesterday."
We can learn of earthquake occurrences from tweets: this is the motivation of our research.
Our Goals
A map of earthquake occurrences worldwide. The intersection consists of regions with both many earthquakes and many Twitter users.
Other such regions: Indonesia, Turkey, Iran, Italy, and Pacific coastal US cities.
Event Detection
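Query words: "shaking", "earthquake"
Tweets containing a query word are not always about an actual earthquake. Example:
- "Earthquake right now!!": positive
- "Someone is shaking hands with my boss": negative
Create a classifier that labels such tweets as positive or negative, using three groups of features (shown for the example tweet "I am in Japan, earthquake right now!"):
- Statistical features (7 words, the 5th word): the number of words in a tweet and the position of the query within the tweet
- Keyword features (I, am, in, Japan, earthquake, right, now): the words in a tweet
- Word context features (Japan, right): the words before and after the query word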
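A minimal sketch of the three feature groups in Python. The regex tokenizer, the function name `extract_features`, and the dictionary layout are illustrative assumptions, not the authors' implementation; the features would still need to be vectorized before training a classifier.

```python
import re
from typing import Dict, List, Optional


def tokenize(tweet: str) -> List[str]:
    """Split a tweet into word tokens (a simple regex tokenizer; an assumption)."""
    return re.findall(r"[A-Za-z0-9']+", tweet)


def extract_features(tweet: str, query: str) -> Dict[str, object]:
    """Compute statistical, keyword, and word-context features for one tweet."""
    words = tokenize(tweet)
    lowered = [w.lower() for w in words]
    q = query.lower()
    # 0-based index of the first occurrence of the query word, or None if absent.
    idx: Optional[int] = lowered.index(q) if q in lowered else None
    return {
        # Statistical features: tweet length and (1-based) query position.
        "num_words": len(words),
        "query_position": idx + 1 if idx is not None else None,
        # Keyword features: the words in the tweet.
        "keywords": words,
        # Word context features: the words immediately before and after the query.
        "context": (
            [words[i] for i in (idx - 1, idx + 1) if 0 <= i < len(words)]
            if idx is not None
            else []
        ),
    }
```

For the example tweet and the query "earthquake", this yields 7 words, position 5, the seven keywords, and the context words "Japan" and "right", matching the slide.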
[Figure: observation of a target object by sensors]
We regard each Twitter user as a sensor and each tweet as a sensor reading: a sensor detects a target event and makes a report probabilistically. Example: a user makes a tweet about an earthquake occurrence, that is, the "earthquake sensor" returns a positive value.
Each tweet carries a time (its post time) and a location (GPS data or the location information in the user's profile).
By processing the time and location information, we can detect target events and estimate their locations.
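A hypothetical way to represent a tweet as a sensor reading. The class name `SensorReading`, its fields, and the rule of preferring GPS data over the profile location are assumptions for illustration, not part of the described system.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple

LatLon = Tuple[float, float]  # (latitude, longitude)


@dataclass(frozen=True)
class SensorReading:
    """One tweet viewed as a reading from a 'social sensor' (hypothetical structure)."""

    post_time: datetime  # time: the tweet's post time
    positive: bool  # classifier output: does the tweet report the target event?
    gps: Optional[LatLon] = None  # GPS coordinates, if attached to the tweet
    profile_location: Optional[LatLon] = None  # geocoded profile location, if any

    def location(self) -> Optional[LatLon]:
        """Prefer GPS data; fall back to the profile location (an assumed rule)."""
        return self.gps if self.gps is not None else self.profile_location


def positive_readings(readings: List[SensorReading]) -> List[SensorReading]:
    """Keep positive readings that have some usable location."""
    return [r for r in readings if r.positive and r.location() is not None]
```

Readings filtered this way provide the time series and the spatial observations that the temporal and spatial models consume.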
Model
Probabilistic Model
Sensor values are noisy, and sometimes sensors work incorrectly. We cannot judge whether a target event occurred from a single tweet; we have to calculate the probability of an event occurrence from a series of data. This gives two problems:
- event detection from time-series data
- location estimation from a series of spatial information
Temporal Model
We must calculate the probability of an event occurrence from multiple sensor values. We examine actual time-series data to create a temporal model.
[Figure: time series of tweet counts, Aug 9 to Aug 17]
We design the alarm of the target event probabilistically, based on an exponential distribution:
$f(t; \lambda) = \lambda e^{-\lambda t} \quad (t > 0,\ \lambda > 0)$, with $\lambda = 0.34$
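The exponential model f(t; lambda) = lambda * e^(-lambda t) with lambda = 0.34 can be explored numerically. This sketch assumes t is the elapsed time since the event, measured in the (unstated) time unit used to fit lambda; the function names are hypothetical.

```python
import math

LAMBDA = 0.34  # the value of lambda given on the slide


def exp_pdf(t: float, lam: float = LAMBDA) -> float:
    """Exponential density f(t; lambda) = lambda * exp(-lambda * t) for t >= 0."""
    return lam * math.exp(-lam * t) if t >= 0 else 0.0


def exp_cdf(t: float, lam: float = LAMBDA) -> float:
    """Probability mass up to time t: 1 - exp(-lambda * t)."""
    return 1.0 - math.exp(-lam * t) if t >= 0 else 0.0


def time_for_fraction(q: float, lam: float = LAMBDA) -> float:
    """Time by which a fraction q (0 <= q < 1) of the mass has arrived."""
    if not 0.0 <= q < 1.0:
        raise ValueError("q must be in [0, 1)")
    return -math.log(1.0 - q) / lam
```

For example, `time_for_fraction(0.5)` is ln 2 / 0.34, roughly 2.04 time units: under this model, half of the reports would be expected within that time after the event.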
Spatial Model
We must calculate the probability distribution of the location of a target. We apply Bayes filters, which are often used in location estimation by sensors, to this problem.
Kalman Filters
- are the most widely used variant of Bayes filters
- approximate the probability distribution in a way virtually identical to a unimodal Gaussian representation
- Advantage: computational efficiency
- Disadvantage: limited to accurate sensors or sensors with high update rates
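Particle Filters
- represent the probability distribution by sets of samples, or particles
- Advantage: the ability to represent arbitrary probability densities
- Particle filters can converge to the true posterior even in non-Gaussian, nonlinear dynamic systems.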
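A minimal sequential-importance-resampling particle filter for a stationary 2-D location. It is a generic sketch, not the authors' implementation: latitude and longitude are treated as planar coordinates, and the Gaussian likelihood width `obs_sigma`, the jitter, the search bounds, and all names are assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (latitude, longitude), treated as planar coordinates


def particle_filter_location(
    observations: Sequence[Point],
    bounds: Tuple[Point, Point],  # ((lat_lo, lon_lo), (lat_hi, lon_hi))
    n_particles: int = 1000,
    obs_sigma: float = 1.0,
    jitter_sigma: float = 0.01,
    seed: int = 0,
) -> Point:
    """Estimate a stationary location from noisy point observations."""
    rng = random.Random(seed)
    (lat_lo, lon_lo), (lat_hi, lon_hi) = bounds
    # 1. Initialize particles uniformly over the search region.
    particles: List[Point] = [
        (rng.uniform(lat_lo, lat_hi), rng.uniform(lon_lo, lon_hi))
        for _ in range(n_particles)
    ]
    for z_lat, z_lon in observations:
        # 2. Weight each particle by a Gaussian likelihood of the observation,
        #    computed in log space and shifted by the max to avoid underflow.
        log_w = [
            -((lat - z_lat) ** 2 + (lon - z_lon) ** 2) / (2 * obs_sigma ** 2)
            for lat, lon in particles
        ]
        top = max(log_w)
        weights = [math.exp(lw - top) for lw in log_w]
        # 3. Resample particles in proportion to their weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
        # 4. Add small jitter so resampled duplicates spread out again.
        particles = [
            (lat + rng.gauss(0, jitter_sigma), lon + rng.gauss(0, jitter_sigma))
            for lat, lon in particles
        ]
    # 5. Report the mean of the particle cloud as the location estimate.
    return (
        sum(p[0] for p in particles) / n_particles,
        sum(p[1] for p in particles) / n_particles,
    )
```

Because the posterior is carried by samples rather than a single Gaussian, the same loop works unchanged for multimodal or non-Gaussian observation models; only the weighting step would change.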
Information diffusion: if information diffusion happens among users, Twitter user sensors are not independent; they affect each other.
In the case of earthquakes and typhoons, very little information diffusion takes place on Twitter compared to, for example, a Nintendo DS game. We therefore assume that Twitter user sensors are independent with respect to earthquakes and typhoons.
Experiments and Evaluation
We demonstrate the performance of:
- tweet classification
- event detection from time-series data (results shown in the application)
- location estimation from a series of spatial information
Tweet Classification
Queries: "earthquake", "shaking"

"earthquake" query
Features      Recall   Precision  F-value
Statistical   87.50%   63.64%     73.69%
Keywords      87.50%   38.89%     53.85%
Context       50.00%   66.67%     57.14%
All           87.50%   63.64%     73.69%
"shaking" query
Features      Recall   Precision  F-value
Statistical   66.67%   68.57%     67.61%
Keywords      86.11%   57.41%     68.89%
Context       52.78%   86.36%     68.20%
All           80.56%   65.91%     72.50%
We obtain the highest F-values with the statistical features and with all features combined. Keyword features and word context features don't contribute much to classification performance. A possible reason: a surprised user might produce a very short tweet, so the tweet length and the query position are already informative. It is apparent that precision is not as high as recall.
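The slides do not define the F-value; assuming the standard F1 score (the harmonic mean of precision and recall), a short check reproduces the earthquake-query F-values. The names `f_value` and `EARTHQUAKE_QUERY` are illustrative.

```python
def f_value(precision: float, recall: float) -> float:
    """F-value (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (recall, precision) for the "earthquake" query results
EARTHQUAKE_QUERY = {
    "Statistical": (0.8750, 0.6364),
    "Keywords": (0.8750, 0.3889),
    "Context": (0.5000, 0.6667),
    "All": (0.8750, 0.6364),
}

for name, (recall, precision) in EARTHQUAKE_QUERY.items():
    print(f"{name:12s} F = {f_value(precision, recall):.2%}")
```

The harmonic mean is pulled toward the smaller of the two scores, which is why the keyword features, with high recall but low precision, end up with a low F-value.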
Location Estimation
Target events: earthquakes, typhoons
Baseline methods:
- Weighted average: simply takes the average of latitudes and longitudes
- Median: simply takes the median of latitudes and longitudes
[Figure: map showing Kyoto, Tokyo and Osaka]
Average estimation errors:
Earthquakes: 5.47, 3.62, 3.85, 3.01
Typhoons: 4.39, 4.02, 9.56, 3.58
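The two baselines can be sketched as follows. The optional `weights` argument and both function names are assumptions; with no weights, `weighted_average_location` reduces to the plain average of latitudes and longitudes that the slide describes.

```python
from statistics import median
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]  # (latitude, longitude)


def weighted_average_location(points: Sequence[Point],
                              weights: Optional[Sequence[float]] = None) -> Point:
    """Average of latitudes and longitudes; equal weights unless given."""
    if weights is None:
        weights = [1.0] * len(points)
    total = sum(weights)
    lat = sum(w * p[0] for w, p in zip(weights, points)) / total
    lon = sum(w * p[1] for w, p in zip(weights, points)) / total
    return lat, lon


def median_location(points: Sequence[Point]) -> Point:
    """Coordinate-wise median of latitudes and longitudes."""
    return median(p[0] for p in points), median(p[1] for p in points)
```

The median is less sensitive than the average to a few tweets posted far from the event, but neither baseline models noise or uncertainty the way the Bayes filters do.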
Discussion of Experiments
- Particle filters perform better than the other methods.
- If the center of a target event is in an oceanic area, it is more difficult to locate it precisely from tweets.
- It becomes more difficult to make good estimates in less populated areas.
Outline
Application
Toretter (http://toretter.com)
- Earthquake reporting system using the event detection algorithm
- All users can see the detection of past earthquakes
- Registered users can receive e-mails of earthquake detection reports
Example e-mail:
Dear Alice,
We have just detected an earthquake around Chiba. Please take care.
Toretter Alert System
[Figure: Screenshot of Toretter.com]
The e-mails are received by a user shortly before the earthquake actually arrives.
Is it possible to receive the e-mail before the earthquake actually arrives? An earthquake is transmitted through the earth's crust at about 3 to 7 km/s, so a person at a point 100 km from the actual center has about 20 to 30 sec before it arrives.
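A back-of-the-envelope check of this timing argument: at 3 to 7 km/s, the travel time over 100 km spans roughly 14 to 33 s, which brackets the 20 to 30 s quoted above. The helper names are hypothetical, and the margin calculation assumes e-mail delivery is instantaneous once the detection delay has elapsed.

```python
def arrival_time_s(distance_km: float, speed_km_s: float) -> float:
    """Seconds for a seismic wave to travel distance_km at speed_km_s."""
    if speed_km_s <= 0:
        raise ValueError("speed must be positive")
    return distance_km / speed_km_s


def warning_margin_s(distance_km: float, speed_km_s: float,
                     detection_delay_s: float) -> float:
    """Seconds between the alert and the wave's arrival (negative means too late)."""
    return arrival_time_s(distance_km, speed_km_s) - detection_delay_s


for speed in (3.0, 5.0, 7.0):
    print(f"{speed} km/s: {arrival_time_s(100, speed):.1f} s")
```

For instance, with the 19 sec detection time reported for the earliest case and a wave speed of 5 km/s, `warning_margin_s(100, 5, 19)` leaves about 1 s at 100 km; more distant locations get proportionally more warning.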
Date      Magnitude  Location     Time
Aug. 18   4.5        Tochigi      6:58:55
Aug. 18   3.1        Suruga-wan   19:22:48
Aug. 21   4.1        Chiba        8:51:16
Aug. 25   4.3        Uraga-oki    2:22:49
Aug. 25   3.5        Fukushima    2:21:15
Aug. 27   3.9        Wakayama     17:47:30
Aug. 27   2.8        Suruga-wan   20:26:23
Aug. 31   4.5        Fukushima    00:45:54
Sep. 2    3.3        Suruga-wan   13:04:45
Sep. 2    3.6        Bungo-suido  17:37:53
In all cases, we sent e-mails before the JMA announcements. In the earliest case, we sent e-mails within 19 sec.
Event Detection from Time-Series Data
Promptly detected*, by JMA intensity scale level: 53 (67.9%) | 20 (80.0%) | 3 (100.0%)
*Promptly detected: detected within a minute.
JMA intensity scale: the original scale of earthquakes by the Japan Meteorological Agency.
Period: Aug. 2009 to Sep. 2009
Tweets analyzed: 49,314 tweets
Positive tweets: 6,291 tweets by 4,218 users
We detected 96% of earthquakes of JMA intensity scale 3 or more during the period.
Conclusions
- Semantic analyses were applied to tweet classification.
- We consider each Twitter user as a sensor and set up the problem of detecting an event based on sensory observations.
- Location estimation methods such as Kalman filters and particle filters are used to estimate the locations of events.
- We developed an earthquake reporting system, a novel approach to notifying people promptly of an earthquake event.
- We plan to expand our system to detect events of various kinds, such as rainbows and traffic jams.
http://toretter.com
Takeshi Sakaki (@tksakaki)
Temporal Model
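$p_{occur}(t) = 1 - p_f^{\,n_0 \left(1 - e^{-\lambda (t+1)}\right) / \left(1 - e^{-\lambda}\right)}$
where $p_f$ is a parameter, the probability that a single sensor (tweet) is a false positive, and $n_0$ is the number of positive tweets at time $t = 0$.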
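A sketch of the occurrence probability, assuming the form p_occur(t) = 1 - p_f^n(t), where n(t) = n_0 (1 - e^(-lambda (t+1))) / (1 - e^(-lambda)) is the expected cumulative number of positive tweets under the exponential model and p_f is the probability that a single tweet is a false positive. The independence of sensors assumed in the Model section is what lets the false-alarm probabilities multiply. No value of p_f is given on these slides, so any value plugged in is illustrative only; the function names are hypothetical.

```python
import math


def expected_positive_tweets(n0: float, t: int, lam: float) -> float:
    """n0 * sum_{k=0}^{t} exp(-lam * k), in closed form (geometric series)."""
    return n0 * (1.0 - math.exp(-lam * (t + 1))) / (1.0 - math.exp(-lam))


def p_occur(n0: float, t: int, lam: float, p_f: float) -> float:
    """Probability that the event occurred: 1 minus the probability that
    every (independent) positive tweet up to time t is a false positive."""
    return 1.0 - p_f ** expected_positive_tweets(n0, t, lam)
```

As t grows, n(t) approaches n_0 / (1 - e^(-lambda)), so the probability rises quickly at first and then saturates; an alarm would fire once it crosses some chosen threshold.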