Geo Twitter Data Collection and Visualization System Hideyuki Fujita - PowerPoint PPT Presentation

Geo Twitter Data Collection and Visualization System Hideyuki Fujita Graduate School of Information Systems, University of Electro-Communications (Tokyo, Japan)

Background
Mobile social media
• generates valuable data for analyzing human behavior and events in the real world
• (mobile use of) Facebook, Twitter, Instagram, Flickr, Foursquare, etc.
Twitter (report by Semiocast, Inc., July 2012)
• 500 million users
• 400 million tweets per day
• 0.77% geotagged (with location coordinates)
• 64% posted from mobile devices
• becoming a mobile medium
• used for sharing real-time information, including information related to the user's current location

Geo-Twitter applications: related work
• Interactive map application for situational awareness (MacEachren et al., 2011)
• Real-time mapping of local news (Sankaranarayanan et al., 2009)
• Real-time event detection and location / trajectory prediction of earthquakes and typhoons (Sakaki et al., 2010)
Key technologies
• Event extraction from text
  • natural language processing, machine learning
• Spatial analysis
• Location-based data collection

Twitter data collection: problem and objective
Twitter API (Application Programming Interface)
• Twitter's official service for providing sample data over HTTP
• easy to get small amounts of data
Problems in collecting large amounts of data
• Straightforward use of the Twitter API yields only a small amount of sample data.
• Continuous data collection takes considerable effort.
• Having many researchers collect the same data is inefficient.
Objective
• Efficient data collection system for geo-tweet data
• Data visualization system for geo-tweet data
Future plan
• Data sharing system for researchers using geo-tweet data

Data collection method
Limitation of the Twitter Search API
• returns a maximum of 1,500 tweets for one search filter (location and date period)
Method
• divide the area into small areas (a grid)
• divide the date period into tweet-ID periods
  • tweet ID: integer ID attached to every tweet in ascending sequence
• collect data within each divided area and period
• aggregate the collected data
[Figure: the collection space divided along two axes, area and period]
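The division step above can be sketched as follows: a bounding box is cut into equal latitude/longitude cells, the tweet-ID range is cut into contiguous sub-ranges, and every (cell, ID period) pair becomes one search query. This is a minimal sketch, not the author's implementation; the names `Task`, `split_grid`, `split_id_period` and `make_tasks` are hypothetical, and the `since_id`/`max_id` bounds (exclusive lower, inclusive upper) are an assumed convention.

```python
from dataclasses import dataclass
from typing import List, Tuple

Cell = Tuple[float, float, float, float]  # (south, west, north, east) in degrees


@dataclass(frozen=True)
class Task:
    """One collection unit: a grid cell combined with a tweet-ID period (hypothetical)."""
    cell: Cell
    since_id: int  # exclusive lower bound of the ID period
    max_id: int    # inclusive upper bound of the ID period


def split_grid(bbox: Cell, rows: int, cols: int) -> List[Cell]:
    """Divide a lat/lon bounding box into rows x cols equal cells."""
    south, west, north, east = bbox
    dlat = (north - south) / rows
    dlon = (east - west) / cols
    return [
        (south + r * dlat, west + c * dlon,
         south + (r + 1) * dlat, west + (c + 1) * dlon)
        for r in range(rows)
        for c in range(cols)
    ]


def split_id_period(since_id: int, max_id: int, parts: int) -> List[Tuple[int, int]]:
    """Divide the ID range (since_id, max_id] into `parts` contiguous sub-ranges.

    Tweet IDs are assigned in ascending order, so an ID range behaves like a time range.
    """
    span = max_id - since_id
    bounds = [since_id + span * i // parts for i in range(parts + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def make_tasks(bbox: Cell, rows: int, cols: int,
               since_id: int, max_id: int, parts: int) -> List[Task]:
    """Cross product of grid cells and ID periods; each pair becomes one query."""
    return [
        Task(cell, lo, hi)
        for cell in split_grid(bbox, rows, cols)
        for lo, hi in split_id_period(since_id, max_id, parts)
    ]
```

For example, `split_id_period(0, 10, 3)` yields `[(0, 3), (3, 6), (6, 10)]`, so adjacent periods share only a boundary and no ID is skipped. In a real run, the ID end points would presumably come from the date-boundary tweet IDs cached by the master server.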

Evaluation
Area: about 2 × 2 km around Tokyo Station
Period: 1 day
Method                               Num. of collected tweets
Common method using Streaming API    31,711
Common method using Search API       1,500
Proposed method                      97,787

Practical issues in collecting data for a large area and a long period
Access rate limit on the Search API per IP address
• Connections are refused when the limit is exceeded.
Instability of the API (best-effort service)
• Without any explicit error message, the number of tweets in a Search API response often becomes much smaller than usual.

Solutions for practical issues (1 of 2)
Data collection by a distributed system
• access the API from multiple servers with different IP addresses
Pilot data collection for monitoring Twitter API status
• continuously monitor the number of tweets collected in a certain small grid cell to determine the status of the API
• halt data collection for the whole area when the number of tweets in the pilot collection is much smaller than usual (less than 10% of the average), and restart it once the API becomes stable again
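A minimal sketch of the pilot-monitoring rule, assuming the pilot cell is polled at a fixed interval and that "usual" means the mean of the last `window` counts observed while the API was healthy. Only the 10% threshold comes from the slide; the class name `PilotMonitor` and the window length are hypothetical.

```python
from collections import deque


class PilotMonitor:
    """Judge API health from the tweet counts of one small pilot grid cell."""

    def __init__(self, window: int = 24, threshold: float = 0.10):
        self.history = deque(maxlen=window)  # recent counts taken while healthy
        self.threshold = threshold           # fraction of the average (10% here)
        self.halted = False

    def observe(self, count: int) -> bool:
        """Record one pilot count; return True if collection should run."""
        if self.history:
            average = sum(self.history) / len(self.history)
            self.halted = count < self.threshold * average
        else:
            self.halted = False  # no baseline yet, so assume the API is fine
        if not self.halted:
            # Only healthy observations update the baseline, so a long outage
            # does not drag the "usual" level down.
            self.history.append(count)
        return not self.halted
```

Collection resumes automatically on the first pilot count that is back above the threshold; a stricter variant could require several consecutive healthy counts before restarting.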

Solutions for practical issues (2 of 2)
Re-collection of data that the system failed to collect
• check the posting times of the collected tweets in each grid cell
• if there are periods in which no tweets were collected, try to collect the data for those periods again in that grid cell
Repeated requests upon receiving an explicit API error

Distributed system for practical data collection
Master server (1 machine)
• Pilot data collection for monitoring Twitter API status
• Getting and caching date-boundary tweet IDs
• Assigning collection areas and periods to data collection servers
Data collection servers (multiple machines)
• Data collection within the assigned area and period
• Data re-collection
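One plausible way for the master to hand out work is a pull-based queue: each collection server asks for the next (area, period) task and reports back, and failed tasks are queued again for re-collection. This is a hypothetical sketch (the class `Master` and its methods are not from the slides); the `paused` flag shows where the pilot monitor's halt decision could plug in.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, List, Optional


class Master:
    """Hypothetical master server that hands out collection tasks to servers."""

    def __init__(self, tasks: Iterable[Hashable]):
        self.pending = deque(tasks)              # tasks not yet assigned
        self.in_progress: Dict[str, Hashable] = {}  # server -> current task
        self.done: List[Hashable] = []
        self.paused = False                      # set when the pilot reports an unstable API

    def next_task(self, server: str) -> Optional[Hashable]:
        """Give a server its next task, or None if paused or nothing is left."""
        if self.paused or not self.pending:
            return None
        task = self.pending.popleft()
        self.in_progress[server] = task
        return task

    def report(self, server: str, ok: bool) -> None:
        """Mark a server's current task as done, or requeue it on failure."""
        task = self.in_progress.pop(server)
        if ok:
            self.done.append(task)
        else:
            self.pending.append(task)  # retry later, possibly on another server
```

Because servers pull work, a server with a different IP address can pick up a task that failed elsewhere, which matches the motivation for spreading requests across machines.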

Experiment and result
Area: about 20 × 20 km around central Tokyo
Grid size: about 2 × 2 km
Period: 2 weeks (from 25 July 2011 0:00 JST)
Num. of tweets: 3,476,059
Num. of users: 216,430

Daily variation (central Tokyo 20x20km)
[Figure: number of tweets per day (y-axis) over two weeks, Mon to Sun (x-axis)]
• Weekday > Weekend

Daily variation (Odaiba area 2x2km, 1 day)
[Figure: number of tweets per day (y-axis) over two weeks, Mon to Sun (x-axis)]
• A popular shopping and amusement area
• Weekend > Weekday

Hourly variation (central Tokyo 20x20km, 2 days)
[Figure: number of tweets (y-axis) by hour of day, 0 to 23 (x-axis), for the largest day (Thursday) and the smallest day (Sunday)]
• A small spike around noon (lunch break?)
• A large spike at midnight
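The hourly profiles above can be computed by bucketing posting times by hour of day in Japan Standard Time (UTC+9). A minimal sketch, assuming the posting times are available as timezone-aware UTC `datetime` objects; the function name `hourly_counts` is hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone
from typing import Iterable, List

JST = timezone(timedelta(hours=9))  # Japan Standard Time, no daylight saving


def hourly_counts(created_at_utc: Iterable[datetime]) -> List[int]:
    """Count tweets per hour of day (index 0-23) in JST."""
    counts = Counter(t.astimezone(JST).hour for t in created_at_utc)
    return [counts.get(hour, 0) for hour in range(24)]
```

Converting to local time before bucketing matters here: a tweet stamped 15:00 UTC falls into the midnight (hour 0) bucket in Tokyo.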

Hourly variation (around Tokyo Station 2x2km)
[Figure: number of tweets (y-axis) by hour of day, 0 to 23 (x-axis)]
• A small spike around 4 AM corresponds to a small earthquake.

Number of geo-tweets per user (central Tokyo 20x20km, 2 weeks)
[Figure: log-log plot of number of users (y-axis) vs. number of tweets per user (x-axis)]
• Most users posted fewer than 4 geo-tweets in 2 weeks.
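The distribution above can be derived by counting tweets per user and then counting users per tweet count. A minimal sketch with hypothetical function names; `share_below` computes the kind of figure behind "fewer than 4 geo-tweets".

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def tweets_per_user_distribution(user_ids: Iterable[Hashable]) -> Dict[int, int]:
    """Return {k: number of users who posted exactly k tweets}, sorted by k."""
    per_user = Counter(user_ids)  # user -> number of tweets
    return dict(sorted(Counter(per_user.values()).items()))


def share_below(distribution: Dict[int, int], k: int) -> float:
    """Fraction of users who posted fewer than k tweets."""
    total = sum(distribution.values())
    return sum(n for tweets, n in distribution.items() if tweets < k) / total
```

Plotting the resulting dictionary on log-log axes gives the kind of chart shown on this slide.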

Number of grid cells in which each user posted geo-tweets (central Tokyo 20x20km, 2 weeks)
[Figure: number of users (y-axis, log scale) vs. number of cells in which each user posted tweets (x-axis)]
• More than half of the users posted geo-tweets in at least two different cells.
• One user posted geo-tweets in 56 different cells.
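Counting distinct cells per user only needs a mapping from coordinates to a grid index. A minimal sketch, assuming cells of fixed size in degrees and the common approximation of about 111.32 km per degree of latitude (longitude degrees shrink with the cosine of latitude); all names here are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def degrees_for_km(km: float, lat0: float) -> Tuple[float, float]:
    """Approximate cell size in degrees (lat, lon) for a km-sized cell near latitude lat0."""
    km_per_deg = 111.32  # rough value; ignores Earth's flattening
    return km / km_per_deg, km / (km_per_deg * math.cos(math.radians(lat0)))


def cell_index(lat: float, lon: float, south: float, west: float,
               dlat: float, dlon: float) -> Tuple[int, int]:
    """Row/column of the grid cell containing (lat, lon)."""
    return math.floor((lat - south) / dlat), math.floor((lon - west) / dlon)


def cells_per_user(tweets: Iterable[Tuple[Hashable, float, float]],
                   south: float, west: float,
                   dlat: float, dlon: float) -> Dict[Hashable, int]:
    """Map each user to the number of distinct cells they posted from."""
    cells = defaultdict(set)
    for user, lat, lon in tweets:
        cells[user].add(cell_index(lat, lon, south, west, dlat, dlon))
    return {user: len(c) for user, c in cells.items()}
```

Near Tokyo (about 35.7°N), `degrees_for_km(2, 35.7)` gives roughly 0.018° of latitude and 0.022° of longitude for a 2 × 2 km cell.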

Conclusion
• Distributed data collection system for geo-tweets
  • collected several times more data than commonly used methods
• Spatio-temporal visualization system for geo-tweets
Future plan
• Scaling up the system
  • enlarging the area for collecting geo-tweet data
• Integrating a real-time data collection system
• Data sharing system for researchers using geo-tweet data

Response of Twitter API (abstract)
• Tweet text
• Tweet ID
• User ID
• Destination user ID (optional)
  • only for tweets posted as replies to others (with "@user")
• User profile (optional)
  • including the location name entered by the user
• Location coordinates (optional)
  • only for tweets tagged with location coordinates (0.77%)
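The abstract response above maps naturally onto a record type with optional fields. This sketch is illustrative only: the class `GeoTweet` and its field names are hypothetical and are not the JSON keys the Twitter API actually returns.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class GeoTweet:
    """Hypothetical record mirroring the abstract API response."""
    tweet_id: int
    user_id: int
    text: str
    to_user_id: Optional[int] = None                   # only for replies ("@user")
    profile_location: Optional[str] = None             # free-text location from the profile
    coordinates: Optional[Tuple[float, float]] = None  # (lat, lon), only if geotagged

    @property
    def is_geotagged(self) -> bool:
        return self.coordinates is not None

    @property
    def is_reply(self) -> bool:
        return self.to_user_id is not None
```

Keeping the optional fields as `None` rather than empty strings makes it easy to filter the small geotagged fraction with `is_geotagged`.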

Types of Twitter API
Streaming API
• sends tweets continuously in real time while an API client is connected
Search API
• returns a set of tweets matching a specified query when accessed by an API client
To collect tweets within a specified area
• Streaming API with a location filter (geographic coordinates of an area)
• Search API with a location and period (from and to dates) search filter
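The Search API location filter takes a circle (a center point plus a radius) rather than a rectangle, so a square grid cell has to be covered by the circle through its corners, whose radius is half the cell diagonal. A minimal sketch, assuming a `"lat,lon,radius km"` geocode string of the form the Search API accepted and the approximation of about 111.32 km per degree; the function name is hypothetical.

```python
import math


def cell_to_geocode(south: float, west: float, north: float, east: float) -> str:
    """Smallest circle covering a lat/lon cell, formatted as 'lat,lon,radiuskm'."""
    lat_c = (south + north) / 2
    lon_c = (west + east) / 2
    km_per_deg_lat = 111.32  # rough approximation
    km_per_deg_lon = km_per_deg_lat * math.cos(math.radians(lat_c))
    half_height = (north - south) / 2 * km_per_deg_lat
    half_width = (east - west) / 2 * km_per_deg_lon
    radius = math.hypot(half_height, half_width)  # half of the cell diagonal
    return f"{lat_c:.6f},{lon_c:.6f},{radius:.3f}km"
```

Because neighboring circles overlap, the same tweet can be returned for two cells; deduplicating by tweet ID during aggregation removes these repeats.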

Location information in Twitter
• Not all tweets have location information.
Location coordinates (latitude, longitude)
• attached only when the user opts in to geotagging with location coordinates
• mostly from devices with GPS / Wi-Fi positioning systems
• 0.77% of all tweets
Location name in user profile
• entered by users; may be fake, a joke, or wrong
• the Search API extracts only tweets with "correct" location names
Location name in tweet text
• extracted by natural language processing techniques
• accuracy is currently low (less than 50%)

Common method for collecting geo-tweet data continuously (1 of 2)
Caching data in real time by connecting to the Streaming API with a location filter
Advantage
• collects real-time data
Disadvantages
• The number of target tweets is relatively small.
• Past data cannot be collected.

Common method for collecting geo-tweet data continuously (2 of 2)
Collecting data by accessing the Search API at regular intervals with a location and period search filter
Advantage
• The number of target tweets is relatively large.
Disadvantages
• The search period is limited to the 5 days before the current date.
• It is impossible to collect all tweets in areas where the number of tweets per day exceeds 1,500.
Search API limitations
• Maximum number of tweets for one search condition: 1,500
• Minimum search area: 1 × 1 km
• Minimum search period: 1 day
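Since the minimum search period is 1 day but tweet IDs can bound a query far more finely, one way to stay under the 1,500-tweet cap is to split any saturated ID range in half until every query returns fewer results than the cap. This adaptive bisection is a hypothetical technique for choosing the tweet-ID periods, not necessarily the one used in the slides; `search` stands for any callable that queries one cell and ID range and truncates at `cap` results.

```python
from typing import Callable, List

CAP = 1500  # maximum number of tweets returned for one search condition


def collect_adaptive(search: Callable[[int, int], List[int]],
                     since_id: int, max_id: int,
                     cap: int = CAP, min_span: int = 1) -> List[int]:
    """Collect all tweet IDs in (since_id, max_id], bisecting ranges that hit the cap."""
    result = search(since_id, max_id)
    if len(result) < cap or max_id - since_id <= min_span:
        return result  # not saturated (or cannot split further)
    mid = (since_id + max_id) // 2  # IDs ascend, so this also halves the time span
    return (collect_adaptive(search, since_id, mid, cap, min_span)
            + collect_adaptive(search, mid, max_id, cap, min_span))
```

A range that returns exactly `cap` tweets is split once more even if nothing was lost; this costs one extra query but never misses data.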

Diffusion of Retweet
[Figure: retweet diffusion of two example tweets, "2011 the sum of 11 consecutive prime numbers." and "Heavy rain warning issued for Tokyo."]
