Refactoring Earthquake-Tsunami Causality and Messaging via Big Data Analytics: The Transformative Potential of Credible Tweets


  1. Refactoring Earthquake-Tsunami Causality and Messaging via Big Data Analytics: The Transformative Potential of Credible Tweets. L. I. Lumb (1,2) & J. R. Freemantle (3); (1) York University, (2) Univa Corporation & (3) Independent. MCBDA 2016 (First Workshop), PVAMU, May 17, 2016

  2. Agenda ● Motivation ● Traditional Data ● Social-Networking Data ○ Graphs, Semantics & Machine Learning ● Conclusions

  3. Geist, E.L., Titov, V.V., and Synolakis, C.E., 2006, Tsunami: wave of change: Scientific American, v. 294, p. 56-63

  4. Motivation
     ● Non-deterministic cause
       ○ Uncertainty inherent in any attempt to predict earthquakes
         ■ In situ measurements may reduce uncertainty
     ● Lead times
       ○ Availability of actionable observations
       ○ Communication of situation - advisories, warnings, etc.
     ● Cause-effect relationship
       ○ Energy transfer - inputs ... coupling ... outputs
         ■ ‘Geometry’ - bathymetry and topography
       ○ Other factors - e.g., tides
     ● Established effect
       ○ Far-field estimates of tsunami propagation (pre-computed) and coastal inundation (real-time) have proven to be extremely accurate ... requires
         ■ Distributed array of deep-ocean tsunami detection buoys + forecasting model

  5. Agenda ● Motivation ● Traditional Data ● Social-Networking Data ○ Graphs, Semantics & Machine Learning ● Conclusions

  6. http://www.gitews.org/en/concept/

  7. http://www.eas.slu.edu/GGP/images/igrav2.jpg

  8. Lumb & Aldridge, http://dx.doi.org/10.1109/HPCS.2006.26

  9. Agenda ● Motivation ● Traditional Data ● Social-Networking Data ○ Graphs, Semantics & Machine Learning ● Conclusions

  10. 6Vs: Scientific vs. Social Networking Data (GGP Scientific Data vs. Twitter SN Data)
      ● Volume: GGP small, finite; Twitter BIG, ‘infinite’
      ● Variety: GGP semi-structured, restricted; Twitter unstructured, unrestricted - except for IDs, hashtags & URLs (pages, images)
      ● Velocity: GGP slow, sampled; Twitter fast, streamed
      ● Veracity: biases, noise & abnormalities
      ● Validity: accuracy & correctness
      ● Volatility: GGP low (stationary, irreplaceable); Twitter high? (mobile?, disposable?)
      Source: http://insidebigdata.com/2013/09/12/beyond-volume-variety-velocity-issue-big-data-veracity/

  11. Machine Learning Pipeline Karau et al., Learning Spark, O’Reilly, 2015

  12. Deep Learning from Twitter?
      Represent data
      ● Twitter data manually curated into ‘ham’ and ‘spam’
      ● In-memory representation via Spark RDDs
      Extract features
      ● Frequency-based usage via Spark MLlib HashingTF ⇒ feature vectors
      Develop model object
      ● Spark MLlib LogisticRegressionWithSGD used for classification
      Evaluate model
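
The four steps on this slide can be sketched in plain Python. This is a stand-in under stated assumptions, not the Spark MLlib API and not the authors' code: `hashing_tf`, `predict_proba`, `train_logistic_sgd`, the bucket count and the four example tweets are all hypothetical, and `zlib.crc32` is used as the hash only because it is deterministic.

```python
import math
import zlib

NUM_FEATURES = 1000  # assumed size of the hashed feature space


def hashing_tf(text, num_features=NUM_FEATURES):
    """Map a tweet to a sparse term-frequency vector {bucket index: count}."""
    vec = {}
    for token in text.lower().split():
        idx = zlib.crc32(token.encode("utf-8")) % num_features
        vec[idx] = vec.get(idx, 0.0) + 1.0
    return vec


def predict_proba(weights, bias, vec):
    """Logistic model: estimated probability that the tweet is 'ham'."""
    z = bias + sum(weights[i] * v for i, v in vec.items())
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_sgd(examples, num_features=NUM_FEATURES, epochs=200, lr=0.5):
    """Fit logistic regression with per-example stochastic gradient descent."""
    weights = [0.0] * num_features
    bias = 0.0
    for _ in range(epochs):
        for vec, label in examples:
            error = predict_proba(weights, bias, vec) - label
            bias -= lr * error
            for i, v in vec.items():
                weights[i] -= lr * error * v
    return weights, bias


# 1. Represent data: made-up tweets, manually labelled ham (1.0) or spam (0.0).
ham = ["strong earthquake felt near the coast tsunami warning issued",
       "magnitude 7 earthquake reported offshore by seismic agency"]
spam = ["win a free phone click this earthquake deal now",
        "cheap followers click here free offer"]

# 2. Extract features: hashed term frequencies.
training = [(hashing_tf(t), 1.0) for t in ham] + [(hashing_tf(t), 0.0) for t in spam]

# 3. Develop the model object.
weights, bias = train_logistic_sgd(training)

# 4. Evaluate the model (here only on the training tweets themselves).
predictions = [predict_proba(weights, bias, vec) > 0.5 for vec, _ in training]
```

The value returned by `predict_proba` is a score between 0 and 1 rather than a hard label, so it could also serve as one possible reading of a graded ham/spam measure.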

  13. Future Work
      ● Machine Learning
        ○ Classification algorithms ... with categories?
        ○ Training Experiments
          ■ Larger data sets
          ■ Degrees of ‘hammyness’
          ■ Stop-word removal, stemming, ...
        ○ Real-time streaming - data from Twitter
      ● Multiparameter credibility - TweetCred + ML + RDF/OWL GA
      ● Cloud-native platform
        ○ Containerization, dynamic scheduling and microservices
      ● Other examples
        ○ Alberta wildfires
        ○ Industrial incidents
        ○ Hurricanes

  14. Agenda ● Motivation ● Traditional Data ● Social-Networking Data ○ Graphs, Semantics & Machine Learning ● Conclusions

  15. Conclusions
      ● Credible tweets could be transformative
        ○ Mission-critical Big Data complement to existing data sources and approaches
      ● Current challenges/opportunities
        ○ Twitter Data
          ■ Extraction - only 100 tweets at a time (!!!)
          ■ Curation - manual (read: time-consuming!!!)
        ○ Emphasizing Machine Learning ... appears encouraging, BUT ...
          ■ Graph Analytics ... as well ???
          ■ Semantics ... as well ???
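
One way around the 100-tweets-per-request extraction limit is to page backwards through older status IDs. The sketch below assumes the search endpoint accepts a `max_id`-style cursor, as the Twitter v1.1 search API did; `collect_tweets`, `fake_fetch_page` and `FAKE_STORE` are hypothetical names, and the fake store stands in for real API calls so the example runs offline.

```python
def collect_tweets(fetch_page, query, total, page_size=100):
    """Gather up to `total` statuses by paging backwards through older IDs."""
    collected = []
    max_id = None
    while len(collected) < total:
        want = min(page_size, total - len(collected))
        page = fetch_page(query, want, max_id)
        if not page:
            break  # no older results left
        collected.extend(page)
        # Next request: only statuses strictly older than the oldest one seen.
        max_id = min(status["id"] for status in page) - 1
    return collected


# Stand-in for a real API: 250 fake statuses, newest first (hypothetical data).
FAKE_STORE = [{"id": i, "text": "earthquake report %d" % i} for i in range(250, 0, -1)]


def fake_fetch_page(query, count, max_id):
    """Return at most 100 fake statuses with id <= max_id, newest first."""
    matches = [s for s in FAKE_STORE if max_id is None or s["id"] <= max_id]
    return matches[:min(count, 100)]


tweets = collect_tweets(fake_fetch_page, "earthquake", total=250)
```

Passing the fetch function in as an argument keeps the paging logic independent of any particular client library; a real client call would replace `fake_fetch_page`.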

  16. Q&A L. I. Lumb (1,2) & J. R. Freemantle (3); (1) ianlumb@yorku.ca, (2) ilumb@univa.com & (3) james.freemantle@rogers.com

  17. Problem Analytics Graph http://www.jma.go.jp/jma/en/2016_Kumamoto_Earthquake/2016_Kumamoto_Earthquake.html

  18. Perl script prototype ● Acquires tweets with the keyword “earthquake”

          use strict;
          use warnings;
          use Net::Twitter::Lite::WithAPIv1_1;

          # Credentials are placeholders, elided as on the original slide.
          my $nt = Net::Twitter::Lite::WithAPIv1_1->new(
              consumer_key        => 'xxxx...xxxxxxx',
              consumer_secret     => 'xxxxxx.....xxxxxxxxxx',
              access_token        => 'xxxxx....xxxxxxxxxxx',
              access_token_secret => 'xxxxx.....xxxxxxxxxxx',
              ssl                 => 1,
          );

          # Search for tweets containing the keyword and print the text of each.
          my $result = $nt->search("earthquake");
          for my $status ( @{ $result->{statuses} } ) {
              print "$status->{text}\n";
          }

  19. Resilient Distributed Datasets (RDDs)
      ● Abstraction for in-memory computing
      ● Fault-tolerant, parallel data structures
        ○ Cluster-ready
      ● Optionally persistent
      ● Can be partitioned for optimal placement
      ● Manipulated via operators
      Zaharia et al., NSDI 2012
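
The properties listed on this slide can be illustrated with a toy, single-process imitation in plain Python. It is not Spark and not its implementation: `ToyRDD` and all its methods are hypothetical. It assumes that a lost partition is rebuilt by replaying the recorded operators on the source data, which is the lineage idea described by Zaharia et al.

```python
class ToyRDD:
    """Toy, single-process imitation of an RDD (not Spark's implementation)."""

    def __init__(self, partitions, lineage=()):
        self._source = [list(p) for p in partitions]  # base data, one list per partition
        self._lineage = tuple(lineage)                # recorded (kind, function) steps
        self._cache = None                            # filled only after persist()

    @classmethod
    def parallelize(cls, data, num_partitions):
        """Split a list round-robin into `num_partitions` partitions."""
        data = list(data)
        return cls([data[i::num_partitions] for i in range(num_partitions)])

    def map(self, fn):
        """Record a map step; nothing is computed yet."""
        return ToyRDD(self._source, self._lineage + (("map", fn),))

    def filter(self, fn):
        """Record a filter step; nothing is computed yet."""
        return ToyRDD(self._source, self._lineage + (("filter", fn),))

    def _compute_partition(self, index):
        """Rebuild one partition by replaying the lineage on its source data."""
        rows = self._source[index]
        for kind, fn in self._lineage:
            if kind == "map":
                rows = [fn(r) for r in rows]
            else:
                rows = [r for r in rows if fn(r)]
        return rows

    def persist(self):
        """Keep computed partitions in memory for later actions."""
        self._cache = [self._compute_partition(i) for i in range(len(self._source))]
        return self

    def lose_partition(self, index):
        """Simulate a failure that drops one cached partition."""
        if self._cache is not None:
            self._cache[index] = None

    def collect(self):
        """Action: return all rows, recomputing any partition not in the cache."""
        out = []
        for i in range(len(self._source)):
            cached = self._cache[i] if self._cache is not None else None
            out.extend(cached if cached is not None else self._compute_partition(i))
        return out


# Usage with made-up tweets: two partitions, two operators, one simulated failure.
tweets = ToyRDD.parallelize(["Earthquake near coast", "free phone", "Tsunami warning"], 2)
flagged = tweets.map(str.lower).filter(lambda t: "earthquake" in t or "tsunami" in t).persist()
flagged.lose_partition(0)
result = sorted(flagged.collect())
```

Because each transformation only appends to the recorded lineage, the dropped partition can be rebuilt on demand without recomputing the partitions that are still cached.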
