
  1. Responsive Analytics of Highly-Connected Big Data
 Dr. Peter Janacik, Stream Reasoning Workshop 2016, TU Berlin, December 8, 2016

  2. Linking concepts
 [Figure: graph of content streams, concepts, and users, with edges labeled "occurs in", "affinity", "posted", and "favored/commented"]
 Data from social networks (Twitter/Instagram), Wikipedia, Wiktionary, evocation/synonym databases, medical knowledge, etc.
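As a minimal sketch of how concept affinity edges might be derived from a content stream, the Python below counts how often two known concepts occur in the same post. The co-occurrence weighting, the function name `build_affinity_graph`, the naive whitespace tokenization, and the sample posts and concept vocabulary are all hypothetical assumptions for illustration, not the method used in the talk.

```python
from collections import Counter
from itertools import combinations


def build_affinity_graph(posts, concepts):
    """Count how often two known concepts occur in the same post.

    Hypothetical sketch: posts is an iterable of strings from a content stream,
    concepts is a set of lower-case concept labels. Returns a Counter that maps
    alphabetically sorted concept pairs to their co-occurrence counts.
    """
    edges = Counter()
    for post in posts:
        # Naive tokenization: split on whitespace, strip punctuation and '#'.
        tokens = {t.strip(".,!?#").lower() for t in post.split()}
        found = sorted(tokens & concepts)
        # Every pair of concepts seen in the same post strengthens their affinity edge.
        for a, b in combinations(found, 2):
            edges[(a, b)] += 1
    return edges


posts = [
    "A black cat sleeping in the sun #cat",
    "My cat chased a mouse!",
    "Sun and beach, no cat today",
]
graph = build_affinity_graph(posts, {"cat", "sun", "mouse", "beach"})
print(graph[("cat", "sun")])  # → 2
```

Plain co-occurrence counts are only one possible affinity measure; user signals such as "favored/commented" could be folded in as additional edge weights.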

  3. Extracting sense from tokens
 • Approach
   • Symbol/token: notion in general, oriented gradient cluster, recognized object, word
   • Semantic overlay absorbs flow of meaning
 • Follow-up processing
   • Polarization and gist extraction
 • Wikipedia/Wiktionary as knowledge model for semantic overlay
 [Figure: the example sentence "After patiently waiting, a black cat" on the symbol/token layer, mapped upward to a Lemma layer and a Sense layer]
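A toy version of the token -> lemma -> sense layering can be sketched with a hand-written lexicon standing in for Wiktionary. To choose among senses it uses the simplified Lesk heuristic (pick the sense whose gloss words overlap most with the context), which is a swapped-in technique; the slides do not say how senses are selected. `LEMMAS`, `SENSES` and `tokens_to_senses` are hypothetical, and only word tokens are handled, not visual tokens such as oriented gradient clusters.

```python
# Hypothetical miniature lexicon standing in for Wiktionary data.
LEMMAS = {"waiting": "wait", "cats": "cat"}  # inflected form -> lemma
SENSES = {
    # lemma -> {sense id -> gloss words describing that sense}
    "cat": {
        "cat.animal": {"feline", "pet", "animal", "black", "fur"},
        "cat.command": {"unix", "command", "file", "concatenate"},
    },
    "wait": {"wait.stay": {"patiently", "stay", "time"}},
}


def tokens_to_senses(text):
    """Map word tokens to lemmas, then pick one sense per lemma (simplified Lesk)."""
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    tokens = [t for t in tokens if t]
    # Symbol/token layer -> lemma layer; unknown forms are treated as their own lemma.
    lemmas = [LEMMAS.get(t, t) for t in tokens]
    context = set(tokens) | set(lemmas)
    senses = {}
    for lemma in lemmas:
        candidates = SENSES.get(lemma)
        if not candidates:
            continue  # no sense inventory for this lemma
        # Lemma layer -> sense layer: choose the sense whose gloss overlaps the context most.
        senses[lemma] = max(candidates, key=lambda s: len(candidates[s] & context))
    return senses


print(tokens_to_senses("After patiently waiting, a black cat"))
# → {'wait': 'wait.stay', 'cat': 'cat.animal'}
```

Here the context word "black" tips "cat" toward the animal sense, while a context such as "cat command file" would select `cat.command` instead.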

  4. Apache Flink
 • Was initiated at TU Berlin (first under the name Stratosphere)
 • MapReduce does not provide sufficient means to implement state-of-the-art analysis methods
 • Flink allows connecting transformations (vertices) into a graph using data streams (directed edges)
 • Distributed execution and placement within a cluster: Flink program -> subtasks -> slots
 • The number of slots on one physical node is configurable, but usually it is equal to the number of cores
 • A maximum degree of parallelism can be defined and is used by Flink during execution
 [Figure: example dataflow graph with operators A to E; the degree of parallelism is adjustable, here 2]
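The mapping "Flink program -> subtasks -> slots" can be illustrated with simple arithmetic. The sketch below is a simplified model, not Flink's scheduler: it assumes operator chaining is ignored and that all operators share Flink's default slot sharing group, in which case a job needs as many slots as its most parallel operator. `Operator` and `plan` are hypothetical names; in a real deployment the slots per node come from the `taskmanager.numberOfTaskSlots` setting.

```python
from dataclasses import dataclass


@dataclass
class Operator:
    name: str
    parallelism: int  # number of parallel subtasks for this operator


def plan(operators, nodes, slots_per_node):
    """Simplified model of how a job's subtasks map onto task slots.

    Assumes no operator chaining and a single shared slot sharing group,
    so one slot can host one subtask of every operator.
    """
    # Each operator is expanded into `parallelism` subtasks.
    subtasks = sum(op.parallelism for op in operators)
    # With full slot sharing, the job needs as many slots as its most parallel operator.
    slots_needed = max(op.parallelism for op in operators)
    # Slots per node are commonly set to the number of CPU cores.
    slots_available = nodes * slots_per_node
    return subtasks, slots_needed, slots_needed <= slots_available


ops = [Operator("source", 2), Operator("map", 2), Operator("sink", 1)]
print(plan(ops, nodes=1, slots_per_node=4))  # → (5, 2, True)
```

With a parallelism of 2, as in the slide's example, five subtasks fit into two shared slots on a single four-core node.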

  5. Future Work
 • Partitioning of the data flow graph over several data centers based on available resources, data stream bandwidth, and data privacy criteria
 • Optimization criteria
   • Min processing time
   • Min costs
   • Best fit
   • Matching different criteria
 • Dynamic migration to accommodate changing characteristics of the physical topology (available bandwidth/resources (nodes), price, follow the sun, etc.)
 [Figure: data flow graph whose data streams have different bandwidths, split by cut a and cut b; E, F, B, Z placed in data center 1 and D, C, A in data center 2]
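The partitioning idea in the first bullet can be made concrete as a small optimization problem: place each operator in one of two data centers so that the total bandwidth of streams crossing the cut is minimal, subject to a capacity limit (standing in for available resources) and pinned operators (standing in for data privacy rules). The sketch below uses exhaustive search, a deliberately naive choice that only scales to toy graphs. The stream bandwidths, the capacity of four operators per data center, and the pinning of A and Z are invented for illustration, chosen so that the optimum matches the slide's placement.

```python
from itertools import product


def cut_cost(assignment, streams):
    """Sum the bandwidth of streams whose endpoints sit in different data centers."""
    return sum(bw for (u, v), bw in streams.items() if assignment[u] != assignment[v])


def best_partition(operators, streams, capacity, pinned):
    """Exhaustively try every 2-way placement; only feasible for small graphs.

    capacity: max number of operators per data center (stand-in for resources)
    pinned: operator -> data center it must stay in (stand-in for privacy rules)
    """
    best = None
    for placement in product((1, 2), repeat=len(operators)):
        assignment = dict(zip(operators, placement))
        # Skip placements that move a pinned operator out of its data center.
        if any(assignment[op] != dc for op, dc in pinned.items()):
            continue
        # Skip placements that overload either data center.
        if max(placement.count(1), placement.count(2)) > capacity:
            continue
        cost = cut_cost(assignment, streams)
        if best is None or cost < best[0]:
            best = (cost, assignment)
    return best


# Hypothetical stream bandwidths (e.g. MB/s) between operators.
streams = {
    ("A", "B"): 2, ("A", "C"): 10, ("C", "D"): 10, ("D", "E"): 3,
    ("B", "E"): 10, ("E", "F"): 10, ("F", "Z"): 10,
}
cost, assignment = best_partition(
    ["A", "B", "C", "D", "E", "F", "Z"], streams, capacity=4, pinned={"A": 2, "Z": 1}
)
print(cost)  # → 5
print(sorted(op for op, dc in assignment.items() if dc == 1))  # → ['B', 'E', 'F', 'Z']
```

The search grows as 2^n in the number of operators, so a realistic planner would need a graph-partitioning heuristic instead. The other criteria on the slide (costs, best fit, dynamic migration) are not modeled here.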

  6. Interdependence of areas
 [Diagram linking these areas:
 • Visualization: comprehensible data/results presentation
 • Human data/feedback generation
 • Concept/relationship/story detection
 • Distribution at web scale for insightful analysis
 Edge labels: "results/recommendation to trigger", "alters models/algo behavior", "enabled by" (twice)]

  7. Semantic graph as result of analysis

  8. Semantic graph as result of analysis

  9. Instagram interaction heat map

  10. Instagram interaction heat map

  11. Semantic graph as result of analysis
 But how can we visualize what these graphs are about when they typically have millions to trillions of edges?

  12. Approaches to visualization
 • 2-dimensional
   • Works well with most currently available devices; no special hardware needed
   • Supported by a broad range of platforms
   • Less complex, easier to implement
   • Fewer problems with readability and overlap

  13. Approaches to visualization
 • 3-dimensional
   • Chance to use the additional dimension to untangle the big graph
   • Different perspectives may cover different aspects and lead to different conclusions
   • Can exploit the full potential of touch interfaces


