Summary Extraction on Data Streams in Embedded Systems

Sebastian Buschjäger and Katharina Morik, TU Dortmund University, Computer Science, Artificial Intelligence Group, September 18, 2017

So... IoT hype?!
2016: Ericsson Maritime ICT connects over 350 cargo vessels on one freighter.
2016: Daimler Trucks has deployed 400,000 trucks with 400 sensors each.
2016: Virgin Atlantic announces a fleet of fully connected Boeing 787 machines and cargo.

IoT means large autonomous systems
Common intuition: There will be more devices. We will get more data. Systems will become more autonomous.
Question: What should we do if something unexpected happens?

Goal: Monitor systems
Clear: Nobody can monitor all the sensor data on the fly.
But: To detect unexpected behavior, we need to monitor all data.
Idea: Compute summaries on the fly while sensor data is generated.

Goal: Monitor systems
Then: A human expert can inspect the summaries, perform operations on them, etc.
Constraint: The approach must handle different data types and be theoretically sound.

Data summarization: Some theory
Intuition: Use a set function $f$ to measure the expressiveness of a summary $S$.
Goal: $\max_{S \subseteq V,\ |S| \le k} f(S)$
Gain: Let $f : 2^V \to \mathbb{R}$, $e \in V$ and $S \subseteq V$. Then $\Delta_f(e \mid S) = f(S \cup \{e\}) - f(S)$.
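
The gain definition can be illustrated with a small sketch. The coverage utility and the `COVERS` table below are hypothetical and chosen only because coverage is a simple submodular set function; the talk itself uses a different $f$.

```python
from typing import Callable, FrozenSet, Iterable

# Hypothetical toy data: each item "covers" a few integer topics.
COVERS = {"a": {1, 2}, "b": {2, 3}, "c": {3}}


def coverage(S: Iterable[str]) -> float:
    """Toy set function f(S): number of distinct topics covered by S."""
    covered = set()
    for item in S:
        covered |= COVERS[item]
    return float(len(covered))


def marginal_gain(f: Callable[[FrozenSet[str]], float], e: str, S: Iterable[str]) -> float:
    """Delta_f(e | S) = f(S u {e}) - f(S)."""
    S = frozenset(S)
    return f(S | {e}) - f(S)


gain_b_empty = marginal_gain(coverage, "b", set())        # b alone covers {2, 3}
gain_b_after_a = marginal_gain(coverage, "b", {"a"})      # only topic 3 is new
gain_c_after_ab = marginal_gain(coverage, "c", {"a", "b"})  # nothing new
```

The gain of `b` shrinks once `a` is already in the summary, which is the diminishing-returns behavior that submodularity formalizes.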

Summarization: Sieve-Streaming (Badanidiyuru et al. 2014)
Sieve-Streaming: Items $e$ arrive one at a time. Decide immediately whether $e$ should be added to the summary.
Idea: Introduce a novelty threshold $v$. Add $e$ if $\Delta_f(e \mid S) > v$.
Challenge: What is the “optimal” $v$?
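
A minimal sketch of the single-threshold idea, following the simplified rule stated on the slide (add $e$ if its gain exceeds $v$). The names `threshold_summary` and `bucket_coverage` and the toy stream are hypothetical; the published Sieve-Streaming algorithm uses a more refined acceptance rule.

```python
from typing import Callable, Iterable, List


def threshold_summary(stream: Iterable[float],
                      f: Callable[[List[float]], float],
                      k: int,
                      v: float) -> List[float]:
    """Keep at most k items; accept an item only if its marginal gain exceeds v."""
    summary: List[float] = []
    for e in stream:
        if len(summary) >= k:
            break  # summary is full, later items are ignored
        gain = f(summary + [e]) - f(summary)
        if gain > v:
            summary.append(e)
    return summary


def bucket_coverage(S: List[float]) -> float:
    """Hypothetical utility: number of distinct integer buckets hit by S."""
    return float(len({int(x) for x in S}))


picked = threshold_summary([0.1, 0.2, 1.5, 1.7, 3.2], bucket_coverage, k=3, v=0.5)
```

With this toy utility, items that fall into an already covered bucket have gain 0 and are skipped, so the result depends heavily on the choice of `v`, which is the challenge the slide raises.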

Summarization: Sieve-Streaming
Idea: Manage multiple summaries $i = 1, 2, 3, \ldots$ with multiple thresholds $v_i$ → “sieve” out unimportant elements.
By submodularity: $v_i \in [m, km]$ with $m = \max_{e \in V} f(\{e\})$. Then the solution is a $(\frac{1}{2} - \varepsilon)$-approximation.
Note: This is independent of $f$.
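
The multi-sieve idea might be sketched as follows. Two things here are assumptions that do not appear on the slides: the geometric grid $(1+\varepsilon)^i$ inside $[m, km]$, and the acceptance rule $\Delta_f(e \mid S) \ge (v/2 - f(S)) / (k - |S|)$, both taken from the usual description of Sieve-Streaming, where $v_i$ acts as a guess of the optimal value. The toy utility and stream are hypothetical.

```python
import math
from typing import Callable, Dict, Iterable, List


def threshold_grid(m: float, k: int, eps: float) -> List[float]:
    """Assumed grid: all (1 + eps)^i that fall inside [m, k * m]."""
    lo = math.ceil(math.log(m) / math.log(1 + eps))
    hi = math.floor(math.log(k * m) / math.log(1 + eps))
    return [(1 + eps) ** i for i in range(lo, hi + 1)]


def sieve_streaming(stream: Iterable[float],
                    f: Callable[[List[float]], float],
                    k: int, m: float, eps: float) -> List[float]:
    """One summary per threshold; return the summary with the largest f."""
    sieves: Dict[float, List[float]] = {v: [] for v in threshold_grid(m, k, eps)}
    for e in stream:
        for v, S in sieves.items():
            if len(S) >= k:
                continue  # this sieve is full
            gain = f(S + [e]) - f(S)
            # Assumed acceptance rule from the Sieve-Streaming literature.
            if gain >= (v / 2 - f(S)) / (k - len(S)):
                S.append(e)
    return max(sieves.values(), key=f)


def bucket_coverage(S: List[float]) -> float:
    """Hypothetical utility: number of distinct integer buckets hit by S."""
    return float(len({int(x) for x in S}))


# m = 1 because every single item covers exactly one bucket.
best = sieve_streaming([0.1, 0.2, 1.5, 1.7, 3.2, 3.3], bucket_coverage,
                       k=3, m=1.0, eps=0.1)
```

Every sieve sees every item exactly once, so memory grows with the number of thresholds times $k$, which is why the later slides try to shrink the interval.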

Submodular maximization: The right function
Question: What submodular function $f$ captures summarization?
Informative Vector Machine (Herbrich et al. 2003 / Seeger 2004): $f(S) = \frac{1}{2} \log\det\left(I + \sigma^{-2} \Sigma_S\right)$
Here $\Sigma_S$ is the kernel matrix $K = [k(e_i, e_j)]_{i,j}$ and $I$ is the $k \times k$ identity matrix.
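
A minimal NumPy sketch of this utility, assuming an RBF kernel of the form $\exp(-\|e_i - e_j\|_2^2 / \text{scale})$. The function names and the `scale` parameter are hypothetical, and the identity matrix is sized to the current summary rather than fixed at $k \times k$.

```python
import numpy as np


def rbf_kernel_matrix(X: np.ndarray, scale: float) -> np.ndarray:
    """Pairwise kernel matrix with entries exp(-||x_i - x_j||^2 / scale)."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / scale)


def ivm_utility(summary, sigma: float = 1.0, scale: float = 1.0) -> float:
    """f(S) = 1/2 * log det(I + sigma^-2 * Sigma_S); f(empty set) = 0."""
    if len(summary) == 0:
        return 0.0
    X = np.asarray(summary, dtype=float)
    n = X.shape[0]
    Sigma_S = rbf_kernel_matrix(X, scale)
    # slogdet is numerically safer than log(det(...)) for larger summaries.
    _, logdet = np.linalg.slogdet(np.eye(n) + Sigma_S / sigma ** 2)
    return 0.5 * float(logdet)


f_single = ivm_utility([[0.0, 0.0]])
f_duplicates = ivm_utility([[0.0, 0.0], [0.0, 0.0]])
f_far_apart = ivm_utility([[0.0, 0.0], [100.0, 100.0]])
```

Two identical points add little over a single point, while two distant points roughly double the utility, which matches the intuition that diverse summaries are more expressive.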

IVM for data summarization
Since we know $f$, reduce the interval!
Note: Assume $k(e_i, e_i) = 1$.
Least-expressive summary: All off-diagonal elements are 1.
$f(S) = \frac{1}{2} \log\det\left(I + \sigma^{-2} \Sigma_S\right) = \frac{1}{2} \log\det\left(I + \sigma^{-2} \mathbf{1}\mathbf{1}^T\right) = \frac{1}{2} \log\left(1 + \sigma^{-2} \mathbf{1}^T\mathbf{1}\right) = \frac{1}{2} \log\left(1 + \sigma^{-2} k\right)$
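
The step from the determinant to the scalar expression is not spelled out on the slide. One way to justify it, assuming the intended argument is the matrix determinant lemma, is:

```latex
\begin{align*}
\det\left(I + u v^T\right) &= 1 + v^T u
  && \text{(matrix determinant lemma)} \\
\det\left(I + \sigma^{-2}\mathbf{1}\mathbf{1}^T\right)
  &= 1 + \sigma^{-2}\,\mathbf{1}^T\mathbf{1}
  && (u = \sigma^{-2}\mathbf{1},\ v = \mathbf{1}) \\
  &= 1 + \sigma^{-2} k
  && (\mathbf{1} \in \mathbb{R}^k)
\end{align*}
```

Equivalently, $\mathbf{1}\mathbf{1}^T$ has eigenvalue $k$ once and eigenvalue $0$ with multiplicity $k - 1$, so the determinant is the product $(1 + \sigma^{-2} k) \cdot 1^{k-1}$.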

IVM for data summarization
Since we know $f$, reduce the interval!
Note: Assume $k(e_i, e_i) = 1$.
Most-expressive summary: All off-diagonal elements are 0.
$f(S) = \frac{1}{2} \log\det\left(I + \sigma^{-2} \Sigma_S\right) = \frac{1}{2} \log\det\left(I (1 + \sigma^{-2})\right) = \frac{1}{2} \log\left((1 + \sigma^{-2})^k \det(I)\right) = \frac{k}{2} \log\left(1 + \sigma^{-2}\right)$
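
Both closed forms for the extreme cases can be checked numerically. This is only a sanity check under illustrative values ($k = 5$, $\sigma = 1$); the helper name `log_det_utility` is hypothetical.

```python
import numpy as np


def log_det_utility(Sigma: np.ndarray, sigma: float = 1.0) -> float:
    """f(S) = 1/2 * log det(I + sigma^-2 * Sigma) for a given kernel matrix."""
    n = Sigma.shape[0]
    _, logdet = np.linalg.slogdet(np.eye(n) + Sigma / sigma ** 2)
    return 0.5 * float(logdet)


k, sigma = 5, 1.0

# Least-expressive summary: every kernel entry is 1 (all items look identical).
least = log_det_utility(np.ones((k, k)), sigma)
least_closed_form = 0.5 * np.log(1.0 + k / sigma ** 2)

# Most-expressive summary: kernel matrix is the identity (items are unrelated).
most = log_det_utility(np.eye(k), sigma)
most_closed_form = 0.5 * k * np.log(1.0 + 1.0 / sigma ** 2)
```

For $k > 1$ the least-expressive value lies strictly between the value of a single item and the most-expressive value, which is what makes a tighter lower bound for the thresholds possible.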

Sieve-Streaming enhancements
Result: Number of sieves reduced without performance loss.
Default: $v_i \in \left[\frac{1}{2} \log(1 + \sigma^{-2}),\ \frac{k}{2} \log(1 + \sigma^{-2})\right]$
Reduced: $v_i \in \left[\frac{1}{2} \log(1 + k\sigma^{-2}),\ \frac{k}{2} \log(1 + \sigma^{-2})\right]$
More improvements: Reopen sieves once full. Sieves with a small threshold will quickly be full. Save the summary and reopen the sieve with a larger threshold.
⇒ Increased utility value with the same number of sieves.
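
To get a feel for how much the reduced interval saves, the number of thresholds in each interval can be counted. The sketch assumes thresholds are placed on a geometric grid $(1+\varepsilon)^i$, which the slides do not state explicitly; the values $k = 16$, $\sigma = 1$, $\varepsilon = 0.1$ are only illustrative.

```python
import math
from typing import Tuple


def count_thresholds(lower: float, upper: float, eps: float) -> int:
    """Number of grid points (1 + eps)^i inside [lower, upper] (assumed grid)."""
    lo = math.ceil(math.log(lower) / math.log(1 + eps))
    hi = math.floor(math.log(upper) / math.log(1 + eps))
    return max(0, hi - lo + 1)


def ivm_intervals(k: int, sigma: float) -> Tuple[Tuple[float, float], Tuple[float, float]]:
    """Default and reduced threshold intervals as given on the slide."""
    single = 0.5 * math.log(1 + sigma ** -2)
    default = (single, k * single)
    reduced = (0.5 * math.log(1 + k * sigma ** -2), k * single)
    return default, reduced


default_iv, reduced_iv = ivm_intervals(k=16, sigma=1.0)
n_default = count_thresholds(*default_iv, eps=0.1)
n_reduced = count_thresholds(*reduced_iv, eps=0.1)
```

Both intervals share the same upper end; only the lower end moves up, so every sieve that is dropped is one with a small threshold.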

Experiments: Questions
Question 1: Are summaries with IVM really expressive?
→ Summaries should contain the “hidden” states of the data
→ Extract a summary of a classification task
→ Then each class represents one “hidden” state
Question 2: How do the enhancements perform compared to the default?

Experiments: Data
Synthetic data: GMM with 4 dimensions and 4 classes. Use $K = 10, \ldots, 24$, $\varepsilon = 0.1$, $\sigma = 1$, $k(e_i, e_j) = \exp\left(-\frac{\|e_i - e_j\|_2^2}{10}\right)$.
UJIndoor Location: Predict (semantic) location, e.g. room number, based on GPS. Use $K = 80, \ldots, 130$, $\varepsilon = 0.1$, $\sigma = 1$, $k(e_i, e_j) = \exp\left(-\frac{\|e_i - e_j\|_2^2}{0.005}\right)$.
MNIST: Handwritten digit recognition task. Use $K = 8, \ldots, 16$, $\varepsilon = 0.1$, $\sigma = 1$, $k(e_i, e_j) = \exp\left(-\frac{\|e_i - e_j\|_2^2}{784}\right)$.
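
A rough sketch of the synthetic setup, assuming the kernel reads $\exp(-\|e_i - e_j\|_2^2 / \text{scale})$ with scale 10. The slide does not give the mixture parameters, so the component means, the unit variances, the sample size and the seed below are all hypothetical.

```python
import numpy as np


def rbf_kernel(a, b, scale: float) -> float:
    """Assumed kernel form: exp(-||a - b||_2^2 / scale)."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / scale))


def sample_gmm(n: int, dim: int = 4, classes: int = 4, seed: int = 0):
    """Draw n points from a Gaussian mixture with hypothetical parameters."""
    rng = np.random.default_rng(seed)
    means = rng.normal(scale=5.0, size=(classes, dim))  # hypothetical class centers
    labels = rng.integers(0, classes, size=n)            # uniform class choice
    X = means[labels] + rng.normal(size=(n, dim))        # unit-variance noise
    return X, labels


X, labels = sample_gmm(100)
k_same = rbf_kernel(X[0], X[0], scale=10.0)
k_other = rbf_kernel(X[0], X[1], scale=10.0)
```

Each mixture component plays the role of one “hidden” state, so a good summary of `X` would be expected to contain at least one point per label.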
