Streaming Tensor Factorization for Infinite Data Sources
Shaden Smith - Intel Parallel Computing Lab (Shaden.Smith@intel.com)
Kejun Huang - University of Minnesota
Nicholas D. Sidiropoulos - University of Virginia
George Karypis - University of Minnesota
Tensor factorization
• Multi-way data can be naturally represented as a tensor.
• Tensor factorizations are powerful tools for facilitating the analysis of multi-way data (a standard CP form is written out after this slide).
• Think: singular value decomposition, principal component analysis.
[Figure: a (source IP, destination IP, port) tensor and its Canonical Polyadic Decomposition into per-mode factors.]
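For reference, a rank-$F$ Canonical Polyadic Decomposition of a three-way tensor can be written as follows (the notation is assumed here, not taken from the slides):

$$
\mathcal{X} \approx \sum_{f=1}^{F} \mathbf{a}_f \circ \mathbf{b}_f \circ \mathbf{c}_f,
\qquad\text{i.e.}\qquad
x_{ijk} \approx \sum_{f=1}^{F} a_{if}\, b_{jf}\, c_{kf},
$$

where the vectors $\mathbf{a}_f$, $\mathbf{b}_f$, $\mathbf{c}_f$ collect into one factor matrix per mode (e.g. source IP, destination IP, port).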
Streaming data
• We often need to analyze multi-way data that is streamed.
• Applications include: cybersecurity, discussion tracking, traffic analysis, video monitoring, …
• A batch of data arrives each timestep 1, …, T.
• T may be infinite!
• Batches are assumed to come from the same generative model.
• In practice, we must account for the model slowly changing over time.
[Figure: a sequence of (source IP, destination IP, port) tensors arriving at times 1, 2, …, T.]
Streaming tensor factorization
• The collection of N-dimensional tensors can be viewed as an (N+1)-dimensional tensor observed over time (made concrete after this slide).
• We want to cheaply update an existing factorization each timestep to incorporate the latest batch of data.
• Challenge: storing historical tensor or factorization data that grows with time is infeasible.
• Challenge: we would like to apply constraints such as non-negativity to the factorization.
[Figure: the stream of batches stacked into an (N+1)-dimensional tensor whose temporal mode has length T.]
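Concretely, with assumed notation where $\mathcal{X}_t \in \mathbb{R}^{I_1 \times \cdots \times I_N}$ is the batch at time $t$, stacking the batches along a new temporal mode gives

$$
\mathcal{Y} \in \mathbb{R}^{I_1 \times \cdots \times I_N \times T},
\qquad
\mathcal{Y}(:,\dots,:,t) = \mathcal{X}_t, \quad t = 1,\dots,T,
$$

so the stream is itself an $(N{+}1)$-way tensor whose temporal mode grows without bound.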
CP-stream: optimization problem
• We start from the following non-convex optimization problem over all timesteps (a hedged reconstruction follows this slide):
• We constrain the factor matrices to have column norms ≤ 1.
• This improves stability due to a scaling ambiguity in the CPD.
• The s_t ∈ ℝ^F vectors form the rows of S, the temporal factor matrix.
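A hedged sketch of this objective, with assumed notation (rank $F$, factor matrices $\mathbf{A}^{(n)}$, temporal rows $\mathbf{s}_t$):

$$
\min_{\{\mathbf{A}^{(n)}\},\,\{\mathbf{s}_t\}}\;
\sum_{t=1}^{T} \frac{1}{2}
\Big\| \mathcal{X}_t - \big[\!\big[\, \mathbf{s}_t;\ \mathbf{A}^{(1)}, \dots, \mathbf{A}^{(N)} \,\big]\!\big] \Big\|_F^2
\quad \text{s.t.} \quad \big\|\mathbf{A}^{(n)}(:,f)\big\|_2 \le 1 \ \ \forall\, n, f,
$$

where $[\![\mathbf{s}_t; \mathbf{A}^{(1)},\dots,\mathbf{A}^{(N)}]\!]$ denotes the CP model whose mode-$n$ factor is $\mathbf{A}^{(n)}$ and whose temporal factor is the single row $\mathbf{s}_t$.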
CP-stream: formulation
• To avoid storing historic tensor data, we follow (Vandecappelle et al. 2017) and instead use the historical factorization (a hedged sketch follows this slide):
• θ is a forgetting factor used to down-weight the importance of older data.
• Limitation: this still requires storing S ∈ ℝ^(T×F), which grows with the number of timesteps.
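A hedged sketch of the resulting surrogate at time $t$ (assumed notation; $\tilde{\mathbf{A}}^{(n)}$ are the factors from the previous timestep): the fit to old batches is replaced by a fit to their previous factorization, down-weighted by powers of $\theta$:

$$
\min_{\{\mathbf{A}^{(n)}\},\,\mathbf{s}_t}\;
\frac{1}{2}\Big\|\mathcal{X}_t - \big[\!\big[\mathbf{s}_t;\mathbf{A}^{(1)},\dots,\mathbf{A}^{(N)}\big]\!\big]\Big\|_F^2
+ \sum_{t'=1}^{t-1}\frac{\theta^{\,t-t'}}{2}
\Big\|\big[\!\big[\mathbf{s}_{t'};\tilde{\mathbf{A}}^{(1)},\dots,\tilde{\mathbf{A}}^{(N)}\big]\!\big]
     -\big[\!\big[\mathbf{s}_{t'};\mathbf{A}^{(1)},\dots,\mathbf{A}^{(N)}\big]\!\big]\Big\|_F^2.
$$

The second sum still runs over all historical rows $\mathbf{s}_{t'}$, which is why the full temporal factor $\mathbf{S} \in \mathbb{R}^{T\times F}$ would otherwise have to be stored; the next slide removes this by keeping only its Gramian.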
CP-stream: algorithm (details in paper/poster)
When a new batch of data arrives at time t:
1. Compute s_t. This has a closed-form solution involving the new batch of tensor data and the previous factor matrices.
   • Complexity does not depend on T.
2. Update the factor matrices. We use alternating optimization with ADMM (AO-ADMM; Huang & Sidiropoulos 2016).
   • The temporal factor S is only used in its compact Gramian form SᵀS, which is computed recursively: G_t = θ·G_{t-1} + s_t s_tᵀ (a code sketch of the full update follows this slide).
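A minimal NumPy sketch of one streaming update for a dense three-way batch. Everything below is an illustration under assumptions: the helper names (khatri_rao, cp_stream_step) are hypothetical, and step 3 uses a simple ridge-style least squares pulled toward the previous factors in place of the paper's AO-ADMM step; only the Gramian recursion in step 2 matches the recursion stated above.

```python
import numpy as np

def khatri_rao(U, V):
    """Column-wise Kronecker product: (J x F) and (K x F) -> (J*K x F)."""
    J, F = U.shape
    K, _ = V.shape
    return (U[:, None, :] * V[None, :, :]).reshape(J * K, F)

def cp_stream_step(X_t, factors, G, theta=0.98, mu=1.0):
    """One streaming update for a dense 3-way batch X_t of shape (I, J, K).

    factors : [A, B, C], factor matrices from the previous timestep.
    G       : running temporal Gramian S^T S (F x F).
    Simplified sketch: step 3 stands in for the paper's AO-ADMM update.
    """
    A, B, C = factors
    F = A.shape[1]

    # 1. Closed-form temporal weights for the new batch:
    #    vec(X_t) ~ (A kr B kr C) s_t, solved via F x F normal equations.
    KR = khatri_rao(khatri_rao(A, B), C)
    lhs = (A.T @ A) * (B.T @ B) * (C.T @ C)        # Hadamard of factor Grams
    s_t = np.linalg.solve(lhs, KR.T @ X_t.reshape(-1))

    # 2. Gramian recursion from the slide: G_t = theta * G_{t-1} + s_t s_t^T.
    G = theta * G + np.outer(s_t, s_t)

    # 3. Alternating factor updates (simplified): fit the new batch while
    #    pulling each factor toward its previous value with weight mu.
    mats = [A.copy(), B.copy(), C.copy()]
    for n in range(3):
        others = [mats[m] for m in range(3) if m != n]
        Xn = np.moveaxis(X_t, n, 0).reshape(X_t.shape[n], -1)  # mode-n unfolding
        KRn = khatri_rao(others[0], others[1]) * s_t           # columns scaled by s_t
        gram = np.outer(s_t, s_t) * ((others[0].T @ others[0]) *
                                     (others[1].T @ others[1]))
        rhs = Xn @ KRn + mu * mats[n]
        mats[n] = np.linalg.solve(gram + mu * np.eye(F), rhs.T).T

    return mats, s_t, G
```

Swapping step 3 for an AO-ADMM update that consumes G, and enforcing the column-norm constraint, would recover the structure described on the slide.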
Extensions
• CP-stream supports additional constraints/regularizations. For stability, they are combined with the column norm constraint (proof of convergence in paper; a sketch of the corresponding proximal steps follows this slide).
  • Non-negativity
  • ℓ1 regularization to promote sparse factors
• Tensor sparsity:
  • CP-stream scales linearly in the number of non-zeros and makes use of the existing optimized kernels.
  • Sparsity is not treated as missing data, because absence of activity also carries meaning in our applications.
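In AO-ADMM-style solvers, each constraint or regularizer enters through its proximal operator. A minimal NumPy sketch of the two named above plus the column-norm projection (these are standard operators, not code from the paper; the paper's exact way of combining them with the norm constraint is not shown):

```python
import numpy as np

def prox_nonneg(M):
    """Projection onto the non-negative orthant (non-negativity constraint)."""
    return np.maximum(M, 0.0)

def prox_l1(M, lam):
    """Soft-thresholding: proximal operator of lam * ||M||_1 (sparse factors)."""
    return np.sign(M) * np.maximum(np.abs(M) - lam, 0.0)

def project_column_norms(M, max_norm=1.0):
    """Rescale each column so its Euclidean norm is at most max_norm."""
    norms = np.linalg.norm(M, axis=0)
    scale = np.minimum(1.0, max_norm / np.maximum(norms, 1e-12))
    return M * scale
```

In practice, an ADMM inner loop alternates a least-squares step with one of these proximal steps applied to an auxiliary copy of the factor.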
Evaluation
• We generated a dense 100x100x1000 tensor from rank-10 factors (plus noise).
• We compare against:
  • Online-CP (Zhou et al., 2016)
  • Online-SGD (Mardani et al., 2015)
• Shown is the estimation error of the known ground-truth factors (a hedged form of this metric follows this slide).
[Figure: scaled estimation error versus timestep t (0 to 1000, log scale from roughly 10⁻⁵ to 10¹⁰) for Online-CP, Online-SGD, and CP-stream.]
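One hedged possibility for this metric (an assumption; the paper's exact "scaled estimation error" may differ) is the relative factor error after resolving the CPD's permutation and scaling ambiguity:

$$
\mathrm{err}(t) = \frac{\big\| \hat{\mathbf{A}}^{(n)}_t \,\boldsymbol{\Pi}\boldsymbol{\Lambda} - \mathbf{A}^{(n)}_{\star} \big\|_F}{\big\| \mathbf{A}^{(n)}_{\star} \big\|_F},
$$

where $\mathbf{A}^{(n)}_{\star}$ is a ground-truth factor and $\boldsymbol{\Pi}$, $\boldsymbol{\Lambda}$ are the best-matching column permutation and scaling.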
Case study: discussion tracking
• Comments on reddit.com form a (user, community, word) tensor.
• A new batch arrives each day (a sketch of assembling one daily batch follows this slide).
• 65M non-zeros over one year.
• Each user, community, and word is represented by a low-rank vector in the factorization.
• Tracking the vectors representing the word "Obama" and the stocks community reveals events in 2008.
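A small sketch of how one day's batch might be assembled from raw comments into COO (coordinate) form. The data layout, field names, and index maps here are assumptions for illustration, not the paper's preprocessing pipeline:

```python
from collections import Counter

def build_daily_batch(comments, user_ids, community_ids, vocab):
    """Build one day's sparse (user, community, word) batch in COO form.

    comments:      iterable of (user, community, token_list) tuples for the day.
    user_ids etc.: index maps kept across days so factor rows stay consistent;
                   assumed to already contain every user/community seen today.
    """
    counts = Counter()
    for user, community, tokens in comments:
        u = user_ids[user]
        c = community_ids[community]
        for tok in tokens:
            if tok in vocab:                      # skip out-of-vocabulary words
                counts[(u, c, vocab[tok])] += 1

    # COO representation: parallel index tuples plus values.
    inds = list(counts.keys())
    vals = [counts[k] for k in inds]
    return inds, vals
```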
Wrapping up
• Streaming tensor factorization has applications in areas such as cybersecurity, discussion tracking, and traffic analysis.
• CP-stream uses a formulation suitable for long-term streaming, and supports sparsity and constraints.
• Our source code is to be open sourced as part of SPLATT: https://github.com/ShadenSmith/splatt
• Sparse tensor datasets available in FROSTT: http://frostt.io/
• Contact: Shaden.Smith@intel.com or shaden@cs.umn.edu
Backup
AO-ADMM
AO-ADMM (2)