Algorithms for Big Data (VI)
Chihao Zhang
Shanghai Jiao Tong University
Oct. 25, 2019
Review

We learnt the AMS algorithm to estimate $\|f\|_k^k$ for $k \ge 2$ using $O\left(kn^{1-1/k}(\log m + \log n)\right)$ bits.

An ad-hoc algorithm for $\|f\|_2^2$ costs $O(\log m + \log n)$ bits:

▶ Pick $h\colon [n] \to \{-1, 1\}$ from a 4-universal family;
▶ On input $(j, \Delta)$, $x \leftarrow x + \Delta \cdot h(j)$;
▶ Output $x^2$.
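A minimal Python sketch of this ad-hoc estimator, assuming the standard construction of a 4-universal family from random degree-3 polynomials over a prime field; the class name, the prime `P`, and the stream interface are illustrative choices, not part of the slides:

```python
import random

P = (1 << 61) - 1  # a large prime; assumes all items j lie in [n] with n < P


class TugOfWar:
    """Sketch of the ad-hoc ||f||_2^2 estimator (a single sketch, no averaging)."""

    def __init__(self):
        # h(j) = sign of a random degree-3 polynomial mod P,
        # a 4-wise independent family up to an O(1/P) bias.
        self.coeffs = [random.randrange(P) for _ in range(4)]
        self.x = 0

    def _h(self, j):
        v = 0
        for c in self.coeffs:      # Horner's rule for the polynomial
            v = (v * j + c) % P
        return 1 if v % 2 == 0 else -1

    def update(self, j, delta):    # stream token (j, Delta): x <- x + Delta * h(j)
        self.x += delta * self._h(j)

    def estimate(self):            # E[x^2] = ||f||_2^2
        return self.x * self.x
```

Averaging several independent copies (the averaging trick from earlier lectures) reduces the variance of the estimate, which is exactly the view taken on the next slide.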
An Algebraic View

It is instructive to view the Tug-of-War algorithm from the perspective of linear algebra.

Assume that we run the algorithm $k$ times (to apply the averaging trick), each time with a function $h_i$.

Consider the matrix $A = (a_{ij})_{i \in [k],\, j \in [n]}$ where $a_{ij} = h_i(j)$.

Let $x = Af$; we know that $\mathbf{E}\left[x_i^2\right] = \|f\|_2^2$. Our algorithm outputs $\frac{\sum_{i=1}^{k} x_i^2}{k} = \frac{\|x\|_2^2}{k}$.

The 2-norm of the vector $\frac{x}{\sqrt{k}}$ is close to that of $f$!
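The expectation claim is the same second-moment computation as in the Tug-of-War analysis; spelled out, it uses only pairwise independence of $h_i$:

$$\mathbf{E}\left[x_i^2\right]
= \mathbf{E}\Bigl[\Bigl(\sum_{j\in[n]} h_i(j)\, f_j\Bigr)^{2}\Bigr]
= \sum_{j\in[n]} f_j^2\,\underbrace{\mathbf{E}\left[h_i(j)^2\right]}_{=1}
\;+\; \sum_{j\neq j'} f_j f_{j'}\,\underbrace{\mathbf{E}\left[h_i(j)\,h_i(j')\right]}_{=0}
= \|f\|_2^2 .$$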
Dimension Reduction

Suppose $k \ll n$; what the matrix $A$ does is to map a vector in $\mathbb{R}^n$ to a vector in $\mathbb{R}^k$ without changing its norm much.

This operation is often referred to as dimension reduction or metric embedding.

The algorithm we met is similar to one important dimension reduction technique, the Johnson-Lindenstrauss transformation.
Johnson-Lindenstrauss transformation

Theorem. For any $0 < \varepsilon < \frac{1}{2}$ and any positive integer $m$, consider a set of $m$ points $S \subseteq \mathbb{R}^n$. There exists a matrix $A \in \mathbb{R}^{k \times n}$ with $k = O(\varepsilon^{-2} \log m)$ satisfying
$$\forall x, y \in S, \quad (1 - \varepsilon)\|x - y\| \le \|Ax - Ay\| \le (1 + \varepsilon)\|x - y\|.$$

We construct $A$ by drawing each of its entries from $N(0, \frac{1}{k})$ independently.
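A small numpy sketch of this construction. The constant `C` and all parameter values are illustrative assumptions (the theorem only guarantees $k = O(\varepsilon^{-2}\log m)$), and the variable names are not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, eps = 10_000, 50, 0.25
C = 8.0                                   # illustrative constant, not from the theorem
k = int(np.ceil(C * np.log(m) / eps**2))

S = rng.normal(size=(m, n))                          # m arbitrary points in R^n
A = rng.normal(0.0, 1.0 / np.sqrt(k), size=(k, n))   # entries drawn i.i.d. from N(0, 1/k)
T = S @ A.T                                          # row i is A applied to point i

worst = 0.0
for i in range(m):
    for j in range(i + 1, m):
        ratio = np.linalg.norm(T[i] - T[j]) / np.linalg.norm(S[i] - S[j])
        worst = max(worst, abs(ratio - 1.0))
print(f"k = {k}, worst pairwise distortion = {worst:.3f} (should be below eps = {eps})")
```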
Gaussian Distribution

Recall the density function of a variable $X \sim N(\mu, \sigma^2)$ is
$$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}.$$

The distribution function is
$$F_X(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(t-\mu)^2}{2\sigma^2}} \,\mathrm{d}t.$$

Assume $X_1 \sim N(\mu_1, \sigma_1^2)$ and $X_2 \sim N(\mu_2, \sigma_2^2)$ are independent; then
$$aX_1 + bX_2 \sim N\left(a\mu_1 + b\mu_2,\; a^2\sigma_1^2 + b^2\sigma_2^2\right).$$
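A quick empirical sanity check of the closure property (the parameter values below are arbitrary assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
mu1, s1, mu2, s2, a, b = 1.0, 2.0, -3.0, 0.5, 0.7, 1.3

# Sample a*X1 + b*X2 with X1, X2 independent Gaussians and compare the
# empirical mean/variance with the predicted N(a*mu1 + b*mu2, a^2 s1^2 + b^2 s2^2).
Z = a * rng.normal(mu1, s1, 1_000_000) + b * rng.normal(mu2, s2, 1_000_000)
print(Z.mean(), a * mu1 + b * mu2)            # approximately equal
print(Z.var(), a**2 * s1**2 + b**2 * s2**2)   # approximately equal
```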
Proof of JL

The statement is equivalent to: for every pair of distinct points $x, y \in S$,
$$1 - \varepsilon \le \frac{\|A(x - y)\|}{\|x - y\|} \le 1 + \varepsilon.$$

We only need to show that for every unit-length vector $f$,
$$\Pr\left[\,\bigl|\|Af\| - 1\bigr| > \varepsilon\,\right] \le 1 - \delta.$$

Assume $x = Af$; then $x_i = \sum_{j \in [n]} a_{ij} \cdot f_j \sim N(0, \frac{1}{k})$.

We need a concentration inequality for the squared sum of Gaussians:
$$\Pr\left[\,\Bigl|\sum_{i=1}^{k} x_i^2 - 1\Bigr| \ge \varepsilon\,\right] \le 1 - \delta.$$
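The distribution of $x_i$ follows from the closure property on the previous slide, since the $a_{ij}$ are independent $N(0, \frac{1}{k})$ and $\|f\| = 1$; rescaling shows which squared sum the concentration bound has to control:

$$x_i = \sum_{j\in[n]} a_{ij} f_j \sim N\Bigl(0,\; \tfrac{1}{k}\sum_{j\in[n]} f_j^2\Bigr) = N\bigl(0, \tfrac{1}{k}\bigr),
\qquad
\|Af\|^2 = \sum_{i=1}^{k} x_i^2 = \frac{1}{k}\sum_{i=1}^{k} \bigl(\sqrt{k}\,x_i\bigr)^2,$$

where the $\sqrt{k}\,x_i$ are i.i.d. $N(0,1)$, so the concentration theorem on the next slide applies with $X_i = \sqrt{k}\,x_i$.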
Concentration

Theorem. Assume $X_1, X_2, \ldots, X_k$ are i.i.d. $N(0, 1)$; then for $0 < \varepsilon < 1$,
$$\Pr\left[\,\Bigl|\sum_{i=1}^{k} X_i^2 - k\Bigr| \ge \varepsilon k\,\right] < 2 e^{-\frac{\varepsilon^2 k}{8}}.$$

The proof is similar to the proof of the Chernoff bound we met before.
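A sketch of that argument (not spelled out on the slide): bound the moment generating function of a squared standard Gaussian and apply Markov's inequality, exactly as in the Chernoff bound:

$$\mathbf{E}\left[e^{\lambda X_i^2}\right] = \frac{1}{\sqrt{1-2\lambda}} \quad (\lambda < \tfrac{1}{2}),
\qquad
\Pr\Bigl[\sum_{i=1}^{k} X_i^2 \ge (1+\varepsilon)k\Bigr]
\le \frac{\mathbf{E}\bigl[e^{\lambda\sum_i X_i^2}\bigr]}{e^{\lambda(1+\varepsilon)k}}
= \Bigl((1-2\lambda)\, e^{2\lambda(1+\varepsilon)}\Bigr)^{-k/2}.$$

Choosing $\lambda = \frac{\varepsilon}{2(1+\varepsilon)}$ makes the right-hand side $\bigl((1+\varepsilon)e^{-\varepsilon}\bigr)^{k/2} \le e^{-\varepsilon^2 k/8}$ for $0 < \varepsilon < 1$; the lower tail is handled symmetrically, giving the factor 2.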