Algorithms for Big Data (X)
Chihao Zhang
Shanghai Jiao Tong University
Nov. 22, 2019
Matrix Multiplication

Given two matrices $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{n \times p}$, we compute $C = AB$.
For $m = n = p$, the naive algorithm costs $O(n^3)$ multiplication operations.
Strassen's algorithm reduces the cost to $O(n^{2.81})$.
The best algorithm so far costs $O(n^{\omega})$ where $\omega < 2.3728639$.
Today we will introduce a Monte Carlo algorithm to approximate $AB$.
Review of Linear Algebra

Assume $A = \begin{bmatrix} a_1, \dots, a_n \end{bmatrix}$ (the $a_i$ are the columns of $A$) and $B = \begin{bmatrix} b_1^T \\ \vdots \\ b_n^T \end{bmatrix}$ (the $b_i^T$ are the rows of $B$).
Then $AB = \sum_{i=1}^{n} a_i b_i^T$, where each $a_i b_i^T$ is of rank $1$.
The Frobenius norm of a matrix $A = (a_{ij})_{1 \le i \le m,\, 1 \le j \le n}$ is
$$\|A\|_F \triangleq \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij}^2}.$$
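As a small sanity check of the outer-product identity and the Frobenius norm, here is a minimal sketch in plain Python. Matrices are assumed to be lists of rows, and the function names `matmul`, `outer_product_sum` and `frobenius_norm` are illustrative choices, not part of the lecture.

```python
def matmul(A, B):
    """Naive product of an m x n and an n x p matrix (lists of rows)."""
    n, p = len(B), len(B[0])
    return [[sum(row[k] * B[k][j] for k in range(n)) for j in range(p)]
            for row in A]


def outer_product_sum(A, B):
    """Compute AB as the sum over i of (column i of A) times (row i of B)."""
    m, n, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(m)]
    for i in range(n):
        # Add the rank-one matrix a_i b_i^T entry by entry.
        for r in range(m):
            for s in range(p):
                C[r][s] += A[r][i] * B[i][s]
    return C


def frobenius_norm(M):
    """Square root of the sum of squared entries."""
    return sum(x * x for row in M for x in row) ** 0.5


A = [[1.0, 2.0, 0.0], [0.0, 1.0, 3.0]]    # 2 x 3
B = [[1.0, 0.0], [2.0, 1.0], [0.0, 4.0]]  # 3 x 2
same = outer_product_sum(A, B) == matmul(A, B)
```

Both routines perform the same multiplications; only the loop order differs, which is what makes sampling a few of the $n$ rank-one terms a natural way to approximate the sum.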
The Algorithm

Note that $AB = \sum_{i=1}^{n} a_i b_i^T$.
The algorithm randomly picks indices $i \in [n]$ independently $c$ times (with replacement).
Let $J : [c] \to [n]$ denote the indices.
Output $\sum_{i=1}^{c} w(J(i)) \cdot a_{J(i)} b_{J(i)}^T$, where $w(J(i))$ is some weight to be determined.
We fix a distribution on $[n]$ ($p_i$ for $i \in [n]$ satisfying $\sum_{i \in [n]} p_i = 1$).
Therefore, the index $j$ is picked $c \cdot p_j$ times in expectation, so we can set $w(j) = (c p_j)^{-1}$.
It is convenient to formulate the algorithm using matrices. Define a random sampling matrix $\Pi = (\pi_{ij}) \in \mathbb{R}^{n \times c}$ such that
$$\pi_{ij} = \begin{cases} (c p_i)^{-1/2} & \text{if } i = J(j), \\ 0 & \text{otherwise.} \end{cases}$$
Then our algorithm outputs $A'B'$ where $A' = A\Pi$ and $B' = \Pi^T B$.
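The sampling step can be sketched as follows. This is a minimal illustration under stated assumptions: matrices are plain Python lists of rows, `probs[j]` holds $p_j$ and must be positive for every index that can be drawn, and the name `sampled_product` is hypothetical. It accumulates the weighted rank-one terms directly rather than building $\Pi$ explicitly, which yields the same matrix $A'B'$.

```python
import random


def sampled_product(A, B, probs, c, rng):
    """Estimate AB from c sampled rank-one terms a_j b_j^T / (c * p_j).

    A is m x n and B is n x p, both lists of rows; probs[j] is p_j.
    """
    m, n, p = len(A), len(B), len(B[0])
    # Draw J(1), ..., J(c) independently from the distribution, with replacement.
    J = rng.choices(range(n), weights=probs, k=c)
    estimate = [[0.0] * p for _ in range(m)]
    for j in J:
        w = 1.0 / (c * probs[j])  # weight w(j) = (c p_j)^{-1}
        for r in range(m):
            for s in range(p):
                # a_j is column j of A, b_j^T is row j of B.
                estimate[r][s] += w * A[r][j] * B[j][s]
    return estimate


rng = random.Random(0)  # fixed seed only to make the sketch reproducible
A = [[1.0, 2.0, 0.0], [0.0, 1.0, 3.0]]
B = [[1.0, 0.0], [2.0, 1.0], [0.0, 4.0]]
approx = sampled_product(A, B, [0.2, 0.3, 0.5], c=2, rng=rng)
```

The cost is $O(c \cdot m \cdot p)$ multiplications instead of $O(n \cdot m \cdot p)$, so the sketch only pays off when $c$ is much smaller than $n$.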
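Analysis

We are going to choose some $(p_i)_{i \in [n]}$ so that $A'B' \approx AB$.
Fix $i, j$. For any $k \in [c]$, we let $X_k = \left( \frac{a_{J(k)} b_{J(k)}^T}{c\, p_{J(k)}} \right)_{ij}$. Writing $a_{\ell i}$ for the $i$-th entry of $a_\ell$ and $b_{\ell j}$ for the $j$-th entry of $b_\ell$,
$$\mathbf{E}[X_k] = \sum_{\ell=1}^{n} p_\ell \left( \frac{a_\ell b_\ell^T}{c\, p_\ell} \right)_{ij} = \frac{1}{c} (AB)_{ij},$$
$$\mathbf{E}[X_k^2] = \sum_{\ell=1}^{n} p_\ell \left( \frac{a_\ell b_\ell^T}{c\, p_\ell} \right)_{ij}^2 = \sum_{\ell=1}^{n} \frac{a_{\ell i}^2 b_{\ell j}^2}{c^2 p_\ell},$$
$$\mathbf{Var}[X_k] = \sum_{\ell=1}^{n} \frac{a_{\ell i}^2 b_{\ell j}^2}{c^2 p_\ell} - \frac{1}{c^2} (AB)_{ij}^2.$$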
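Therefore,
$$\mathbf{E}\left[ (A'B')_{ij} \right] = \sum_{k=1}^{c} \mathbf{E}[X_k] = (AB)_{ij}.$$
We are going to study the concentration of this algorithm. Since the $X_k$ are independent and the estimator is unbiased, we compute that
$$\begin{aligned}
\mathbf{E}\left[ \|AB - A'B'\|_F^2 \right]
&= \sum_{i=1}^{m} \sum_{j=1}^{p} \mathbf{E}\left[ (AB - A'B')_{ij}^2 \right] \\
&= \sum_{i=1}^{m} \sum_{j=1}^{p} \mathbf{Var}\left[ (A'B')_{ij} \right] \\
&= \frac{1}{c} \left( \sum_{\ell=1}^{n} \frac{1}{p_\ell} \|a_\ell\|^2 \|b_\ell\|^2 - \|AB\|_F^2 \right).
\end{aligned}$$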
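The closed form for the expected squared error can be checked exactly on a tiny instance by enumerating every possible index sequence $J$. The sketch below assumes list-of-rows matrices and strictly positive probabilities; the function names are illustrative, and the brute-force routine takes $n^c$ steps, so it is only meant for very small $n$ and $c$.

```python
from itertools import product


def matmul(A, B):
    n, p = len(B), len(B[0])
    return [[sum(row[k] * B[k][j] for k in range(n)) for j in range(p)]
            for row in A]


def sq_frobenius(M):
    return sum(x * x for row in M for x in row)


def expected_error_formula(A, B, probs, c):
    """(1/c) * (sum_l ||a_l||^2 ||b_l||^2 / p_l - ||AB||_F^2)."""
    total = 0.0
    for l in range(len(B)):
        col_sq = sum(row[l] ** 2 for row in A)  # ||a_l||^2
        row_sq = sum(x ** 2 for x in B[l])      # ||b_l||^2
        total += col_sq * row_sq / probs[l]
    return (total - sq_frobenius(matmul(A, B))) / c


def expected_error_bruteforce(A, B, probs, c):
    """Average ||AB - A'B'||_F^2 over every possible index sequence J."""
    m, n, p = len(A), len(B), len(B[0])
    exact = matmul(A, B)
    expectation = 0.0
    for J in product(range(n), repeat=c):
        prob = 1.0
        est = [[0.0] * p for _ in range(m)]
        for j in J:
            prob *= probs[j]  # probability of drawing this sequence
            for r in range(m):
                for s in range(p):
                    est[r][s] += A[r][j] * B[j][s] / (c * probs[j])
        diff = [[exact[r][s] - est[r][s] for s in range(p)] for r in range(m)]
        expectation += prob * sq_frobenius(diff)
    return expectation


A = [[1.0, 2.0, 0.0], [0.0, 1.0, 3.0]]
B = [[1.0, 0.0], [2.0, 1.0], [0.0, 4.0]]
probs = [0.2, 0.3, 0.5]
formula = expected_error_formula(A, B, probs, c=2)
brute = expected_error_bruteforce(A, B, probs, c=2)
```

The two values should agree up to floating-point rounding, since the identity is exact rather than asymptotic.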
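If we choose $p_\ell \sim \|a_\ell\| \|b_\ell\|$, that is, $p_\ell = \|a_\ell\| \|b_\ell\| \big/ \sum_{k=1}^{n} \|a_k\| \|b_k\|$, then
$$\begin{aligned}
\mathbf{E}\left[ \|AB - A'B'\|_F^2 \right]
&= \frac{1}{c} \left( \left( \sum_{\ell=1}^{n} \|a_\ell\| \|b_\ell\| \right)^2 - \|AB\|_F^2 \right) \\
&\le \frac{1}{c} \left( \sum_{\ell=1}^{n} \|a_\ell\| \|b_\ell\| \right)^2 \\
&\le \frac{1}{c} \|A\|_F^2 \|B\|_F^2,
\end{aligned}$$
where the last step is the Cauchy-Schwarz inequality together with $\sum_{\ell} \|a_\ell\|^2 = \|A\|_F^2$ and $\sum_{\ell} \|b_\ell\|^2 = \|B\|_F^2$.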
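To see the effect of this choice numerically, the sketch below compares the $p$-dependent term $\sum_\ell \|a_\ell\|^2 \|b_\ell\|^2 / p_\ell$ under norm-proportional and uniform probabilities. It assumes list-of-rows matrices and that every product $\|a_\ell\| \|b_\ell\|$ is positive (a zero product would give $p_\ell = 0$ and needs separate handling); all function names are illustrative.

```python
def column_norm(A, l):
    """Euclidean norm of column l of A, i.e. ||a_l||."""
    return sum(row[l] ** 2 for row in A) ** 0.5


def row_norm(B, l):
    """Euclidean norm of row l of B, i.e. ||b_l||."""
    return sum(x ** 2 for x in B[l]) ** 0.5


def norm_proportional_probs(A, B):
    """p_l proportional to ||a_l|| * ||b_l|| (assumes all products > 0)."""
    weights = [column_norm(A, l) * row_norm(B, l) for l in range(len(B))]
    total = sum(weights)
    return [w / total for w in weights]


def p_dependent_term(A, B, probs):
    """sum_l ||a_l||^2 ||b_l||^2 / p_l, the part of the error that depends on p."""
    return sum((column_norm(A, l) * row_norm(B, l)) ** 2 / probs[l]
               for l in range(len(B)))


A = [[1.0, 2.0, 0.0], [0.0, 1.0, 3.0]]
B = [[1.0, 0.0], [2.0, 1.0], [0.0, 4.0]]
n = len(B)

best = norm_proportional_probs(A, B)
uniform = [1.0 / n] * n

term_best = p_dependent_term(A, B, best)        # equals (sum_l ||a_l|| ||b_l||)^2
term_uniform = p_dependent_term(A, B, uniform)  # equals n * sum_l ||a_l||^2 ||b_l||^2
frobenius_bound = (sum(x * x for row in A for x in row)
                   * sum(x * x for row in B for x in row))
```

On this instance the norm-proportional term is $(1 + 5 + 12)^2 = 324$, the uniform term is $3 \cdot (1 + 25 + 144) = 510$, and $\|A\|_F^2 \|B\|_F^2 = 15 \cdot 22 = 330$, so only the norm-proportional choice stays below the Frobenius bound here.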
Therefore, by Chebyshev's inequality,
$$\Pr\left[ \|AB - A'B'\|_F > \varepsilon \|A\|_F \|B\|_F \right] = \Pr\left[ \|AB - A'B'\|_F^2 > \varepsilon^2 \|A\|_F^2 \|B\|_F^2 \right] \le \frac{1}{c \varepsilon^2}.$$
We can use a variant of the median trick to boost the algorithm.
We can choose … $\log$ … to achieve … probability of correctness.
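As a worked reading of the tail bound on its own, before any boosting: write $\delta$ for a target failure probability (a symbol introduced here for illustration, not taken from the slide). Requiring the bound to be at most $\delta$ gives a sufficient number of samples.

```latex
\frac{1}{c\,\varepsilon^{2}} \le \delta
\quad\Longleftrightarrow\quad
c \ge \frac{1}{\delta\,\varepsilon^{2}},
\qquad\text{e.g. } \varepsilon = 0.1,\ \delta = \tfrac{1}{4}
\ \Longrightarrow\ c \ge \frac{1}{\tfrac{1}{4}\cdot 0.01} = 400 .
```

The dependence on $\delta$ is linear in $1/\delta$ under this bound alone, which is why a boosting step is attractive when a small failure probability is required.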