

  1. Discrete Geometry meets Machine Learning. Amitabh Basu, Johns Hopkins University. 22nd Combinatorial Optimization Workshop at Aussois, January 11, 2018. Joint work with Anirbit Mukherjee, Raman Arora, Poorya Mianjy.

  2. Two Problems in Discrete Geometry. Problem 1: Given two polytopes P and Q, do there exist simplices A_1, …, A_p and B_1, …, B_q such that P + A_1 + … + A_p = Q + B_1 + … + B_q?

  3. Two Problems in Discrete Geometry. Problem 1: Given two polytopes P and Q, do there exist simplices A_1, …, A_p and B_1, …, B_q such that P + A_1 + … + A_p = Q + B_1 + … + B_q? Problem 2: For a natural number k, define a k-zonotope as the Minkowski sum of a finite set of polytopes, each of which is the convex hull of k points [a 2-zonotope is a regular zonotope]. Given two 2^n-zonotopes P and Q, do there exist two 2^(n+1)-zonotopes A and B such that conv(P ∪ Q) + A = B?

  4. What is a Deep Neural Network (DNN) ?

  5. What is a Deep Neural Network (DNN) ? • Directed Acyclic Graph (Network Architecture)

  6. What is a Deep Neural Network (DNN) ? • Directed Acyclic Graph (Network Architecture) • Weights on every edge and every vertex

  7. What is a Deep Neural Network (DNN)? • Directed Acyclic Graph (Network Architecture) • Weights on every edge and every vertex [figure: example network with edge/vertex weights such as 2.45, 2, 1.65, 0.53, −1, −6.8, 3]

  8. What is a Deep Neural Network (DNN)? • Directed Acyclic Graph (Network Architecture) • Weights on every edge and every vertex • R → R “Activation Function”. Examples: f(x) = max{0, x} — Rectified Linear Unit (ReLU); f(x) = e^x / (1 + e^x) — Sigmoid [figure: example network with weights on edges and vertices]
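The two activation functions named on this slide are one-liners in code. A minimal NumPy sketch (my own illustration, not part of the talk):

```python
import numpy as np

def relu(x):
    # f(x) = max{0, x}, the Rectified Linear Unit
    return np.maximum(0.0, x)

def sigmoid(x):
    # f(x) = e^x / (1 + e^x), written in the equivalent form 1 / (1 + e^-x)
    return 1.0 / (1.0 + np.exp(-x))

print(relu(np.array([-2.0, 0.0, 3.0])))   # [0. 0. 3.]
print(sigmoid(np.array([0.0])))           # [0.5]
```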

  9. What is a Deep Neural Network (DNN)? • Directed Acyclic Graph (Network Architecture) • Weights on every edge and every vertex • R → R “Activation Function”. Examples: f(x) = max{0, x} — Rectified Linear Unit (ReLU); f(x) = e^x / (1 + e^x) — Sigmoid • Sources = input, Sinks = output [figure: example network with weights on edges and vertices]

  10. What is a Deep Neural Network (DNN)? • Directed Acyclic Graph (Network Architecture) • Weights on every edge and every vertex • R → R “Activation Function”. Examples: f(x) = max{0, x} — Rectified Linear Unit (ReLU); f(x) = e^x / (1 + e^x) — Sigmoid • Sources = input, Sinks = output [figure: example network with inputs x_1, x_2, x_3, outputs y_1, y_2, and weights on edges and vertices]

  11. What is a Deep Neural Network (DNN)? • Directed Acyclic Graph (Network Architecture) • Weights on every edge and every vertex • R → R “Activation Function”. Examples: f(x) = max{0, x} — Rectified Linear Unit (ReLU); f(x) = e^x / (1 + e^x) — Sigmoid • Sources = input, Sinks = output [figure: example network with inputs x_1, x_2, x_3 and outputs y_1, y_2; a single node with incoming values u_1, …, u_k, edge weights a_1, …, a_k, and vertex weight b, computing o = f(a_1 u_1 + a_2 u_2 + … + a_k u_k + b)]

  12. What is a Deep Neural Network (DNN)? • Directed Acyclic Graph (Network Architecture) • Weights on every edge and every vertex • R → R “Activation Function”. Examples: f(x) = max{0, x} — Rectified Linear Unit (ReLU); f(x) = e^x / (1 + e^x) — Sigmoid • Sources = input, Sinks = output [figure: example network with inputs x_1, x_2, x_3 and outputs y_1, y_2; a single ReLU node with incoming values u_1, …, u_k, edge weights a_1, …, a_k, and vertex weight b, computing o = max{0, a_1 u_1 + a_2 u_2 + … + a_k u_k + b}]
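The node computation on this slide translates directly to code. Below is a minimal NumPy sketch of one ReLU node; the particular weights and inputs are made-up illustrative values, not the ones in the slide's figure:

```python
import numpy as np

def relu_node(u, a, b):
    # o = max{0, a_1*u_1 + ... + a_k*u_k + b}
    return max(0.0, float(np.dot(a, u) + b))

# Illustrative values only.
u = np.array([1.0, -2.0, 0.5])   # incoming values u_1, ..., u_k
a = np.array([0.3, 0.8, -1.2])   # edge weights a_1, ..., a_k
b = 0.1                          # vertex weight
print(relu_node(u, a, b))        # max(0, -1.8) = 0.0
```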

  13. Problems of interest for DNNs • Expressiveness: What family of functions can one represent using DNNs? • Efficiency: How many layers (depth) and vertices (size) are needed to represent functions in the family? • Training the network: Given the architecture and data points (x, y), find weights for the “best fit” function. • Generalization error: Rademacher complexity, VC dimension


  15. Calculus of DNN functions

  16. Calculus of DNN functions • f_1 in DNN(k_1, s_1), f_2 in DNN(k_2, s_2) => f_1 + f_2 in DNN(max{k_1, k_2}, s_1 + s_2)

  17. Calculus of DNN functions • f_1 in DNN(k_1, s_1), f_2 in DNN(k_2, s_2) => f_1 + f_2 in DNN(max{k_1, k_2}, s_1 + s_2) [figure: the networks for f_1 and f_2 placed in parallel on the same input x, with output y = f_1 + f_2]

  18. Calculus of DNN functions • f_1 in DNN(k_1, s_1), f_2 in DNN(k_2, s_2) => f_1 + f_2 in DNN(max{k_1, k_2}, s_1 + s_2) • f in DNN(k, s), c in R => cf in DNN(k, s)

  19. Calculus of DNN functions • f_1 in DNN(k_1, s_1), f_2 in DNN(k_2, s_2) => f_1 + f_2 in DNN(max{k_1, k_2}, s_1 + s_2) • f in DNN(k, s), c in R => cf in DNN(k, s) • f_1 in DNN(k_1, s_1), f_2 in DNN(k_2, s_2) => f_1 ∘ f_2 in DNN(k_1 + k_2, s_1 + s_2)

  20. Calculus of DNN functions • f_1 in DNN(k_1, s_1), f_2 in DNN(k_2, s_2) => f_1 + f_2 in DNN(max{k_1, k_2}, s_1 + s_2) • f in DNN(k, s), c in R => cf in DNN(k, s) • f_1 in DNN(k_1, s_1), f_2 in DNN(k_2, s_2) => f_1 ∘ f_2 in DNN(k_1 + k_2, s_1 + s_2) [figure: the network for f_2 applied to input x, feeding into the network for f_1, computing f_1 ∘ f_2]

  21. Calculus of DNN functions • f_1 in DNN(k_1, s_1), f_2 in DNN(k_2, s_2) => f_1 + f_2 in DNN(max{k_1, k_2}, s_1 + s_2) • f in DNN(k, s), c in R => cf in DNN(k, s) • f_1 in DNN(k_1, s_1), f_2 in DNN(k_2, s_2) => f_1 ∘ f_2 in DNN(k_1 + k_2, s_1 + s_2) • f_1 in ReLU-DNN(k_1, s_1), f_2 in ReLU-DNN(k_2, s_2) => max{f_1, f_2} in ReLU-DNN(max{k_1, k_2} + 1, s_1 + s_2 + 4)
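These four rules can be sanity-checked by tracking only the depth/size bookkeeping. The sketch below assumes DNN(k, s) denotes functions computable by a network of depth k and size s, matching the depth/size terminology of the earlier "Problems of interest" slide; the class and function names are mine, not from the talk:

```python
from dataclasses import dataclass

@dataclass
class Cost:
    depth: int   # number of layers
    size: int    # number of vertices

def add(f1: Cost, f2: Cost) -> Cost:       # f1 + f2
    return Cost(max(f1.depth, f2.depth), f1.size + f2.size)

def scale(f: Cost) -> Cost:                # c * f for any real c
    return Cost(f.depth, f.size)

def compose(f1: Cost, f2: Cost) -> Cost:   # f1 o f2
    return Cost(f1.depth + f2.depth, f1.size + f2.size)

def relu_max(f1: Cost, f2: Cost) -> Cost:  # max{f1, f2} for ReLU networks
    return Cost(max(f1.depth, f2.depth) + 1, f1.size + f2.size + 4)

# Example: taking the max of two depth-2, size-10 ReLU networks.
print(relu_max(Cost(2, 10), Cost(2, 10)))  # Cost(depth=3, size=24)
```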

  22. Calculus of DNN functions • f_1 in DNN(k_1, s_1), f_2 in DNN(k_2, s_2) => f_1 + f_2 in DNN(max{k_1, k_2}, s_1 + s_2) • f in DNN(k, s), c in R => cf in DNN(k, s) • f_1 in DNN(k_1, s_1), f_2 in DNN(k_2, s_2) => f_1 ∘ f_2 in DNN(k_1 + k_2, s_1 + s_2) • f_1 in ReLU-DNN(k_1, s_1), f_2 in ReLU-DNN(k_2, s_2) => max{f_1, f_2} in ReLU-DNN(max{k_1, k_2} + 1, s_1 + s_2 + 4) Construction for the max rule: F : R^n → R^2, F(x) = (f_1(x), f_2(x)); G : R^2 → R, G(z_1, z_2) = max{z_1, z_2}; then max{f_1, f_2} = G ∘ F.

  23. Calculus of DNN functions • f_1 in DNN(k_1, s_1), f_2 in DNN(k_2, s_2) => f_1 + f_2 in DNN(max{k_1, k_2}, s_1 + s_2) • f in DNN(k, s), c in R => cf in DNN(k, s) • f_1 in DNN(k_1, s_1), f_2 in DNN(k_2, s_2) => f_1 ∘ f_2 in DNN(k_1 + k_2, s_1 + s_2) • f_1 in ReLU-DNN(k_1, s_1), f_2 in ReLU-DNN(k_2, s_2) => max{f_1, f_2} in ReLU-DNN(max{k_1, k_2} + 1, s_1 + s_2 + 4) For G : R^2 → R, G(z_1, z_2) = max{z_1, z_2}, use the identity max{z_1, z_2} = (z_1 + z_2)/2 + |z_1 − z_2|/2.

  24. Calculus of DNN functions • f_1 in DNN(k_1, s_1), f_2 in DNN(k_2, s_2) => f_1 + f_2 in DNN(max{k_1, k_2}, s_1 + s_2) • f in DNN(k, s), c in R => cf in DNN(k, s) • f_1 in DNN(k_1, s_1), f_2 in DNN(k_2, s_2) => f_1 ∘ f_2 in DNN(k_1 + k_2, s_1 + s_2) • f_1 in ReLU-DNN(k_1, s_1), f_2 in ReLU-DNN(k_2, s_2) => max{f_1, f_2} in ReLU-DNN(max{k_1, k_2} + 1, s_1 + s_2 + 4) [figure: explicit one-hidden-layer ReLU network on inputs x_1 and x_2 computing (x_1 + x_2)/2 + |x_1 − x_2|/2 = max{x_1, x_2}]
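The identity max{x_1, x_2} = (x_1 + x_2)/2 + |x_1 − x_2|/2, together with |t| = max{0, t} + max{0, −t} and t = max{0, t} − max{0, −t}, yields a one-hidden-layer ReLU network with four units, which is what the figure on this slide depicts. A small NumPy check of that construction (the weights below follow from the identity and are not copied from the figure):

```python
import numpy as np

def relu(t):
    return np.maximum(0.0, t)

def max_via_relu(x1, x2):
    # Four hidden ReLU units computing (x1 + x2)/2 + |x1 - x2|/2:
    #   x1 + x2   = relu(x1 + x2) - relu(-x1 - x2)
    #   |x1 - x2| = relu(x1 - x2) + relu(-x1 + x2)
    h = np.array([relu(x1 + x2), relu(-x1 - x2),
                  relu(x1 - x2), relu(-x1 + x2)])
    out_weights = np.array([0.5, -0.5, 0.5, 0.5])
    return float(out_weights @ h)

for x1, x2 in [(3.0, -1.0), (-2.0, 5.0), (0.0, 0.0)]:
    assert np.isclose(max_via_relu(x1, x2), max(x1, x2))
```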

  25. Calculus of DNN functions • f_1 in DNN(k_1, s_1), f_2 in DNN(k_2, s_2) => f_1 + f_2 in DNN(max{k_1, k_2}, s_1 + s_2) • f in DNN(k, s), c in R => cf in DNN(k, s) • f_1 in DNN(k_1, s_1), f_2 in DNN(k_2, s_2) => f_1 ∘ f_2 in DNN(k_1 + k_2, s_1 + s_2) • f_1 in ReLU-DNN(k_1, s_1), f_2 in ReLU-DNN(k_2, s_2) => max{f_1, f_2} in ReLU-DNN(max{k_1, k_2} + 1, s_1 + s_2 + 4) • Affine functions can be implemented in ReLU-DNN(1, 2n)
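The size bound 2n for affine functions comes from writing each coordinate as x_i = max{0, x_i} − max{0, −x_i}, which costs two ReLU units per input. A hedged sketch of that construction (illustrative weights of my own choosing):

```python
import numpy as np

def relu(t):
    return np.maximum(0.0, t)

def affine_via_relu(x, w, b):
    # Computes w.x + b with one hidden layer of 2n ReLU units,
    # using x_i = relu(x_i) - relu(-x_i); the bias is added at the output node.
    h_pos, h_neg = relu(x), relu(-x)
    return float(w @ h_pos - w @ h_neg + b)

x = np.array([1.5, -2.0, 0.0])   # illustrative input (n = 3)
w = np.array([2.0, -1.0, 3.0])   # illustrative affine weights
b = 0.5
assert np.isclose(affine_via_relu(x, w, b), w @ x + b)
```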

  26. Problems of interest for DNNs • Expressiveness: What family of functions can one represent using DNNs? • Efficiency: How many layers (depth) and vertices (size) are needed to represent functions in the family? • Training the network: Given the architecture and data points (x, y), find weights for the “best fit” function. • Generalization error: Rademacher complexity, VC dimension

  27. Expressiveness of ReLU DNNs. Theorem (Arora, Basu, Mianjy, Mukherjee 2016): Any ReLU DNN with n inputs implements a continuous piecewise affine function on R^n. Conversely, any continuous piecewise affine function on R^n can be implemented by some ReLU DNN. Moreover, at most ⌈log_2(n+1)⌉ hidden layers are needed.

  28. Expressiveness of ReLU DNNs. Theorem (Arora, Basu, Mianjy, Mukherjee 2016): Any ReLU DNN with n inputs implements a continuous piecewise affine function on R^n. Conversely, any continuous piecewise affine function on R^n can be implemented by some ReLU DNN. Moreover, at most ⌈log_2(n+1)⌉ hidden layers are needed. Proof: A result from tropical geometry [Ovchinnikov 2002] says any continuous piecewise affine function can be written as max_{i=1,…,k} min_{j in S_i} ℓ_j, where each ℓ_j is an affine function.
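To make the max-min representation concrete, here is a small evaluation sketch; the particular pieces (a "hat" function on R) are my own illustration, not an example from the slides:

```python
def eval_max_min(x, groups):
    # f(x) = max_i min_{j in S_i} l_j(x), with each affine piece l given as (w, b).
    return max(min(w * x + b for (w, b) in group) for group in groups)

# Hat function f(x) = max{ 0, min{ x, 2 - x } }:
# 0 for x <= 0, x on [0, 1], 2 - x on [1, 2], and 0 for x >= 2.
groups = [
    [(0.0, 0.0)],                # S_1 = { l(x) = 0 }
    [(1.0, 0.0), (-1.0, 2.0)],   # S_2 = { l(x) = x, l(x) = 2 - x }
]
for x in [-1.0, 0.5, 1.0, 1.5, 3.0]:
    print(x, eval_max_min(x, groups))   # -> 0.0, 0.5, 1.0, 0.5, 0.0
```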

