Discrete Geometry meets Machine Learning
Amitabh Basu, Johns Hopkins University
22nd Combinatorial Optimization Workshop at Aussois, January 11, 2018
Joint work with Anirbit Mukherjee, Raman Arora, Poorya Mianjy
Two Problems in Discrete Geometry

Problem 1: Given two polytopes P and Q, do there exist simplices A_1, ..., A_p and B_1, ..., B_q such that
P + A_1 + ... + A_p = Q + B_1 + ... + B_q?

Problem 2: For a natural number k, define a k-zonotope as a Minkowski sum of finitely many polytopes, each of which is the convex hull of k points [a 2-zonotope is a regular zonotope]. Given two 2^n-zonotopes P and Q, do there exist two 2^(n+1)-zonotopes A and B such that
conv(P ∪ Q) + A = B?
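Both problems are phrased in terms of Minkowski sums of polytopes. As a concrete illustration (a minimal sketch of my own, not from the talk), the snippet below computes the vertices of P + Q from the identity P + Q = conv{p + q : p a vertex of P, q a vertex of Q}; the use of numpy/scipy and the particular example polytopes are illustrative assumptions.

```python
# Minimal sketch (not from the talk): Minkowski sum of two polytopes given
# by their vertex lists, using P + Q = conv{p + q : p in vert(P), q in vert(Q)}.
import numpy as np
from scipy.spatial import ConvexHull

def minkowski_sum(P, Q):
    """Return the vertices of the Minkowski sum of conv(P) and conv(Q)."""
    sums = np.array([p + q for p in P for q in Q])  # all pairwise vertex sums
    hull = ConvexHull(sums)                         # keep only the extreme points
    return sums[hull.vertices]

# Illustrative example: a unit square plus a triangle (a simplex in R^2).
# A "2-zonotope" in the talk's sense would instead sum segments, i.e.
# convex hulls of 2 points each.
P = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
Q = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
print(minkowski_sum(P, Q))
```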
What is a Deep Neural Network (DNN)?

• Directed acyclic graph (network architecture)
• Weights on every edge and every vertex
• An R -> R "activation function". Examples:
  f(x) = max{0, x} (Rectified Linear Unit, ReLU)
  f(x) = e^x / (1 + e^x) (Sigmoid)
• Sources = inputs, sinks = outputs

[Figure: a small network with inputs x_1, x_2, x_3, outputs y_1, y_2, and example weights 2.45, 2, 1.65, 0.53, -1, -6.8, 3 on its edges and vertices.]

A vertex receiving values u_1, ..., u_k along incoming edges with weights a_1, ..., a_k, and carrying vertex weight b, outputs
o = f(a_1 u_1 + a_2 u_2 + ... + a_k u_k + b),
which for the ReLU activation becomes
o = max{0, a_1 u_1 + a_2 u_2 + ... + a_k u_k + b}.
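As a concrete illustration of the definition above (a minimal sketch, not code from the talk), the snippet below runs a forward pass through a small fully connected ReLU network; each hidden unit computes o = max{0, a_1 u_1 + ... + a_k u_k + b}. The layer shapes and random weights are hypothetical.

```python
# Minimal sketch (not from the talk): forward pass of a fully connected
# ReLU network, where each hidden unit computes o = max{0, a.u + b}.
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def forward(x, layers):
    """layers = [(W_1, b_1), ..., (W_k, b_k)]; ReLU is applied after every
    layer except the last, which is a plain affine (output) layer."""
    u = x
    for W, b in layers[:-1]:
        u = relu(W @ u + b)      # hidden layer: affine map followed by ReLU
    W_out, b_out = layers[-1]
    return W_out @ u + b_out     # output layer: affine only

# Hypothetical example: 3 inputs -> 4 hidden ReLU units -> 2 outputs.
rng = np.random.default_rng(0)
layers = [(rng.standard_normal((4, 3)), rng.standard_normal(4)),
          (rng.standard_normal((2, 4)), rng.standard_normal(2))]
print(forward(np.array([1.0, -2.0, 0.5]), layers))
```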
Problems of interest for DNNs

• Expressiveness: What family of functions can one represent using DNNs?
• Efficiency: How many layers (depth) and vertices (size) needed to represent functions in the family?
• Training the network: Given architecture, data points (x, y), find weights for the "best fit" function.
• Generalization error: Rademacher complexity, VC dimension
Calculus of DNN functions

• f_1 in DNN(k_1, s_1), f_2 in DNN(k_2, s_2) => f_1 + f_2 in DNN(max{k_1, k_2}, s_1 + s_2)
  [Figure: the networks for f_1 and f_2 run side by side on the same input x; summing their outputs gives y = f_1 + f_2.]

• f in DNN(k, s), c in R => cf in DNN(k, s)

• f_1 in DNN(k_1, s_1), f_2 in DNN(k_2, s_2) => f_1 ∘ f_2 in DNN(k_1 + k_2, s_1 + s_2)
  [Figure: feeding x through the network for f_2 and then through the network for f_1 gives f_1 ∘ f_2.]

• f_1 in ReLU-DNN(k_1, s_1), f_2 in ReLU-DNN(k_2, s_2) => max{f_1, f_2} in ReLU-DNN(max{k_1, k_2} + 1, s_1 + s_2 + 4)
  Proof idea: write max{f_1, f_2} = G ∘ F, where F: R^n -> R^2, F(x) = (f_1(x), f_2(x)) and G: R^2 -> R, G(z_1, z_2) = max{z_1, z_2}. Since max{z_1, z_2} = (z_1 + z_2)/2 + |z_1 - z_2|/2, G can be computed by a single hidden layer of 4 ReLU units.
  [Figure: a ReLU network with inputs x_1, x_2, one hidden layer of 4 units with weights ±1, and output weights ±1/2, computing (x_1 + x_2)/2 + |x_1 - x_2|/2 = max{x_1, x_2}; see the code sketch after this list.]

• Affine functions can be implemented in ReLU-DNN(1, 2n)
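The following sketch (my reconstruction of the 4-unit gadget pictured above, not the talk's code) implements max{x_1, x_2} = (x_1 + x_2)/2 + |x_1 - x_2|/2 with one hidden layer of 4 ReLU units, which is exactly the "+4" in the size bound of the max rule.

```python
# Minimal sketch of the 4-unit ReLU gadget behind the max rule:
#   (x1 + x2)/2 = (ReLU(x1 + x2) - ReLU(-x1 - x2)) / 2
#   |x1 - x2|/2 = (ReLU(x1 - x2) + ReLU(x2 - x1)) / 2
# so max{x1, x2} needs one hidden layer with 4 ReLU units.
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def relu_max(x1, x2):
    W = np.array([[ 1.0,  1.0],    # computes  x1 + x2
                  [-1.0, -1.0],    # computes -x1 - x2
                  [ 1.0, -1.0],    # computes  x1 - x2
                  [-1.0,  1.0]])   # computes  x2 - x1
    c = np.array([0.5, -0.5, 0.5, 0.5])          # output weights
    return c @ relu(W @ np.array([x1, x2]))      # = max{x1, x2}

print(relu_max(3.0, -1.5), relu_max(-2.0, -0.5))   # 3.0 -0.5
```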
Expressiveness of ReLU DNNs

Theorem (Arora, Basu, Mianjy, Mukherjee 2016): Any ReLU DNN with n inputs implements a continuous piecewise affine function on R^n. Conversely, any continuous piecewise affine function on R^n can be implemented by some ReLU DNN. Moreover, at most ⌈log_2(n+1)⌉ hidden layers are needed.

Proof: A result from tropical geometry [Ovchinnikov 2002] says that any continuous piecewise affine function can be written as
max_{i=1,...,k} min_{j in S_i} {l_j},
where the l_j are affine functions; the ReLU calculus from the previous slide then assembles this max-min expression into a ReLU network.
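As a tiny illustration of the max-min form (my own example, not from the paper): the clamp function f(x) = max{0, min{x, 1}} is continuous piecewise affine with S_1 = {0} and S_2 = {x, 1}. The sketch below checks numerically that it agrees with a direct two-ReLU expression, which is the kind of translation the ReLU calculus performs.

```python
# Minimal sketch (not from the paper): the max-of-mins form on a tiny example.
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def clamp_maxmin(x):
    # max_i min_{j in S_i} l_j(x) with S_1 = {0} and S_2 = {x, 1}
    return max(0.0, min(x, 1.0))

def clamp_relu(x):
    # the same continuous piecewise affine function written with two ReLUs
    return relu(x) - relu(x - 1.0)

xs = np.linspace(-2.0, 3.0, 11)
print(np.allclose([clamp_maxmin(x) for x in xs],
                  [clamp_relu(x) for x in xs]))   # True
```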