AMMI Introduction to Deep Learning 4.1. DAG networks

  1. AMMI – Introduction to Deep Learning
     4.1. DAG networks
     François Fleuret
     https://fleuret.org/ammi-2018/
     Wed Aug 29 16:57:27 CAT 2018
     ÉCOLE POLYTECHNIQUE FÉDÉRALE DE LAUSANNE

  2. Everything we have seen for an MLP

     [diagram: x → ×w(1) → +b(1) → σ → ×w(2) → +b(2) → σ → f(x)]

     can be generalized to an arbitrary “Directed Acyclic Graph” (DAG) of operators:

     [diagram: inputs x, w(1), w(2); operator nodes φ(1), φ(2), φ(3); output f(x)]

  3. Remember that we use tensorial notation. If (a_1, ..., a_Q) = φ(b_1, ..., b_R), we have

     $$\left[\frac{\partial a}{\partial b}\right] = J_\phi = \begin{pmatrix}
       \frac{\partial a_1}{\partial b_1} & \cdots & \frac{\partial a_1}{\partial b_R} \\
       \vdots & \ddots & \vdots \\
       \frac{\partial a_Q}{\partial b_1} & \cdots & \frac{\partial a_Q}{\partial b_R}
     \end{pmatrix}$$

     This notation does not specify at which point this is computed. It will always be for the forward-pass activations.

     Also, if (a_1, ..., a_Q) = φ(b_1, ..., b_R, c_1, ..., c_S), we use

     $$\left[\frac{\partial a}{\partial c}\right] = J_{\phi \mid c} = \begin{pmatrix}
       \frac{\partial a_1}{\partial c_1} & \cdots & \frac{\partial a_1}{\partial c_S} \\
       \vdots & \ddots & \vdots \\
       \frac{\partial a_Q}{\partial c_1} & \cdots & \frac{\partial a_Q}{\partial c_S}
     \end{pmatrix}$$
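     As a concrete instance (my addition, not in the slides): for a componentwise non-linearity the Jacobian is diagonal, and for a linear map it is the weight matrix itself,

     $$\phi(b) = \big(\sigma(b_1), \dots, \sigma(b_R)\big) \;\Rightarrow\; J_\phi = \operatorname{diag}\big(\sigma'(b_1), \dots, \sigma'(b_R)\big), \qquad \phi(b; w) = w\,b \;\Rightarrow\; J_{\phi \mid b} = w.$$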

  4. Forward pass

     [diagram: the same DAG, with activations x(0) = x, x(1), x(2) on the edges and f(x) = x(3) as output]

     x(0) = x
     x(1) = φ(1)(x(0); w(1))
     x(2) = φ(2)(x(0), x(1); w(2))
     f(x) = x(3) = φ(3)(x(1), x(2); w(1))

     Note that the parameter w(1) is used twice, by φ(1) and by φ(3).
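     As an illustration (mine, not the slides'), here is this forward pass in PyTorch, using the operator definitions that appear later on the TensorFlow slide and shapes chosen arbitrarily:

     ```python
     import torch

     w1 = torch.randn(5, 5)   # w(1), shared by phi(1) and phi(3)
     w2 = torch.randn(5, 5)   # w(2)
     x0 = torch.randn(5)      # x(0) = x

     # Each node consumes previously computed activations and its parameters.
     x1 = w1 @ x0             # x(1) = phi(1)(x(0); w(1)) = w(1) x(0)
     x2 = x0 + w2 @ x1        # x(2) = phi(2)(x(0), x(1); w(2)) = x(0) + w(2) x(1)
     x3 = w1 @ (x1 + x2)      # f(x) = x(3) = phi(3)(x(1), x(2); w(1)) = w(1)(x(1) + x(2))
     ```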

  5. Backward pass, derivatives w.r.t. activations

     [diagram: the same DAG as for the forward pass]

     $$\left[\frac{\partial \ell}{\partial x^{(2)}}\right] = \left[\frac{\partial \ell}{\partial x^{(3)}}\right]\left[\frac{\partial x^{(3)}}{\partial x^{(2)}}\right] = \left[\frac{\partial \ell}{\partial x^{(3)}}\right] J_{\phi^{(3)} \mid x^{(2)}}$$

     $$\left[\frac{\partial \ell}{\partial x^{(1)}}\right] = \left[\frac{\partial \ell}{\partial x^{(2)}}\right]\left[\frac{\partial x^{(2)}}{\partial x^{(1)}}\right] + \left[\frac{\partial \ell}{\partial x^{(3)}}\right]\left[\frac{\partial x^{(3)}}{\partial x^{(1)}}\right] = \left[\frac{\partial \ell}{\partial x^{(2)}}\right] J_{\phi^{(2)} \mid x^{(1)}} + \left[\frac{\partial \ell}{\partial x^{(3)}}\right] J_{\phi^{(3)} \mid x^{(1)}}$$

     $$\left[\frac{\partial \ell}{\partial x^{(0)}}\right] = \left[\frac{\partial \ell}{\partial x^{(1)}}\right]\left[\frac{\partial x^{(1)}}{\partial x^{(0)}}\right] + \left[\frac{\partial \ell}{\partial x^{(2)}}\right]\left[\frac{\partial x^{(2)}}{\partial x^{(0)}}\right] = \left[\frac{\partial \ell}{\partial x^{(1)}}\right] J_{\phi^{(1)} \mid x^{(0)}} + \left[\frac{\partial \ell}{\partial x^{(2)}}\right] J_{\phi^{(2)} \mid x^{(0)}}$$
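     A minimal hand-computed sketch (my addition) of these accumulations for the toy operators used above; each gradient is a row vector times the relevant Jacobians, with one term per operator consuming that activation:

     ```python
     import torch

     # Same toy DAG as in the forward-pass sketch (assumed shapes).
     w1, w2, x0 = torch.randn(5, 5), torch.randn(5, 5), torch.randn(5)
     x1 = w1 @ x0              # phi(1)
     x2 = x0 + w2 @ x1         # phi(2)
     x3 = w1 @ (x1 + x2)       # phi(3)

     # Take l = sum(x3), so [dl/dx(3)] is a row of ones.
     dl_dx3 = torch.ones(5)

     # For these operators: J_phi3|x2 = J_phi3|x1 = w1,
     # J_phi2|x1 = w2, J_phi2|x0 = I, J_phi1|x0 = w1.
     dl_dx2 = dl_dx3 @ w1                  # x(2) feeds phi(3) only
     dl_dx1 = dl_dx2 @ w2 + dl_dx3 @ w1    # x(1) feeds phi(2) and phi(3)
     dl_dx0 = dl_dx1 @ w1 + dl_dx2         # x(0) feeds phi(1) and phi(2)
     ```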

  6. Backward pass, derivatives w.r.t. parameters

     [diagram: the same DAG]

     $$\left[\frac{\partial \ell}{\partial w^{(1)}}\right] = \left[\frac{\partial \ell}{\partial x^{(1)}}\right]\left[\frac{\partial x^{(1)}}{\partial w^{(1)}}\right] + \left[\frac{\partial \ell}{\partial x^{(3)}}\right]\left[\frac{\partial x^{(3)}}{\partial w^{(1)}}\right] = \left[\frac{\partial \ell}{\partial x^{(1)}}\right] J_{\phi^{(1)} \mid w^{(1)}} + \left[\frac{\partial \ell}{\partial x^{(3)}}\right] J_{\phi^{(3)} \mid w^{(1)}}$$

     $$\left[\frac{\partial \ell}{\partial w^{(2)}}\right] = \left[\frac{\partial \ell}{\partial x^{(2)}}\right]\left[\frac{\partial x^{(2)}}{\partial w^{(2)}}\right] = \left[\frac{\partial \ell}{\partial x^{(2)}}\right] J_{\phi^{(2)} \mid w^{(2)}}$$
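     As a sanity check (my addition), automatic differentiation on the same toy graph reproduces these sums; in particular the gradient of w(1) accumulates contributions from both phi(1) and phi(3), since the parameter is shared:

     ```python
     import torch

     w1 = torch.randn(5, 5, requires_grad=True)   # shared by phi(1) and phi(3)
     w2 = torch.randn(5, 5, requires_grad=True)
     x0 = torch.randn(5)

     x1 = w1 @ x0
     x2 = x0 + w2 @ x1
     x3 = w1 @ (x1 + x2)

     l = x3.sum()
     l.backward()

     # w1.grad sums the phi(1) and phi(3) terms of [dl/dw(1)];
     # w2.grad has the single phi(2) term of [dl/dw(2)].
     print(w1.grad, w2.grad)
     ```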

  7. So if we have a library of “tensor operators”, and implementations of

     $$(x_1, \dots, x_d, w) \mapsto \phi(x_1, \dots, x_d; w)$$
     $$\forall c, \; (x_1, \dots, x_d, w) \mapsto J_{\phi \mid x_c}(x_1, \dots, x_d; w)$$
     $$(x_1, \dots, x_d, w) \mapsto J_{\phi \mid w}(x_1, \dots, x_d; w),$$

     we can build an arbitrary directed acyclic graph with these operators at the nodes, compute the response of the resulting mapping, and compute its gradient with back-prop.
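     A minimal generic sketch of that idea (hypothetical interface, not a real library): each node carries its forward map and one Jacobian function per input, and backprop walks the DAG in reverse order, accumulating row-vector-times-Jacobian products. Parameter gradients would be handled the same way via J_phi|w; they are omitted here for brevity.

     ```python
     import torch

     class Node:
         """One operator phi: a forward map, one Jacobian function per
         input (evaluated at the forward-pass activations), and the
         indices of the activations it consumes."""
         def __init__(self, f, jacobians, parents):
             self.f, self.jacobians, self.parents = f, jacobians, parents

     def forward(nodes, x):
         acts = [x]
         for n in nodes:
             acts.append(n.f(*[acts[p] for p in n.parents]))
         return acts

     def backward(nodes, acts, dl_dout):
         # grads[k] accumulates [dl/dx(k)], one term per consumer of x(k).
         grads = [torch.zeros_like(a) for a in acts]
         grads[-1] = dl_dout
         for k in reversed(range(len(nodes))):
             n = nodes[k]
             for p, jac in zip(n.parents, n.jacobians):
                 grads[p] = grads[p] + grads[k + 1] @ jac(*[acts[q] for q in n.parents])
         return grads

     # The slides' DAG: phi(1) = w1 x0, phi(2) = x0 + w2 x1, phi(3) = w1 (x1 + x2).
     w1, w2 = torch.randn(5, 5), torch.randn(5, 5)
     nodes = [
         Node(lambda x0: w1 @ x0, [lambda x0: w1], parents=[0]),
         Node(lambda x0, x1: x0 + w2 @ x1,
              [lambda x0, x1: torch.eye(5), lambda x0, x1: w2], parents=[0, 1]),
         Node(lambda x1, x2: w1 @ (x1 + x2),
              [lambda x1, x2: w1, lambda x1, x2: w1], parents=[1, 2]),
     ]
     acts = forward(nodes, torch.randn(5))
     grads = backward(nodes, acts, torch.ones(5))   # for l = sum(f(x))
     ```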

  8. Writing from scratch a large neural network is complex and error-prone.

     Multiple frameworks provide libraries of tensor operators and mechanisms to combine them into DAGs and automatically differentiate them.

                   Language(s)             License         Main backer
     PyTorch       Python                  BSD             Facebook
     Caffe2        C++, Python             Apache          Facebook
     TensorFlow    Python, C++             Apache          Google
     MXNet         Python, C++, R, Scala   Apache          Amazon
     CNTK          Python, C++             MIT             Microsoft
     Torch         Lua                     BSD             Facebook
     Theano        Python                  BSD             U. of Montreal
     Caffe         C++                     BSD 2 clauses   U. of CA, Berkeley

     One approach is to define the nodes and edges of such a DAG statically (Torch, TensorFlow, Caffe, Theano, etc.).

  9. In TensorFlow, to run a forward/backward pass on

     [diagram: the same DAG as before]

     with

     $$\phi^{(1)}\left(x^{(0)}; w^{(1)}\right) = w^{(1)} x^{(0)}$$
     $$\phi^{(2)}\left(x^{(0)}, x^{(1)}; w^{(2)}\right) = x^{(0)} + w^{(2)} x^{(1)}$$
     $$\phi^{(3)}\left(x^{(1)}, x^{(2)}; w^{(1)}\right) = w^{(1)} \left(x^{(1)} + x^{(2)}\right)$$
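     The extracted text stops before the slide's actual code, so here is my own sketch, using the TensorFlow 1.x static-graph API (contemporary with the deck's 2018 date), of a forward/backward pass on this graph:

     ```python
     import numpy as np
     import tensorflow as tf   # TensorFlow 1.x API

     # Static definition of the DAG's nodes and edges, declared up front.
     x0 = tf.placeholder(tf.float32, shape=(5, 1))   # x(0)
     w1 = tf.Variable(tf.random_normal((5, 5)))      # w(1), shared
     w2 = tf.Variable(tf.random_normal((5, 5)))      # w(2)

     x1 = tf.matmul(w1, x0)                # phi(1)
     x2 = x0 + tf.matmul(w2, x1)           # phi(2)
     x3 = tf.matmul(w1, x1 + x2)           # phi(3), f(x) = x(3)

     loss = tf.reduce_sum(x3)
     grads = tf.gradients(loss, [w1, w2])  # backward pass through the DAG

     with tf.Session() as sess:
         sess.run(tf.global_variables_initializer())
         out, g = sess.run([x3, grads], feed_dict={x0: np.random.randn(5, 1)})
     ```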
