AMMI – Introduction to Deep Learning
4.1. DAG networks
François Fleuret
https://fleuret.org/ammi-2018/
Wed Aug 29 16:57:27 CAT 2018
ÉCOLE POLYTECHNIQUE FÉDÉRALE DE LAUSANNE
Everything we have seen for an MLP

[figure: a two-layer MLP drawn as a graph of elementary operators, x → (× w(1)) → (+ b(1)) → σ → (× w(2)) → (+ b(2)) → σ → f(x)]

can be generalized to an arbitrary "Directed Acyclic Graph" (DAG) of operators

[figure: a DAG with inputs x, w(1), w(2), internal operators φ(1), φ(2), φ(3), and output f(x)]
Remember that we use tensorial notation. If $(a_1, \dots, a_Q) = \phi(b_1, \dots, b_R)$, we have

$$
\left[\frac{\partial a}{\partial b}\right] = J_\phi =
\begin{pmatrix}
\frac{\partial a_1}{\partial b_1} & \dots & \frac{\partial a_1}{\partial b_R} \\
\vdots & \ddots & \vdots \\
\frac{\partial a_Q}{\partial b_1} & \dots & \frac{\partial a_Q}{\partial b_R}
\end{pmatrix}
$$

This notation does not specify at which point this is computed. It will always be for the forward-pass activations.

Also, if $(a_1, \dots, a_Q) = \phi(b_1, \dots, b_R, c_1, \dots, c_S)$, we use

$$
\left[\frac{\partial a}{\partial c}\right] = J_{\phi|c} =
\begin{pmatrix}
\frac{\partial a_1}{\partial c_1} & \dots & \frac{\partial a_1}{\partial c_S} \\
\vdots & \ddots & \vdots \\
\frac{\partial a_Q}{\partial c_1} & \dots & \frac{\partial a_Q}{\partial c_S}
\end{pmatrix}
$$
Forward pass

[figure: the DAG with activations labeled: x(0) = x, φ(1) → x(1), φ(2) → x(2), φ(3) → f(x) = x(3); w(1) feeds both φ(1) and φ(3), w(2) feeds φ(2)]

$$
\begin{aligned}
x^{(0)} &= x \\
x^{(1)} &= \phi^{(1)}\big(x^{(0)}; w^{(1)}\big) \\
x^{(2)} &= \phi^{(2)}\big(x^{(0)}, x^{(1)}; w^{(2)}\big) \\
f(x) = x^{(3)} &= \phi^{(3)}\big(x^{(1)}, x^{(2)}; w^{(1)}\big)
\end{aligned}
$$

Note that the parameter $w^{(1)}$ is shared by $\phi^{(1)}$ and $\phi^{(3)}$.
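As a concrete sketch of this forward pass in plain Python, we can take the scalar operator definitions that appear in the TensorFlow example at the end of this lecture; the numeric values of the input and weights below are arbitrary illustrative choices.

```python
# Scalar stand-ins for the three operators of the DAG. The functional
# forms are those of the TensorFlow example slide; the values of the
# input and weights are arbitrary.
def phi1(x0, w1):
    return w1 * x0

def phi2(x0, x1, w2):
    return x0 + w2 * x1

def phi3(x1, x2, w1):
    return w1 * (x1 + x2)

x0 = 2.0            # x(0) = x
w1, w2 = 3.0, 4.0

# Forward pass, following a topological order of the DAG.
x1 = phi1(x0, w1)       # x(1) = phi(1)(x(0); w(1))
x2 = phi2(x0, x1, w2)   # x(2) = phi(2)(x(0), x(1); w(2))
x3 = phi3(x1, x2, w1)   # f(x) = x(3); note that w(1) is reused here

print(x1, x2, x3)  # 6.0 26.0 96.0
```

The only constraint the DAG imposes is that each activation is computed after all the activations it depends on, which is exactly a topological ordering of the nodes.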
Backward pass, derivatives w.r.t. activations

[figure: the same DAG, x(0) = x through φ(1), φ(2), φ(3) to f(x) = x(3)]

$$
\left[\frac{\partial \ell}{\partial x^{(2)}}\right]
= \left[\frac{\partial x^{(3)}}{\partial x^{(2)}}\right]^{\top} \left[\frac{\partial \ell}{\partial x^{(3)}}\right]
= J_{\phi^{(3)}|x^{(2)}}^{\top} \left[\frac{\partial \ell}{\partial x^{(3)}}\right]
$$

$$
\left[\frac{\partial \ell}{\partial x^{(1)}}\right]
= \left[\frac{\partial x^{(2)}}{\partial x^{(1)}}\right]^{\top} \left[\frac{\partial \ell}{\partial x^{(2)}}\right]
+ \left[\frac{\partial x^{(3)}}{\partial x^{(1)}}\right]^{\top} \left[\frac{\partial \ell}{\partial x^{(3)}}\right]
= J_{\phi^{(2)}|x^{(1)}}^{\top} \left[\frac{\partial \ell}{\partial x^{(2)}}\right]
+ J_{\phi^{(3)}|x^{(1)}}^{\top} \left[\frac{\partial \ell}{\partial x^{(3)}}\right]
$$

$$
\left[\frac{\partial \ell}{\partial x^{(0)}}\right]
= \left[\frac{\partial x^{(1)}}{\partial x^{(0)}}\right]^{\top} \left[\frac{\partial \ell}{\partial x^{(1)}}\right]
+ \left[\frac{\partial x^{(2)}}{\partial x^{(0)}}\right]^{\top} \left[\frac{\partial \ell}{\partial x^{(2)}}\right]
= J_{\phi^{(1)}|x^{(0)}}^{\top} \left[\frac{\partial \ell}{\partial x^{(1)}}\right]
+ J_{\phi^{(2)}|x^{(0)}}^{\top} \left[\frac{\partial \ell}{\partial x^{(2)}}\right]
$$
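To make the accumulation tangible, here is the same recursion carried out by hand on a scalar version of this DAG (operator forms borrowed from the TensorFlow example at the end of the lecture, illustrative values, and taking the loss to be ℓ = x(3)). With scalars every Jacobian is a single partial derivative, so each sum above becomes a sum of products.

```python
# Scalar DAG (operator forms from the TensorFlow example slide; the
# input and weight values are arbitrary).
x0, w1, w2 = 2.0, 3.0, 4.0
x1 = w1 * x0          # phi(1)(x(0); w(1))
x2 = x0 + w2 * x1     # phi(2)(x(0), x(1); w(2))
x3 = w1 * (x1 + x2)   # phi(3)(x(1), x(2); w(1)); loss l = x(3)

# Backward pass w.r.t. activations, in reverse topological order.
dl_dx3 = 1.0                          # dl/dx(3), since l = x(3)
dl_dx2 = w1 * dl_dx3                  # J_{phi(3)|x(2)} = w(1)
dl_dx1 = w2 * dl_dx2 + w1 * dl_dx3    # contributions of phi(2) and phi(3)
dl_dx0 = w1 * dl_dx1 + 1.0 * dl_dx2   # contributions of phi(1) and phi(2)

print(dl_dx2, dl_dx1, dl_dx0)  # 3.0 15.0 48.0
```

Each activation accumulates one term per operator that consumes it, which is why x(1) and x(0) both receive two contributions.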
Backward pass, derivatives w.r.t. parameters

[figure: the same DAG, with the parameters w(1) (feeding φ(1) and φ(3)) and w(2) (feeding φ(2))]

$$
\left[\frac{\partial \ell}{\partial w^{(1)}}\right]
= \left[\frac{\partial x^{(1)}}{\partial w^{(1)}}\right]^{\top} \left[\frac{\partial \ell}{\partial x^{(1)}}\right]
+ \left[\frac{\partial x^{(3)}}{\partial w^{(1)}}\right]^{\top} \left[\frac{\partial \ell}{\partial x^{(3)}}\right]
= J_{\phi^{(1)}|w^{(1)}}^{\top} \left[\frac{\partial \ell}{\partial x^{(1)}}\right]
+ J_{\phi^{(3)}|w^{(1)}}^{\top} \left[\frac{\partial \ell}{\partial x^{(3)}}\right]
$$

$$
\left[\frac{\partial \ell}{\partial w^{(2)}}\right]
= \left[\frac{\partial x^{(2)}}{\partial w^{(2)}}\right]^{\top} \left[\frac{\partial \ell}{\partial x^{(2)}}\right]
= J_{\phi^{(2)}|w^{(2)}}^{\top} \left[\frac{\partial \ell}{\partial x^{(2)}}\right]
$$
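In the same scalar setting as before (operator forms from the TensorFlow example at the end of the lecture, illustrative values, ℓ = x(3)), the parameter gradients follow directly: w(2) is used by a single operator, while w(1), being shared by φ(1) and φ(3), accumulates two terms.

```python
# Scalar DAG (operator forms from the TensorFlow example slide; values
# arbitrary), with loss l = x(3).
x0, w1, w2 = 2.0, 3.0, 4.0
x1 = w1 * x0
x2 = x0 + w2 * x1
x3 = w1 * (x1 + x2)

# Activation gradients, as computed on the previous slide.
dl_dx3 = 1.0
dl_dx2 = w1 * dl_dx3
dl_dx1 = w2 * dl_dx2 + w1 * dl_dx3

# Parameter gradients. J_{phi(1)|w(1)} = x(0), J_{phi(3)|w(1)} = x(1)+x(2),
# J_{phi(2)|w(2)} = x(1); w(1) sums the contributions of phi(1) and phi(3).
dl_dw1 = x0 * dl_dx1 + (x1 + x2) * dl_dx3
dl_dw2 = x1 * dl_dx2

print(dl_dw1, dl_dw2)  # 62.0 18.0
```

This accumulation over all uses of a parameter is exactly what makes weight sharing work in backprop: no special mechanism is needed beyond summing the contributions.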
So if we have a library of "tensor operators", and implementations of

$$
\begin{aligned}
(x_1, \dots, x_d, w) &\mapsto \phi(x_1, \dots, x_d; w) \\
\forall c,\ (x_1, \dots, x_d, w) &\mapsto J_{\phi|x_c}(x_1, \dots, x_d; w) \\
(x_1, \dots, x_d, w) &\mapsto J_{\phi|w}(x_1, \dots, x_d; w),
\end{aligned}
$$

we can build an arbitrary directed acyclic graph with these operators at the nodes, compute the response of the resulting mapping, and compute its gradient with back-prop.
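As a minimal sketch of that claim, here is a tiny scalar reverse-mode autodiff engine in plain Python (the class and function names are hypothetical, not any framework's API). Each node records its parents together with the corresponding partial derivatives during the forward pass; the backward pass then accumulates gradients in reverse topological order, exactly as on the previous slides.

```python
class Node:
    """A scalar value in the DAG, with links to its parents and the
    partial derivative of this node w.r.t. each parent (the 1x1
    Jacobians), recorded during the forward pass."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents   # sequence of (parent_node, d self / d parent)
        self.grad = 0.0

def add(a, b):
    return Node(a.value + b.value, [(a, 1.0), (b, 1.0)])

def mul(a, b):
    return Node(a.value * b.value, [(a, b.value), (b, a.value)])

def backward(root):
    """Accumulate dl/dnode for every node reachable from root, with l = root."""
    order, seen = [], set()
    def visit(n):             # depth-first search gives a topological order
        if id(n) in seen:
            return
        seen.add(id(n))
        for p, _ in n.parents:
            visit(p)
        order.append(n)
    visit(root)
    root.grad = 1.0
    for n in reversed(order):      # reverse topological order
        for p, d in n.parents:
            p.grad += d * n.grad   # chain rule, summed over all uses of p

# The DAG of this lecture, with the scalar operators of the TensorFlow
# example slide and arbitrary values:
x, w1, w2 = Node(2.0), Node(3.0), Node(4.0)
x1 = mul(w1, x)              # phi(1)
x2 = add(x, mul(w2, x1))     # phi(2)
f = mul(w1, add(x1, x2))     # phi(3); w(1) is shared with phi(1)

backward(f)
print(f.value, w1.grad, w2.grad, x.grad)  # 96.0 62.0 18.0 48.0
```

The `p.grad +=` line is the whole story: because gradients are accumulated rather than assigned, nodes used by several operators (like x(1) or the shared w(1)) automatically sum their contributions.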
Writing a large neural network from scratch is complex and error-prone. Multiple frameworks provide libraries of tensor operators and mechanisms to combine them into DAGs and automatically differentiate them.

             Language(s)             License         Main backer
PyTorch      Python                  BSD             Facebook
Caffe2       C++, Python             Apache          Facebook
TensorFlow   Python, C++             Apache          Google
MXNet        Python, C++, R, Scala   Apache          Amazon
CNTK         Python, C++             MIT             Microsoft
Torch        Lua                     BSD             Facebook
Theano       Python                  BSD             U. of Montreal
Caffe        C++                     BSD 2 clauses   U. of CA, Berkeley

One approach is to define the nodes and edges of such a DAG statically (Torch, TensorFlow, Caffe, Theano, etc.).
In TensorFlow, to run a forward/backward pass on

[figure: the DAG with x(0) = x, operators φ(1), φ(2), φ(3), weights w(1) (shared by φ(1) and φ(3)) and w(2), and output f(x) = x(3)]

where

$$
\begin{aligned}
\phi^{(1)}\big(x^{(0)}; w^{(1)}\big) &= w^{(1)} x^{(0)} \\
\phi^{(2)}\big(x^{(0)}, x^{(1)}; w^{(2)}\big) &= x^{(0)} + w^{(2)} x^{(1)} \\
\phi^{(3)}\big(x^{(1)}, x^{(2)}; w^{(1)}\big) &= w^{(1)} \big(x^{(1)} + x^{(2)}\big)
\end{aligned}
$$
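For these particular operators the composed mapping has a simple closed form (treating everything as scalars), which is handy for sanity-checking a forward/backward implementation:

$$
f(x) = w^{(1)}\big(x^{(1)} + x^{(2)}\big)
     = w^{(1)}\big(w^{(1)} x + x + w^{(2)} w^{(1)} x\big)
     = \Big(\big(w^{(1)}\big)^2 \big(1 + w^{(2)}\big) + w^{(1)}\Big)\, x,
$$

so that

$$
\frac{\partial f}{\partial w^{(1)}} = \big(2 w^{(1)} (1 + w^{(2)}) + 1\big)\, x,
\qquad
\frac{\partial f}{\partial w^{(2)}} = \big(w^{(1)}\big)^2\, x,
\qquad
\frac{\partial f}{\partial x} = \big(w^{(1)}\big)^2 (1 + w^{(2)}) + w^{(1)}.
$$

The quadratic dependence on $w^{(1)}$ comes precisely from its being used by both $\phi^{(1)}$ and $\phi^{(3)}$.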