1. Deep learning 5.7. Writing an autograd function
   François Fleuret
   https://fleuret.org/ee559/
   Nov 1, 2020

2. We have seen how to write new torch.nn.Module subclasses. We may also have to implement new functions usable with autograd, so that modules remain defined through their forward pass alone.

3. This is achieved by writing subclasses of torch.autograd.Function, which have to implement two static methods:

   • forward(...) takes as argument a context to store information needed for the backward pass, and the quantities it should process, which are Tensors for the differentiable ones but can also be of any other type. It should return one or several Tensors.

   • backward(...) takes as argument the context and as many Tensors as forward returns Tensors, and it should return as many values as forward takes arguments: Tensors for the tensors and None for the others.

   Evaluating such a Function is done through its apply(...) method, which takes as many arguments as forward(...), context excluded.
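
   As a minimal illustrative sketch of this interface (the ScaleBy name and its factor argument are hypothetical, not from the lecture), a Function that multiplies its input by a constant returns None in the backward for the non-Tensor argument:

   import torch
   from torch.autograd import Function

   class ScaleBy(Function):

       @staticmethod
       def forward(ctx, input, factor):
           # factor is a plain Python number, not a Tensor
           ctx.factor = factor
           return input * factor

       @staticmethod
       def backward(ctx, grad_output):
           # one value per argument of forward: a Tensor for input,
           # None for the non-differentiable factor
           return grad_output * ctx.factor, None

   scale_by = ScaleBy.apply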

4. If you create a new Function named Dummy, when Dummy.apply(...) is called, autograd first adds a new node of type DummyBackward to its graph, and then calls Dummy.forward(...).

   To compute the gradient, autograd evaluates the graph and calls Dummy.backward(...) when it reaches the corresponding node, with the same context as the one given to Dummy.forward(...).

   This machinery is hidden from you, and this level of detail should not be required for normal operations.
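
   To make this concrete with the hypothetical ScaleBy sketch above: the output's grad_fn attribute is the backward node that autograd added to the graph.

   x = torch.randn(3, requires_grad = True)
   y = scale_by(x, 3.0)
   print(y.grad_fn)  # something like <torch.autograd.function.ScaleByBackward object at 0x...>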

5. Consider a function that sets to zero the first n components of a tensor.

   class KillHead(Function):

       @staticmethod
       def forward(ctx, input, n):
           ctx.n = n
           result = input.clone()
           result[:, 0:ctx.n] = 0
           return result

       @staticmethod
       def backward(ctx, grad_output):
           result = grad_output.clone()
           result[:, 0:ctx.n] = 0
           return result, None

   killhead = KillHead.apply
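
   A quick check of the forward pass alone (illustrative, not from the lecture):

   z = torch.full((2, 4), 1.0)
   print(killhead(z, 2))
   # tensor([[0., 0., 1., 1.],
   #         [0., 0., 1., 1.]])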

6. It can be used, for instance, as

   y = torch.empty(3, 8).normal_()
   x = torch.empty(y.size()).normal_().requires_grad_()

   criterion = nn.MSELoss()
   optimizer = torch.optim.SGD([x], lr = 1.0)

   for k in range(5):
       r = killhead(x, 2)
       loss = criterion(r, y)
       print(k, loss.item())
       optimizer.zero_grad()
       loss.backward()
       optimizer.step()

   which prints

   0 1.5175858736038208
   1 1.310139536857605
   2 1.1358269453048706
   3 0.9893561005592346
   4 0.8662799000740051
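
   As a sanity check (illustrative, not from the lecture), after the last loss.backward() the gradient with respect to the zeroed columns of x should itself be exactly zero, since KillHead.backward zeroes them:

   print(x.grad[:, 0:2].abs().max().item())  # expected: 0.0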

7. The torch.autograd.gradcheck(...) function checks numerically that the backward function is correct, i.e.

   \[
   \forall i, j, \quad
   \left| \frac{f_i(x_1, \dots, x_j + \epsilon, \dots, x_D) - f_i(x_1, \dots, x_j - \epsilon, \dots, x_D)}{2\epsilon} - \big(J_f(x)\big)_{i,j} \right| \le \alpha
   \]

   x = torch.empty(10, 20, dtype = torch.float64).uniform_(-1, 1).requires_grad_()
   input = (x, 4)

   if gradcheck(killhead, input, eps = 1e-6, atol = 1e-4):
       print('All good captain.')
   else:
       print('Ouch')

   Note: it is advisable to use torch.float64 for such a check.

8. Consider a function that takes two Tensors of the same size and applies, component-wise, (u, v) ↦ |uv|.

   The backward has to compute two tensors, and the forward must keep track of the inputs to compute the derivatives in the backward.

   class Something(Function):

       @staticmethod
       def forward(ctx, input1, input2):
           ctx.save_for_backward(input1, input2)
           return (input1 * input2).abs()

       @staticmethod
       def backward(ctx, grad_output):
           input1, input2 = ctx.saved_tensors
           return grad_output * input1.sign() * input2.abs(), \
                  grad_output * input1.abs() * input2.sign()

   something = Something.apply
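
   A short usage sketch (illustrative, not from the lecture): checking the backward of Something with gradcheck, using torch.float64 inputs as advised above. Since |uv| is not differentiable where uv = 0, inputs very close to zero could make the finite-difference check fragile.

   u = torch.empty(5, 7, dtype = torch.float64).uniform_(-1, 1).requires_grad_()
   v = torch.empty(5, 7, dtype = torch.float64).uniform_(-1, 1).requires_grad_()

   if gradcheck(something, (u, v), eps = 1e-6, atol = 1e-4):
       print('Backward of Something looks correct.')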

9. The end
