What is the difference between backpropagation and reverse-mode autodiff?

Question


Going through this book, I am familiar with the following:

For each training instance the backpropagation algorithm first makes a prediction (forward pass), measures the error, then goes through each layer in reverse to measure the error contribution from each connection (reverse pass), and finally slightly tweaks the connection weights to reduce the error.

However, I am not sure how this differs from the reverse-mode autodiff implemented by TensorFlow.

As far as I know, reverse-mode autodiff first goes through the graph in the forward direction and then, in a second pass, computes all partial derivatives of the outputs with respect to the inputs. This is very similar to the backpropagation algorithm.
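(For a concrete picture, here is a minimal sketch of those two passes using TensorFlow 2's `tf.GradientTape`; the toy function is made up for illustration.)

```python
import tensorflow as tf

# Forward pass: operations on x are recorded on the tape.
x = tf.Variable([1.0, 2.0])
with tf.GradientTape() as tape:
    y = tf.reduce_sum(x * x)  # scalar output: y = x1^2 + x2^2

# Reverse pass: walk the recorded graph backwards to get dy/dx.
grad = tape.gradient(y, x)
print(grad.numpy())  # [2.0, 4.0], i.e. 2*x
```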

How does backpropagation differ from reverse-mode autodiff?


Answer 1:


Thanks to David Parks for the valid contribution and useful links; however, I have found an answer to this question from the author of the book himself, which may be more concise:

Backpropagation refers to the whole process of training an artificial neural network using multiple backpropagation steps, each of which computes gradients and uses them to perform a Gradient Descent step. In contrast, reverse-mode autodiff is simply a technique used to compute gradients efficiently, and it happens to be the one used by backpropagation.
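As a sketch of that distinction in code (assuming TensorFlow 2; the model, data, and learning rate below are all placeholders for illustration, not a real training setup):

```python
import tensorflow as tf

# Placeholder model, data, and hyperparameters for illustration.
model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
model.build(input_shape=(None, 4))
loss_fn = tf.keras.losses.MeanSquaredError()
learning_rate = 0.01
x_batch = tf.random.normal([32, 4])
y_batch = tf.random.normal([32, 1])

# "Backpropagation" = the whole training loop below.
for step in range(100):
    with tf.GradientTape() as tape:
        y_pred = model(x_batch)          # forward pass: make a prediction
        loss = loss_fn(y_batch, y_pred)  # measure the error
    # Reverse-mode autodiff is just this one call: it computes the gradients.
    grads = tape.gradient(loss, model.trainable_variables)
    # A Gradient Descent step then slightly tweaks the weights to reduce the error.
    for var, g in zip(model.trainable_variables, grads):
        var.assign_sub(learning_rate * g)
```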




Answer 2:


Automatic differentiation differs in how it computes gradients from the method taught in standard calculus classes, and it has some distinctive features, such as the native ability to take the gradient of a data structure and not just of a well-defined mathematical function. I'm not enough of an expert to go into further detail, but this is a great reference that explains it in much more depth:

https://alexey.radul.name/ideas/2013/introduction-to-automatic-differentiation/

Here's another guide, which I just found, that also looks quite good:

https://rufflewind.com/2016-12-30/reverse-mode-automatic-differentiation

I believe backprop may formally refer to the by-hand calculus algorithm for computing gradients; at least that's how it was originally derived and how it's taught in classes on the subject. In practice, however, backprop is used quite interchangeably with the automatic differentiation approach described in the guides above, so drawing a line between those two terms is probably as much an effort in linguistics as it is in mathematics.

I also noted this nice article on the backpropagation algorithm to compare against the above guides on automatic differentiation.

https://brilliant.org/wiki/backpropagation/




Answer 3:


The most important distinction between backpropagation and reverse-mode AD is that reverse-mode AD computes the vector-Jacobian product of a vector-valued function from R^n -> R^m, while backpropagation computes the gradient of a scalar-valued function from R^n -> R. Backpropagation is therefore a subset of reverse-mode AD.
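To make that concrete, here is a small sketch of a vector-Jacobian product for a made-up function f: R^3 -> R^2. In TensorFlow, the vector v can be supplied through the `output_gradients` argument of `GradientTape.gradient`:

```python
import tensorflow as tf

x = tf.Variable([1.0, 2.0, 3.0])  # input in R^3

# Vector-valued function f: R^3 -> R^2 (made up for illustration).
with tf.GradientTape() as tape:
    y = tf.stack([tf.reduce_sum(x * x), tf.reduce_prod(x)])

# Reverse-mode AD computes v^T J for a chosen vector v in R^2,
# without ever materializing the full Jacobian J.
v = tf.constant([1.0, 0.0])
vjp = tape.gradient(y, x, output_gradients=v)
print(vjp.numpy())  # [2.0, 4.0, 6.0] == gradient of the first output component
```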

When we train neural networks, the loss function is always scalar-valued, so we are always using backpropagation. And since backprop is a subset of reverse-mode AD, we are also using reverse-mode AD whenever we train a neural network.

Whether "backpropagation" takes the more general meaning of reverse-mode AD applied to any scalar loss function, or the more specific meaning of reverse-mode AD applied to a scalar loss function for training neural networks, is a matter of personal taste. It's a word whose meaning shifts slightly with context, but in the machine learning community it is most commonly used to mean computing gradients of neural network parameters with respect to a scalar loss function.

For completeness: sometimes reverse-mode AD can compute the full Jacobian in a single reverse pass, not just a vector-Jacobian product. Also, for a scalar-valued function, the vector-Jacobian product where the vector is [1.0] is exactly the gradient.
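A sketch of both points, again in TensorFlow (the functions are made up; `GradientTape.jacobian` builds the full Jacobian by sweeping reverse passes over the output elements, possibly vectorized):

```python
import tensorflow as tf

x = tf.Variable([1.0, 2.0, 3.0])
with tf.GradientTape(persistent=True) as tape:
    y = tf.stack([tf.reduce_sum(x * x), tf.reduce_prod(x)])  # f: R^3 -> R^2
    loss = tf.reduce_sum(x * x)                              # scalar: R^3 -> R

# Full 2x3 Jacobian of f.
print(tape.jacobian(y, x).numpy())

# For the scalar function, the VJP with v = [1.0] is exactly the gradient.
grad = tape.gradient(loss, x)
vjp = tape.gradient(loss, x, output_gradients=tf.constant(1.0))
print(grad.numpy(), vjp.numpy())  # both are [2.0, 4.0, 6.0]
del tape  # release the persistent tape
```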



Source: https://stackoverflow.com/questions/49926192/what-is-the-difference-between-backpropagation-and-reverse-mode-autodiff
