automatic-differentiation

Repeated use of GradientTape for multiple Jacobian calculations

Submitted by 蓝咒 on 2020-06-01 04:58:06
Question: I am attempting to compute the Jacobian of a TensorFlow neural network's outputs with respect to its inputs. This is easily achieved with the tf.GradientTape.jacobian method. The trivial example provided in the TensorFlow documentation is as follows:

    with tf.GradientTape() as g:
        x = tf.constant([1.0, 2.0])
        g.watch(x)
        y = x * x
    jacobian = g.jacobian(y, x)

This is fine if I only want to compute the Jacobian of a single instance of the input tensor x. However, I need to repeatedly evaluate …
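
For the "repeatedly evaluate" part, one common approach is to stack the input instances into a batch and use GradientTape.batch_jacobian, which avoids a Python loop over tapes. A minimal sketch, using the documentation's y = x * x as a stand-in for the asker's network (the input values are arbitrary):

    import tensorflow as tf

    # Stack many input instances into one batch of shape (batch, n_inputs).
    xs = tf.constant([[1.0, 2.0],
                      [3.0, 4.0],
                      [5.0, 6.0]])

    with tf.GradientTape() as g:
        g.watch(xs)
        ys = xs * xs          # stand-in for the network's forward pass

    # One call returns a Jacobian per instance, shape (batch, n_outputs, n_inputs),
    # instead of re-running g.jacobian once per input.
    print(g.batch_jacobian(ys, xs))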

Is there any working implementation of reverse mode automatic differentiation for Haskell?

Submitted by 假如想象 on 2020-02-26 12:03:07
Question: The closest-related implementation in Haskell I have seen is the forward mode at http://hackage.haskell.org/packages/archive/fad/1.0/doc/html/Numeric-FAD.html. The closest related research appears to be reverse mode for another functional language related to Scheme at http://www.bcl.hamilton.ie/~qobi/stalingrad/. I see reverse mode in Haskell as kind of a holy grail for a lot of tasks, with the hope that it could use Haskell's nested data parallelism to gain a nice speedup in heavy …
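
For readers landing here today: Edward Kmett's ad package on Hackage does provide reverse-mode AD via Numeric.AD.Mode.Reverse (the same module imported in other questions on this page). A minimal sketch of its use, with an arbitrary example function:

    import Numeric.AD.Mode.Reverse (grad)

    -- Arbitrary scalar function of two variables (the Rosenbrock function).
    rosenbrock :: Num a => [a] -> a
    rosenbrock [x, y] = (1 - x) ^ 2 + 100 * (y - x * x) ^ 2
    rosenbrock _      = error "expected exactly two variables"

    main :: IO ()
    main = print (grad rosenbrock [1.0, 2.0 :: Double])
    -- Reverse mode evaluates the function once forward and once backward,
    -- so the cost of the gradient does not grow with the number of inputs.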

Automatic differentiation (AD) with respect to list of matrices in Haskell

Submitted by 我只是一个虾纸丫 on 2020-01-04 10:06:22
Question: I am trying to understand how I can use Numeric.AD (automatic differentiation) in Haskell. I defined a simple matrix type and a scalar function taking an array and two matrices as arguments. I want to use AD to get the gradient of the scoring function with respect to both matrices, but I'm running into compilation problems. Here is the code:

    {-# LANGUAGE DeriveTraversable, DeriveFunctor, DeriveFoldable #-}
    import Numeric.AD.Mode.Reverse as R
    import Data.Traversable as T
    import Data.Foldable as …
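
A frequent cause of such compilation problems is that grad differentiates with respect to a single Traversable container, so both matrices have to be packed into one structure whose elements stay polymorphic in the number type. A self-contained sketch of that pattern (Matrix, Params and score here are illustrative, not the asker's definitions):

    {-# LANGUAGE DeriveTraversable, DeriveFunctor, DeriveFoldable #-}
    import Numeric.AD.Mode.Reverse (grad)

    -- Toy matrix type over nested lists; deriving Traversable lets
    -- Numeric.AD visit every entry.
    newtype Matrix a = Matrix [[a]]
      deriving (Show, Functor, Foldable, Traversable)

    -- Both parameter matrices packed into one Traversable container,
    -- because grad differentiates with respect to a single structure.
    data Params a = Params (Matrix a) (Matrix a)
      deriving (Show, Functor, Foldable, Traversable)

    -- Illustrative scalar score: sum of elementwise products.
    score :: Num a => Params a -> a
    score (Params (Matrix xs) (Matrix ys)) =
      sum (zipWith (*) (concat xs) (concat ys))

    main :: IO ()
    main = print (grad score (Params m1 m2))
      where
        m1 = Matrix [[1, 2], [3, 4 :: Double]]
        m2 = Matrix [[5, 6], [7, 8]]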

Partial Derivative using Autograd

Submitted by 天大地大妈咪最大 on 2020-01-01 06:55:07
Question: I have a function that takes in a multivariate argument x. Here x = [x1, x2, x3]. Let's say my function looks like:

    f(x, T) = np.dot(x, T) + np.exp(np.dot(x, T))

where T is a constant. I am interested in finding the df/dx1, df/dx2 and df/dx3 functions. I have achieved some success using scipy diff, but I am a bit skeptical because it uses numerical differences. Yesterday, my colleague pointed me to Autograd (github). Since it seems to be a popular package, I am hoping someone here knows how to get …
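
A sketch of how this typically looks with Autograd, assuming T is a constant vector (the concrete values below are arbitrary): import NumPy through autograd.numpy so the operations are traceable, and grad then returns all three partials at once.

    import autograd.numpy as np   # thin NumPy wrapper that Autograd can trace
    from autograd import grad

    def f(x, T):
        # scalar-valued in x; T is treated as a constant
        return np.dot(x, T) + np.exp(np.dot(x, T))

    df_dx = grad(f, 0)            # differentiate with respect to the first argument

    x = np.array([0.1, 0.2, 0.3])
    T = np.array([1.0, 2.0, 3.0])  # arbitrary constant
    print(df_dx(x, T))             # array holding df/dx1, df/dx2, df/dx3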

How does tensorflow handle non differentiable nodes during gradient calculation?

Submitted by 非 Y 不嫁゛ on 2019-12-22 06:48:16
Question: I understood the concept of automatic differentiation, but couldn't find any explanation of how TensorFlow calculates the error gradient for non-differentiable functions, for example tf.where in my loss function or tf.cond in my graph. It works just fine, but I would like to understand how TensorFlow backpropagates the error through such nodes, since there is no formula to calculate the gradient for them.

Answer 1: In the case of tf.where, you have a function with three inputs, condition C, …

How to get more performance out of automatic differentiation?

Submitted by 左心房为你撑大大i on 2019-12-22 01:38:56
Question: I am having a hard time optimizing a program that relies on ad's conjugateGradientDescent function for most of its work. Basically my code is a translation of an old paper's code that is written in Matlab and C. I have not measured it, but that code runs at several iterations per second. Mine is on the order of minutes per iteration... The code is available in these repositories: https://github.com/fhaust/aer https://github.com/fhaust/aer-utils The code in question can be run by …

Numeric.AD and typing problem

Submitted by 百般思念 on 2019-12-19 17:38:21
Question: I'm trying to work with Numeric.AD and a custom Expr type. I wish to calculate the symbolic gradient of a user-inputted expression. The first trial with a constant expression works nicely:

    calcGrad0 :: [Expr Double]
    calcGrad0 = grad df vars
      where
        df [x, y] = eval (env [x, y]) (EVar "x" * EVar "y")
        env vs    = zip varNames vs
        varNames  = ["x", "y"]
        vars      = map EVar varNames

This works:

    > calcGrad0
    [Const 0.0 :+ (Const 0.0 :+ (EVar "y" :* Const 1.0)),Const 0.0 :+ (Const 0.0 :+ (EVar "x" :* Const 1.0))]

How to do automatic differentiation on hmatrix?

Submitted by 帅比萌擦擦* on 2019-12-19 03:38:07
Question: Sooooo... as it turns out, going from fake matrices to hmatrix datatypes is nontrivial :) Preamble for reference:

    {-# LANGUAGE RankNTypes #-}
    {-# LANGUAGE ParallelListComp #-}
    {-# LANGUAGE ScopedTypeVariables #-}
    {-# LANGUAGE TypeFamilies #-}
    {-# LANGUAGE FlexibleContexts #-}

    import Numeric.LinearAlgebra.HMatrix
    import Numeric.AD

    reconstruct :: (Container Vector a, Num (Vector a)) => [a] -> [Matrix a] -> Matrix a
    reconstruct as φs = sum [ a `scale` φ | a <- as | φ <- φs ]
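
The underlying obstacle is that hmatrix matrices are specialised to Storable element types, so Numeric.AD cannot rebuild them with its own number type. One workaround, sketched below with an illustrative objective function (not the asker's reconstruct), is to run the differentiated code over nested lists wrapped in Compose so they form a single Traversable container, converting to and from hmatrix only at the boundary:

    import Data.Functor.Compose (Compose (..))
    import Numeric.AD.Mode.Reverse (grad)
    import Numeric.LinearAlgebra (Matrix, fromLists, toLists)

    -- Illustrative objective: sum of squares of all matrix entries,
    -- written over nested lists so it stays polymorphic in the number type.
    objective :: Num a => Compose [] [] a -> a
    objective (Compose rows) = sum [ x * x | row <- rows, x <- row ]

    -- Convert hmatrix -> lists, differentiate, convert back.
    gradientAsMatrix :: Matrix Double -> Matrix Double
    gradientAsMatrix m =
      fromLists (getCompose (grad objective (Compose (toLists m))))

    main :: IO ()
    main = print (gradientAsMatrix (fromLists [[1, 2], [3, 4]]))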

How does tensorflow handle non differentiable nodes during gradient calculation?

Submitted by 故事扮演 on 2019-12-05 12:43:17
I understood the concept of automatic differentiation, but couldn't find any explanation of how TensorFlow calculates the error gradient for non-differentiable functions, for example tf.where in my loss function or tf.cond in my graph. It works just fine, but I would like to understand how TensorFlow backpropagates the error through such nodes, since there is no formula to calculate the gradient for them. In the case of tf.where, you have a function with three inputs, condition C, value on true T and value on false F, and one output Out. The gradient receives one value and has to return …
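
In other words, the condition itself receives no gradient; the incoming gradient is routed to T where the condition is true and to F where it is false, with zeros for the non-selected elements. A small sketch (arbitrary values) that makes this behaviour visible:

    import tensorflow as tf

    x = tf.constant([1.0, 2.0, 3.0])
    y = tf.constant([10.0, 20.0, 30.0])

    with tf.GradientTape(persistent=True) as g:
        g.watch(x)
        g.watch(y)
        cond = x > 1.5                      # non-differentiable; treated as a constant
        out = tf.where(cond, x * x, y * y)
        loss = tf.reduce_sum(out)

    # Each input only receives gradient for the elements where it was selected.
    print(g.gradient(loss, x))   # expected: [0., 4., 6.]
    print(g.gradient(loss, y))   # expected: [20., 0., 0.]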

How to get more performance out of automatic differentiation?

Submitted by こ雲淡風輕ζ on 2019-12-04 22:57:24
I am having a hard time optimizing a program that relies on ad's conjugateGradientDescent function for most of its work. Basically my code is a translation of an old paper's code that is written in Matlab and C. I have not measured it, but that code runs at several iterations per second. Mine is on the order of minutes per iteration... The code is available in these repositories: https://github.com/fhaust/aer https://github.com/fhaust/aer-utils The code in question can be run by following these commands:

    $ cd aer-utils
    $ cabal sandbox init
    $ cabal sandbox add-source ../aer
    $ cabal …