KL Divergence in TensorFlow


Assuming that your input tensors prob_a and prob_b sum to 1 along the last axis, you could do it like this:

def kl(x, y):
    X = tf.distributions.Categorical(probs=x)
    Y = tf.distributions.Categorical(probs=y)
    # one KL(X || Y) value per row of x and y
    return tf.distributions.kl_divergence(X, Y)

result = kl(prob_a, prob_b)

A simple example:

import numpy as np
import tensorflow as tf
a = np.array([[0.25, 0.1, 0.65], [0.8, 0.15, 0.05]])
b = np.array([[0.7, 0.2, 0.1], [0.15, 0.8, 0.05]])
sess = tf.Session()
print(kl(a, b).eval(session=sess))  # [0.88995184 1.08808468]

You would get the same result with

np.sum(a * np.log(a / b), axis=1) 

However, this implementation has a flaw (checked in TensorFlow 1.8.0).

If you have zero probabilities in a, e.g. if you try [0.8, 0.2, 0.0] instead of [0.8, 0.15, 0.05], you will get nan, even though by the Kullback-Leibler definition 0 * log(0 / b) should contribute zero.

To mitigate this, one should add some small numerical constant to the probabilities. It is also prudent to use tf.distributions.kl_divergence(X, Y, allow_nan_stats=False) to cause a runtime error in such situations.

Also, if there are some zeros in b, you will get inf values, which won't be caught by the allow_nan_stats=False option, so those have to be handled as well.
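
A minimal sketch of that mitigation, smoothing both inputs with a small constant and renormalizing (the helper name and eps value are my additions, not from the original answer):

import tensorflow as tf

def kl_safe(x, y, eps=1e-8):
    # Smooth both distributions so neither contains exact zeros,
    # then renormalize so each row still sums to 1.
    x = (x + eps) / tf.reduce_sum(x + eps, axis=-1, keepdims=True)
    y = (y + eps) / tf.reduce_sum(y + eps, axis=-1, keepdims=True)
    X = tf.distributions.Categorical(probs=x)
    Y = tf.distributions.Categorical(probs=y)
    # allow_nan_stats=False raises instead of silently returning nan
    return tf.distributions.kl_divergence(X, Y, allow_nan_stats=False)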

Since there is softmax_cross_entropy_with_logits, there is no need to optimize on the KL divergence directly:

KL(prob_a, prob_b)
  = Sum(prob_a * log(prob_a / prob_b))
  = Sum(prob_a * log(prob_a) - prob_a * log(prob_b))
  = -Sum(prob_a * log(prob_b)) + Sum(prob_a * log(prob_a))
  = -Sum(prob_a * log(prob_b)) + const
  = H(prob_a, prob_b) + const

Here const = Sum(prob_a * log(prob_a)) does not depend on prob_b, so minimizing the cross entropy H over prob_b is equivalent to minimizing the KL divergence.
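
A quick numeric check of that identity, reusing a row of the a and b arrays from the first example (a sketch; the variable names are mine):

import numpy as np

a = np.array([0.25, 0.1, 0.65])
b = np.array([0.7, 0.2, 0.1])

kl = np.sum(a * np.log(a / b))
cross_entropy = -np.sum(a * np.log(b))  # H(a, b)
const = np.sum(a * np.log(a))           # independent of b
print(np.isclose(kl, cross_entropy + const))  # True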

I'm not sure why it's not implemented, but perhaps there is a workaround. The KL divergence is defined as:

KL(prob_a, prob_b) = Sum(prob_a * log(prob_a/prob_b))

The cross entropy H, on the other hand, is defined as:

H(prob_a, prob_b) = -Sum(prob_a * log(prob_b))

So, if you create a variable y = prob_a / prob_b, you can obtain the KL divergence as -H(prob_a, y). Note, however, that tf.nn.softmax_cross_entropy_with_logits applies a softmax to its logits argument internally, so it cannot be fed the probability ratio y directly; computing the cross entropy explicitly works instead. In TensorFlow notation, something like:

KL = tf.reduce_sum(prob_a * tf.log(y), axis=1)  # = -H(prob_a, y)
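
A quick check of this workaround against the first example's values (my sketch, not part of the original answer):

import tensorflow as tf

prob_a = tf.constant([[0.25, 0.1, 0.65]])
prob_b = tf.constant([[0.7, 0.2, 0.1]])
y = prob_a / prob_b
KL = tf.reduce_sum(prob_a * tf.log(y), axis=1)
with tf.Session() as sess:
    print(sess.run(KL))  # ~[0.8899518], matches the earlier result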

tf.contrib.distributions.kl_divergence takes instances of tf.distributions.Distribution, not Tensors.

Example:

ds = tf.contrib.distributions
p = ds.Normal(loc=0., scale=1.)
q = ds.Normal(loc=1., scale=2.)
kl = ds.kl_divergence(p, q)
# ==> 0.44314718
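
For reference, the standard closed-form KL divergence between two univariate Gaussians reproduces that number (a sketch, not part of the original answer):

import numpy as np

def kl_normal(m1, s1, m2, s2):
    # KL(N(m1, s1^2) || N(m2, s2^2)) in closed form
    return np.log(s2 / s1) + (s1**2 + (m1 - m2)**2) / (2 * s2**2) - 0.5

print(kl_normal(0., 1., 1., 2.))  # 0.44314718...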

Assuming that you have access to logits a and b:

prob_a = tf.nn.softmax(a)
# H(prob_a, prob_a), i.e. the entropy of prob_a
cr_aa = tf.nn.softmax_cross_entropy_with_logits(labels=prob_a, logits=a)
# H(prob_a, prob_b), the cross entropy
cr_ab = tf.nn.softmax_cross_entropy_with_logits(labels=prob_a, logits=b)
# KL(prob_a || prob_b) = H(prob_a, prob_b) - H(prob_a), summed over the batch
kl_ab = tf.reduce_sum(cr_ab - cr_aa)
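
To sanity-check, log-probabilities can serve as logits (since softmax(log(p)) == p when p sums to 1); with the first example's numbers this reproduces the same per-example values (my sketch, not part of the original answer):

import tensorflow as tf

a = tf.log(tf.constant([[0.25, 0.1, 0.65], [0.8, 0.15, 0.05]]))
b = tf.log(tf.constant([[0.7, 0.2, 0.1], [0.15, 0.8, 0.05]]))
prob_a = tf.nn.softmax(a)
cr_aa = tf.nn.softmax_cross_entropy_with_logits(labels=prob_a, logits=a)
cr_ab = tf.nn.softmax_cross_entropy_with_logits(labels=prob_a, logits=b)
with tf.Session() as sess:
    print(sess.run(cr_ab - cr_aa))  # ~[0.88995184 1.08808468], per example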

I think this might work:

tf.reduce_sum(p * tf.log(p / q))

where p is the actual probability distribution and q is the approximate probability distribution. (For batched inputs, pass axis=-1 so you get one divergence value per example rather than a single sum over everything.)

I used the function from this code (from this Medium post) to calculate the KL divergence of a given diagonal Gaussian from a standard normal distribution, where sd is the log standard deviation and mn is the mean of the latent distribution (reading sd as a plain standard deviation would not match the formula below):

latent_loss = -0.5 * tf.reduce_sum(1.0 + 2.0 * sd - tf.square(mn) - tf.exp(2.0 * sd), 1)
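
A cross-check of that formula against TensorFlow's built-in Gaussian KL, which agrees exactly when sd holds log standard deviations (my sketch, with made-up example values; the 1-D example sums over all dimensions instead of axis 1):

import tensorflow as tf

ds = tf.distributions
mn = tf.constant([0.5, -0.3])  # example means
sd = tf.constant([-0.2, 0.1])  # example log standard deviations
latent_loss = -0.5 * tf.reduce_sum(1.0 + 2.0 * sd - tf.square(mn) - tf.exp(2.0 * sd))
builtin = tf.reduce_sum(ds.kl_divergence(ds.Normal(loc=mn, scale=tf.exp(sd)),
                                         ds.Normal(loc=0., scale=1.)))
with tf.Session() as sess:
    print(sess.run([latent_loss, builtin]))  # the two values agree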