In Tensorflow, what is the difference between sampled_softmax_loss and softmax_cross_entropy_with_logits

Posted by 回眸只為那壹抹淺笑 on 2020-07-16 16:11:11

Question


In TensorFlow, there are functions called softmax_cross_entropy_with_logits and sampled_softmax_loss.

I read the TensorFlow documentation and searched Google for more information, but I couldn't find the difference. It looks to me like both calculate the loss using the softmax function.

Using sampled_softmax_loss to calculate the loss

loss = tf.reduce_mean(tf.nn.sampled_softmax_loss(...))

Using softmax_cross_entropy_with_logits to calculate the loss

loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(P, Q))

To me, calculating the softmax loss is the same as calculating the softmaxed cross entropy (e.g. cross_entropy(softmax(train_x))).
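To illustrate what I mean, here is a toy check (assuming TensorFlow 2.x eager execution; the numbers are arbitrary) that the fused op matches applying softmax and then cross entropy by hand:

import tensorflow as tf

# Toy logits and one-hot labels, purely for illustration.
logits = tf.constant([[2.0, 1.0, 0.1]])
labels = tf.constant([[1.0, 0.0, 0.0]])

# Fused op: softmax + cross entropy in one numerically stable call.
fused = tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits)

# Manual version: apply softmax first, then the cross-entropy formula.
probs = tf.nn.softmax(logits)
manual = -tf.reduce_sum(labels * tf.math.log(probs), axis=-1)

print(fused.numpy(), manual.numpy())  # both ~0.417 for these numbers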

Could somebody tell me why there are two different methods, and which method I should use in which case?


Answer 1:


If your target vocabulary (in other words, the number of classes you want to predict) is really big, it is very hard to use the regular softmax, because you have to calculate a probability for every word in the dictionary. By using sampled_softmax_loss you only take into account a subset V of your vocabulary when calculating your loss.

Sampled softmax only makes sense if we sample (our V) fewer classes than the vocabulary size. If your vocabulary (the number of labels) is small, there is no point in using sampled_softmax_loss.
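For concreteness, here is a minimal sketch (assuming TensorFlow 2.x; the layer sizes, variable names, and dummy data are made up for illustration) of how sampled_softmax_loss is typically wired up for training, with the full softmax cross entropy over all classes shown for comparison:

import tensorflow as tf

# Hypothetical sizes, chosen only for illustration.
vocab_size = 50000   # number of output classes (e.g. target vocabulary)
hidden_dim = 128     # size of the layer feeding the output projection
batch_size = 32
num_sampled = 64     # how many classes to sample per batch

# Output projection: one weight row and one bias per class.
weights = tf.Variable(tf.random.truncated_normal([vocab_size, hidden_dim], stddev=0.1))
biases = tf.Variable(tf.zeros([vocab_size]))

# Hidden activations and integer class ids for one batch (dummy data here).
inputs = tf.random.normal([batch_size, hidden_dim])
labels = tf.random.uniform([batch_size, 1], maxval=vocab_size, dtype=tf.int64)

# Training loss: only num_sampled classes (plus the true ones) enter the softmax.
train_loss = tf.reduce_mean(
    tf.nn.sampled_softmax_loss(
        weights=weights,
        biases=biases,
        labels=labels,          # shape [batch_size, num_true]
        inputs=inputs,          # shape [batch_size, hidden_dim]
        num_sampled=num_sampled,
        num_classes=vocab_size))

# Full loss: compute logits for every class and use the full softmax.
logits = tf.matmul(inputs, weights, transpose_b=True) + biases
full_loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(
        labels=tf.one_hot(tf.squeeze(labels, axis=1), vocab_size),
        logits=logits))

Note that the sampled loss is meant for training only; at evaluation or inference time you compute the full softmax over all classes, as in the last lines above.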

You can see implementation details in this paper: http://arxiv.org/pdf/1412.2007v2.pdf

You can also see an example where it is used: sequence-to-sequence translation, in this example.




Answer 2:


Sampled:

Sampled, in both cases, means that you don't calculate the loss over everything that is possible as an output (for example, if there are too many words in a dictionary to include all of them at every training step, we take just a few samples and learn from those; this is common in NLP problems).

softmax_cross_entropy_with_logits:

This is the cross entropy. It takes logits as input and yields what can be used as a loss.

sampled_softmax_loss:

This is a sampled version of softmax_cross_entropy_with_logits: it takes just a few sampled classes and computes the cross entropy over those, rather than computing the full cross entropy over every class: https://github.com/tensorflow/tensorflow/blob/r1.2/tensorflow/python/ops/nn_impl.py#L1269



Source: https://stackoverflow.com/questions/35241251/in-tensorflow-what-is-the-difference-between-sampled-softmax-loss-and-softmax-c
