Question
What is the difference between sparse_categorical_crossentropy and categorical_crossentropy? When should one loss be used as opposed to the other? For example, are these losses suitable for linear regression?
Answer 1:
Simply:
categorical_crossentropy (cce) expects each target as a one-hot array, while sparse_categorical_crossentropy (scce) expects each target as a category index.
Consider a classification problem with 5 categories (or classes).
- In the case of cce, the one-hot target may be [0, 1, 0, 0, 0] and the model may predict [.2, .5, .1, .1, .1] (probably right).
- In the case of scce, the target index may be [1], and the model predicts the same [.2, .5, .1, .1, .1]. Only the format of the target changes.
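Both losses receive the same predicted distribution; they differ only in how the target is written. The sketch below uses plain Python and two hypothetical helpers, `categorical_ce` and `sparse_categorical_ce` (not the Keras implementations), to compute the loss for the 5-class example in both formats.

```python
import math

def categorical_ce(one_hot_target, probs):
    # cce: -sum_i t_i * log(p_i); entries with t_i = 0 contribute nothing, so skip them.
    return -sum(t * math.log(p) for t, p in zip(one_hot_target, probs) if t)

def sparse_categorical_ce(target_index, probs):
    # scce: -log of the predicted probability at the target index.
    return -math.log(probs[target_index])

probs = [0.2, 0.5, 0.1, 0.1, 0.1]  # the same model output is used by both losses
print(round(categorical_ce([0, 1, 0, 0, 0], probs), 4))  # 0.6931
print(round(sparse_categorical_ce(1, probs), 4))          # 0.6931
```

Both calls reduce to -log(.5), because the one-hot vector simply selects the same entry the index points at.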
Consider now a classification problem with 3 classes.
- In the case of cce, the one-hot target might be [0, 0, 1] and the model may predict [.5, .1, .4] (probably inaccurate, given that it gives more probability to the first class).
- In the case of scce, the equivalent target index is [2], and the model again predicts the full [.5, .1, .4].
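For the 3-class example, the one-hot target [0, 0, 1] corresponds to index 2. This sketch (plain Python, illustrative variable names) computes both losses on the same prediction and also shows the model's top guess, which is why the prediction is "probably inaccurate".

```python
import math

probs = [0.5, 0.1, 0.4]      # model output for the 3-class example
one_hot_target = [0, 0, 1]   # cce target
index_target = 2             # equivalent scce target

cce = -sum(t * math.log(p) for t, p in zip(one_hot_target, probs))
scce = -math.log(probs[index_target])
# argmax: the class the model considers most likely
predicted_class = max(range(len(probs)), key=probs.__getitem__)

print(round(cce, 4), round(scce, 4))  # 0.9163 0.9163
print(predicted_class)                # 0, but the true class is 2
```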
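Sparse targets save space, but they do not lose information: the model outputs the same full distribution in both cases, and both losses depend only on the probability the model assigns to the true class (in the second case, -log(.4), whether the target is written as [0, 0, 1] or as 2). How close the other classes came (for example, index 0 in the second case) shows up in the predictions themselves, so study the predicted distributions, not the choice of loss, to judge model reliability.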
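Both losses compute -log of the probability at the true class, so neither looks at how the remaining probability is spread. A small check in plain Python (hypothetical `cce`/`scce` helpers): two predictions with the same 0.4 on the true class 2, but different mass on classes 0 and 1, get identical losses in both formats.

```python
import math

def cce(one_hot_target, probs):
    # Only the entry where the one-hot target is 1 contributes.
    return -sum(t * math.log(p) for t, p in zip(one_hot_target, probs) if t)

def scce(target_index, probs):
    return -math.log(probs[target_index])

pred_a = [0.5, 0.1, 0.4]  # most of the remaining mass on class 0
pred_b = [0.3, 0.3, 0.4]  # remaining mass split evenly

for probs in (pred_a, pred_b):
    print(round(cce([0, 0, 1], probs), 4), round(scce(2, probs), 4))
# 0.9163 0.9163
# 0.9163 0.9163
```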
In short, use sparse_categorical_crossentropy when your classes are mutually exclusive and each label is stored as a single integer index; use categorical_crossentropy when the labels are one-hot vectors.
Answer 2:
In the TensorFlow source code, sparse_categorical_crossentropy is documented as categorical crossentropy with integer targets:
def sparse_categorical_crossentropy(target, output, from_logits=False, axis=-1):
    """Categorical crossentropy with integer targets.
    Arguments:
        target: An integer tensor.
        output: A tensor resulting from a softmax
            (unless `from_logits` is True, in which
            case `output` is expected to be the logits).
        from_logits: Boolean, whether `output` is the
            result of a softmax, or is a tensor of logits.
        axis: Int specifying the channels axis. `axis=-1` corresponds to data
            format `channels_last`, and `axis=1` corresponds to data format
            `channels_first`.
    Returns:
        Output tensor.
    Raises:
        ValueError: if `axis` is neither -1 nor one of the axes of `output`.
    """
In the TensorFlow source code, categorical_crossentropy is documented as categorical crossentropy between an output tensor and a target tensor:
def categorical_crossentropy(target, output, from_logits=False, axis=-1):
    """Categorical crossentropy between an output tensor and a target tensor.
    Arguments:
        target: A tensor of the same shape as `output`.
        output: A tensor resulting from a softmax
            (unless `from_logits` is True, in which
            case `output` is expected to be the logits).
        from_logits: Boolean, whether `output` is the
            result of a softmax, or is a tensor of logits.
        axis: Int specifying the channels axis. `axis=-1` corresponds to data
            format `channels_last`, and `axis=1` corresponds to data format
            `channels_first`.
    Returns:
        Output tensor.
    Raises:
        ValueError: if `axis` is neither -1 nor one of the axes of `output`.
    """
"Integer targets" means that the target labels are given as a list of integers, each holding the index of a sample's class, for example:
- For sparse_categorical_crossentropy, with targets of class 1 and class 2 in a 5-class classification problem, the list should be [1, 2]. The targets must be in integer form to call sparse_categorical_crossentropy. This is called sparse because the target representation needs much less space than one-hot encoding: a batch with b targets and k classes needs b * k values in one-hot form, but only b values in integer form.
- For categorical_crossentropy, with the same class 1 and class 2 targets in a 5-class classification problem, the list should be [[0,1,0,0,0], [0,0,1,0,0]]. The targets must be in one-hot form to call categorical_crossentropy.
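The space argument can be checked by counting stored values. A minimal sketch, assuming plain Python lists and a hypothetical `to_one_hot` helper, expands the integer targets [1, 2] for k = 5 classes into the one-hot rows from the example.

```python
def to_one_hot(indices, num_classes):
    """Hypothetical helper: expand integer class indices (sparse form) into one-hot rows."""
    return [[1 if c == i else 0 for c in range(num_classes)] for i in indices]

sparse_targets = [1, 2]   # b = 2 targets in integer form
k = 5                     # number of classes
one_hot_targets = to_one_hot(sparse_targets, k)

print(one_hot_targets)                           # [[0, 1, 0, 0, 0], [0, 0, 1, 0, 0]]
print(len(sparse_targets))                       # 2, i.e. b values
print(sum(len(row) for row in one_hot_targets))  # 10, i.e. b * k values
```

In Keras, `tf.keras.utils.to_categorical` performs the same index-to-one-hot conversion.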
The representation of the targets is the only difference; the results should be the same, since both compute categorical crossentropy.
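To check that the two formats really agree beyond hand-picked examples, this sketch (plain Python, hypothetical helpers, seeded `random.Random` so it is reproducible) draws random predicted distributions and random true classes, then records the largest gap between the one-hot and the index version of the loss.

```python
import math
import random

def cce(one_hot, probs):
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs) if t)

def scce(index, probs):
    return -math.log(probs[index])

def random_distribution(k, rng):
    # Strictly positive weights normalized to sum to 1.
    raw = [rng.random() + 1e-9 for _ in range(k)]
    total = sum(raw)
    return [r / total for r in raw]

rng = random.Random(0)
k = 5
max_gap = 0.0
for _ in range(1000):
    probs = random_distribution(k, rng)
    index = rng.randrange(k)
    one_hot = [1 if c == index else 0 for c in range(k)]
    max_gap = max(max_gap, abs(cce(one_hot, probs) - scce(index, probs)))

print(max_gap)  # 0.0
```

The gap is exactly zero because the one-hot sum keeps only the single term at the true index, which is the same log-probability the sparse version reads directly.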
Source: https://stackoverflow.com/questions/58565394/what-is-the-difference-between-sparse-categorical-crossentropy-and-categorical-c