What is the difference between sparse_categorical_crossentropy and categorical_crossentropy?

Question

What is the difference between sparse_categorical_crossentropy and categorical_crossentropy? When should one loss be used as opposed to the other? For example, are these losses suitable for linear regression?

Answer 1:

Simply:

categorical_crossentropy (cce) expects the target as a one-hot array,
sparse_categorical_crossentropy (scce) expects the target as a category index.

Consider a classification problem with 5 categories (or classes).

In the case of cce, the one-hot target may be [0, 1, 0, 0, 0] and the model may predict [.2, .5, .1, .1, .1] (probably right)
In the case of scce, the target index may be [1] and the model still predicts [.2, .5, .1, .1, .1]; the loss only looks at the probability at index 1, which is .5.
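
As a rough check of the 5-class example, the following pure-Python sketch computes both losses by hand. The helpers `cce` and `scce` are hypothetical stand-ins written for illustration, not the Keras functions, and they assume the prediction is already a probability vector.

```python
import math

def cce(one_hot_target, probs):
    """Cross-entropy with a one-hot (or probability-vector) target."""
    return -sum(t * math.log(p) for t, p in zip(one_hot_target, probs) if t > 0)

def scce(target_index, probs):
    """Cross-entropy with an integer class index as the target."""
    return -math.log(probs[target_index])

probs = [0.2, 0.5, 0.1, 0.1, 0.1]           # model output, identical in both cases
loss_one_hot = cce([0, 1, 0, 0, 0], probs)  # target given as a one-hot array
loss_sparse = scce(1, probs)                # target given as a class index

print(loss_one_hot, loss_sparse)  # both are -log(0.5)
```

Only the way the target is written differs; the prediction and the resulting loss are the same.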

Consider now a classification problem with 3 classes.

In the case of cce, the one-hot target might be [0, 0, 1] and the model may predict [.5, .1, .4] (probably inaccurate, given that it gives more probability to the first class)
In the case of scce, the target index might be [2], and the model still predicts [.5, .1, .4]; the loss looks at the probability at index 2, which is .4

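Sparse targets save space, since each label is one integer instead of a full one-hot array. For hard labels nothing is lost: the model output is the same probability vector either way (in the 2nd case you can still see that index 2 was also very close), and both losses give the same value. What a category index cannot express is a target that spreads probability over several classes. To judge model reliability, I look at the full predicted vector rather than only the top class.

In short, use sparse_categorical_crossentropy when each sample belongs to exactly one class and your targets are integer indices, and categorical_crossentropy when your targets are one-hot arrays or probabilities over the classes.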
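
One case where the array form of the target is needed is a target that is not a single class. The sketch below is plain Python rather than Keras code; `cross_entropy` is a hypothetical helper, and the soft target values are made up for illustration.

```python
import math

def cross_entropy(target_probs, predicted_probs):
    """Cross-entropy between a target distribution and a prediction."""
    return -sum(t * math.log(p)
                for t, p in zip(target_probs, predicted_probs) if t > 0)

predicted = [0.5, 0.1, 0.4]       # the 3-class prediction from the example above

hard_target = [0, 0, 1]           # equivalent to the integer index 2
soft_target = [0.05, 0.05, 0.90]  # no single integer index can express this

hard_loss = cross_entropy(hard_target, predicted)  # reduces to -log(0.4)
soft_loss = cross_entropy(soft_target, predicted)  # uses all three probabilities

print(hard_loss, soft_loss)
```

With a hard target only one term of the sum survives, which is why an index is enough; with a soft target every class contributes.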

Answer 2:

In the TensorFlow source code, sparse_categorical_crossentropy is documented as categorical crossentropy with integer targets:

def sparse_categorical_crossentropy(target, output, from_logits=False, axis=-1):
  """Categorical crossentropy with integer targets.
  Arguments:
      target: An integer tensor.
      output: A tensor resulting from a softmax
          (unless `from_logits` is True, in which
          case `output` is expected to be the logits).
      from_logits: Boolean, whether `output` is the
          result of a softmax, or is a tensor of logits.
      axis: Int specifying the channels axis. `axis=-1` corresponds to data
          format `channels_last`, and `axis=1` corresponds to data format
          `channels_first`.
  Returns:
      Output tensor.
  Raises:
      ValueError: if `axis` is neither -1 nor one of the axes of `output`.
  """

In the same source, categorical_crossentropy is documented as categorical crossentropy between an output tensor and a target tensor:

def categorical_crossentropy(target, output, from_logits=False, axis=-1):
  """Categorical crossentropy between an output tensor and a target tensor.
  Arguments:
      target: A tensor of the same shape as `output`.
      output: A tensor resulting from a softmax
          (unless `from_logits` is True, in which
          case `output` is expected to be the logits).
      from_logits: Boolean, whether `output` is the
          result of a softmax, or is a tensor of logits.
      axis: Int specifying the channels axis. `axis=-1` corresponds to data
          format `channels_last`, and `axis=1` corresponds to data format
          `channels_first`.
  Returns:
      Output tensor.
  Raises:
      ValueError: if `axis` is neither -1 nor one of the axes of `output`.
  """

"Integer targets" means that the labels are given as a list of integers, each one the index of a class. For example:

For sparse_categorical_crossentropy, with class 1 and class 2 targets in a 5-class classification problem, the list should be [1, 2]. The targets must be in integer form in order to call sparse_categorical_crossentropy. It is called sparse because the target representation requires much less space than one-hot encoding: a batch with b targets and k classes needs b * k space in one-hot form, but only b space in integer form.
For categorical_crossentropy, with class 1 and class 2 targets in a 5-class classification problem, the list should be [[0,1,0,0,0], [0,0,1,0,0]]. The targets must be in one-hot form in order to call categorical_crossentropy.
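
The two target formats can be converted into each other. A small sketch in plain Python, where `to_one_hot` and `to_indices` are hypothetical helper names used only for illustration:

```python
def to_one_hot(indices, num_classes):
    """Expand integer class indices into one-hot rows."""
    return [[1 if c == i else 0 for c in range(num_classes)] for i in indices]

def to_indices(one_hot_rows):
    """Collapse one-hot rows back into integer class indices."""
    return [row.index(1) for row in one_hot_rows]

sparse_targets = [1, 2]                          # b = 2 stored values
one_hot_targets = to_one_hot(sparse_targets, 5)  # b * k = 10 stored values

print(one_hot_targets)              # [[0, 1, 0, 0, 0], [0, 0, 1, 0, 0]]
print(to_indices(one_hot_targets))  # [1, 2]
print(len(sparse_targets), sum(len(row) for row in one_hot_targets))  # 2 10
```

In Keras, `tf.keras.utils.to_categorical` does the integer-to-one-hot step, so labels stored as integers can still be fed to categorical_crossentropy if needed.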

The representation of the targets is the only difference; the results should be the same, since both compute categorical crossentropy.
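
Both docstrings also share a `from_logits` flag, which only says whether `output` has already been passed through a softmax. A minimal pure-Python sketch of the two paths follows; the helper names and the logit values are assumptions made for illustration, and this is not the TensorFlow implementation.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_loss(target_index, output, from_logits=False):
    """Cross-entropy for one sample with an integer target."""
    if from_logits:
        # Work on the raw scores: log-sum-exp of the logits minus the target logit.
        m = max(output)
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in output))
        return log_sum_exp - output[target_index]
    # Otherwise `output` is assumed to be probabilities already.
    return -math.log(output[target_index])

logits = [1.0, 2.0, 0.5]
loss_from_probs = sparse_loss(1, softmax(logits))
loss_from_logits = sparse_loss(1, logits, from_logits=True)

print(loss_from_probs, loss_from_logits)  # equal up to floating-point rounding
```

Passing logits with `from_logits=False`, or probabilities with `from_logits=True`, would give a wrong loss with either function, so the flag has to match the last layer of the model.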

Source: https://stackoverflow.com/questions/58565394/what-is-the-difference-between-sparse-categorical-crossentropy-and-categorical-c

Tags

python

tensorflow

machine-learning

keras

deep-learning