What are C classes for an NLLLoss loss function in PyTorch?

北海茫月 2021-01-05 05:51

I'm asking about C classes for an NLLLoss loss function.

The documentation states:

The negative log likelihood loss. It is useful to train a classification problem with C classes.

2 Answers
  •  暖寄归人
    2021-01-05 06:46

    I agree with you that the documentation for nn.NLLLoss() is far from ideal, but I think we can clear up your problem here, first by noting that "class" is often used as a synonym for "category" in a machine learning context.

    Therefore, when PyTorch is talking about C classes, it is actually referring to the number of distinct categories that you are trying to train your network on. So, in the classical example of a categorical neural network trying to classify between "cats" and "dogs", C = 2, since it is either a cat or dog.

    Specifically for this classification problem, it also holds that we have only a single truth value over the array of our categories (a picture cannot depict both a cat AND a dog, but always exactly one of the two), which is why we can conveniently indicate the corresponding category of an image by its index (say, 0 for a cat and 1 for a dog). Now, we can simply compare the network output to the category we want.
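    To make this concrete, here is a minimal sketch (the numbers are invented purely for illustration) of how the ground truth for such a cat/dog classifier is just a class index, not a one-hot vector:

        import torch

        # Hypothetical cat/dog classifier with C = 2 categories.
        # The network outputs one log-probability per category,
        # while the target is simply the index of the true category.
        log_probs = torch.tensor([[-0.2231, -1.6094]])  # made-up output for one image
        target = torch.tensor([0])                      # 0 = cat, 1 = dog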

    BUT, in order for this to work, we also need to be clear about which output value the loss is referring to, since our network generally makes predictions via a softmax over several output neurons, meaning we generally have more than a single value. Fortunately, PyTorch's nn.NLLLoss does this for you automatically: you pass the target as the index of the true class, and the loss picks out (the negative of) the corresponding log-probability from your network's output.
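    As a short sketch of what "automatically" means here (shapes and values are arbitrary): nn.NLLLoss takes the whole row of log-probabilities plus a single class index, and returns minus the log-probability at that index.

        import torch
        import torch.nn as nn

        m = nn.LogSoftmax(dim=1)
        loss = nn.NLLLoss()

        logits = torch.randn(1, 3, requires_grad=True)  # one sample, C = 3 categories
        log_probs = m(logits)                           # shape (1, 3)
        target = torch.tensor([2])                      # the true category is index 2

        # NLLLoss picks out -log_probs[0, 2] for us:
        print(loss(log_probs, target))
        print(-log_probs[0, 2])  # same value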

    Your example above with LogSoftmax in fact produces only a single output value, which is the critical problem in your case. That way, you basically only have an indication of whether or not something exists, which doesn't make much sense for classification (it would be more at home in a regression setting, but that would require a totally different loss function to begin with).

    Last, but not least, you should also consider the fact that we generally have 2D tensors as input, since batching (the simultaneous computation of multiple samples) is generally considered a necessary step for performance. Even if you choose a batch size of 1, this still requires your inputs to be of shape (batch_size, input_dimensions), and consequently your output tensors to be of shape (batch_size, number_of_categories).

    This explains why most of the examples you find online are performing the LogSoftmax() over dim=1, since this is the "in-distribution axis", and not the batch axis (which would be dim=0).
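    A small sketch of that shape convention (batch size and values are arbitrary): with a (batch_size, number_of_categories) input, dim=1 normalizes across the categories of each sample, whereas dim=0 would (wrongly) normalize across the batch.

        import torch
        import torch.nn as nn

        batch = torch.randn(4, 5)                    # 4 samples, C = 5 categories
        log_probs = nn.LogSoftmax(dim=1)(batch)

        # Each row is now a proper distribution over the 5 categories:
        print(log_probs.exp().sum(dim=1))            # ≈ tensor([1., 1., 1., 1.])

        targets = torch.tensor([1, 0, 4, 2])         # one class index per sample
        print(nn.NLLLoss()(log_probs, targets))      # scalar loss, averaged over the batch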

    If you simply want to fix your problem, the easiest way would be to extend your random tensor by an additional dimension (torch.randn([1, 5], requires_grad=True)), and then to compare against only a single target class index in your output tensor (print(loss(output, torch.tensor([1])))).
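    Assuming your snippet used a 1-D tensor such as torch.randn(5, requires_grad=True) together with LogSoftmax and NLLLoss (I am reconstructing it from your question), the fixed version would look roughly like this:

        import torch
        import torch.nn as nn

        m = nn.LogSoftmax(dim=1)
        loss = nn.NLLLoss()

        # Add the batch dimension: shape (1, 5) instead of (5,)
        x = torch.randn([1, 5], requires_grad=True)
        output = m(x)

        # One target class index per sample in the batch:
        print(loss(output, torch.tensor([1])))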
