What is `target` in `ClassificationDataSet` good for?

Submitted by 蓝咒 on 2019-12-07 20:12:52

Question


I've tried to find out what the `target` parameter of `ClassificationDataSet` can be used for, but I'm still not clear about it.

What I've tried

>>> from pybrain.datasets import ClassificationDataSet
>>> help(ClassificationDataSet)
Help on class ClassificationDataSet in module pybrain.datasets.classification:

class ClassificationDataSet(pybrain.datasets.supervised.SupervisedDataSet)
 |  Specialized data set for classification data. Classes are to be numbered from 0 to nb_classes-1.
 |  
 |  Method resolution order:
 |      ClassificationDataSet
 |      pybrain.datasets.supervised.SupervisedDataSet
 |      pybrain.datasets.dataset.DataSet
 |      pybrain.utilities.Serializable
 |      __builtin__.object
 |  
 |  Methods defined here:
 |  
 |  __add__(self, other)
 |      Adds the patterns of two datasets, if dimensions and type match.
 |  
 |  __init__(self, inp, target=1, nb_classes=0, class_labels=None)
 |      Initialize an empty dataset. 
 |      
 |      `inp` is used to specify the dimensionality of the input. While the 
 |      number of targets is given by implicitly by the training samples, it can
 |      also be set explicity by `nb_classes`. To give the classes names, supply
 |      an iterable of strings as `class_labels`.
 |  
 |  __reduce__(self)

As this does not contain information about `target` (except that it defaults to 1), I took a look at the source code of `ClassificationDataSet`:

class ClassificationDataSet(SupervisedDataSet):
    """ Specialized data set for classification data. Classes are to be numbered from 0 to nb_classes-1. """

    def __init__(self, inp, target=1, nb_classes=0, class_labels=None):
        """Initialize an empty dataset.

        `inp` is used to specify the dimensionality of the input. While the
        number of targets is given by implicitly by the training samples, it can
        also be set explicity by `nb_classes`. To give the classes names, supply
        an iterable of strings as `class_labels`."""
        # FIXME: hard to keep nClasses synchronized if appendLinked() etc. is used.
        SupervisedDataSet.__init__(self, inp, target)
        self.addField('class', 1)
        self.nClasses = nb_classes
        if len(self) > 0:
            # calculate class histogram, if we already have data
            self.calculateStatistics()
        self.convertField('target', int)
        if class_labels is None:
            self.class_labels = list(set(self.getField('target').flatten()))
        else:
            self.class_labels = class_labels
        # copy classes (may be changed into other representation)
        self.setField('class', self.getField('target'))

That still didn't make it clear, so I looked at `SupervisedDataSet`:

class SupervisedDataSet(DataSet):
    """SupervisedDataSets have two fields, one for input and one for the target.
    """

    def __init__(self, inp, target):
        """Initialize an empty supervised dataset.

        Pass `inp` and `target` to specify the dimensions of the input and
        target vectors."""
        DataSet.__init__(self)
        if isscalar(inp):
            # add input and target fields and link them
            self.addField('input', inp)
            self.addField('target', target)
        else:
            self.setField('input', inp)
            self.setField('target', target)

        self.linkFields(['input', 'target'])

        # reset the index marker
        self.index = 0

        # the input and target dimensions
        self.indim = self.getDimension('input')
        self.outdim = self.getDimension('target')

So `target` seems to be the output dimension. But shouldn't `target` then be equal to `nb_classes`?
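
For what it's worth, a quick interactive check (a minimal sketch; it assumes a working pybrain installation, and the 2/1 dimensions are arbitrary) confirms that `target` only sets the dimension of the target field:

>>> from pybrain.datasets import SupervisedDataSet
>>> ds = SupervisedDataSet(2, 1)  # 2-dimensional input, 1-dimensional target
>>> ds.addSample((0, 1), (1,))
>>> ds.indim, ds.outdim
(2, 1)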


Answer 1:


The `target` argument is the dimensionality of a training sample's output. To fully understand the difference between it and `nb_classes`, let's look at the `_convertToOneOfMany` method:

def _convertToOneOfMany(self, bounds=(0, 1)):
    """Converts the target classes to a 1-of-k representation, retaining the
    old targets as a field `class`.

    To supply specific bounds, set the `bounds` parameter, which consists of
    target values for non-membership and membership."""
    if self.outdim != 1:
        # we already have the correct representation (hopefully...)
        return
    if self.nClasses <= 0:
        self.calculateStatistics()
    oldtarg = self.getField('target')
    newtarg = zeros([len(self), self.nClasses], dtype='Int32') + bounds[0]
    for i in range(len(self)):
        newtarg[i, int(oldtarg[i])] = bounds[1]
    self.setField('target', newtarg)
    self.setField('class', oldtarg)

So, theoretically speaking, `target` is the dimension of the output, while `nb_classes` is the number of classification classes. This is useful for data transformation. For example, let's say we have data for training a network on the XOR function, like so:

 IN   OUT
[0,0],0
[0,1],1
[1,0],1
[1,1],0

Here the dimension of the output is one, but there are two output classes: 0 and 1. So we can change our data to:

 IN    OUT
[0,0],(0,1)
[0,1],(1,0)
[1,0],(1,0)
[1,1],(0,1)

Now the first component of the output represents the value True and the second represents False. This one-of-k encoding is common practice with more classes, for example in handwriting recognition.
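
For completeness, here is a minimal sketch of that transformation done with pybrain itself (it assumes a working pybrain installation; note that `_convertToOneOfMany` puts the 1 at the class index, so class 0 becomes (1,0) and class 1 becomes (0,1), the mirror image of the table above):

from pybrain.datasets import ClassificationDataSet

# 2-dimensional input, 1-dimensional target, 2 classes
ds = ClassificationDataSet(2, target=1, nb_classes=2)
for inp, cls in [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]:
    ds.addSample(inp, (cls,))

ds._convertToOneOfMany(bounds=(0, 1))
print(ds.getField('target'))  # one-of-k rows now, e.g. class 1 -> [0, 1]
print(ds.getField('class'))   # the original single-column labels are kept here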

Hope that clears this up a little bit for you.



Source: https://stackoverflow.com/questions/24231157/what-is-target-in-classificationdataset-good-for
