Naive Bayes - no samples for class label 1

Submitted by 纵然是瞬间 on 2020-01-25 03:59:05

Question


I am using Accord.NET. I have successfully implemented two decision tree algorithms, ID3 and C4.5, and now I am trying to implement the Naive Bayes algorithm. While there is a lot of sample code on the site, most of it seems to be out of date or to have various issues.

The best sample code I have found on the site so far has been here: http://accord-framework.net/docs/html/T_Accord_MachineLearning_Bayes_NaiveBayes_1.htm

However, when I try and run that code against my data I get:

There are no samples for class label 1. Please make sure that class labels are contiguous and there is at least one training sample for each label.

The error is thrown from line 228 of this file: https://github.com/accord-net/framework/blob/master/Sources/Accord.MachineLearning/Tools.cs when I call learner.Learn(inputs, outputs) in my code.

I already ran into the null bugs that Accord has when implementing the two decision trees, and my data has been sanitized against that issue.

Does any Accord.NET expert have an idea what would trigger this error?

An excerpt from my code:

    var codebook = new Codification(fulldata, AllAttributeNames);

    /*
     * Get list of all possible combinations
     * Status software blows up if it encounters a value it has not seen before.
     */
    var attributList = new List<IUnivariateFittableDistribution>();
    foreach (var attr in DeciAttributeNames)
    {
        /*
         * By default we'll use a standard static list of values for this column
         */
        var cntLst = codebook[attr].NumberOfSymbols;

        // no decisions can be made off of the variable if it is a constant value
        if (cntLst > 1)
        {
            KeptAttributeNames.Add(attr);
            attributList.Add(new GeneralDiscreteDistribution(cntLst));
        }
    }

    var data = fulldata.Copy(); // this is a datatable

    /*
     * Translate our training data into integer symbols using our codebook
     */
    DataTable symbols = codebook.Apply(data, AllAttributeNames);
    double[][] inputs = symbols.ToJagged<double>(KeptAttributeNames.ToArray());
    int[] outputs = symbols.ToArray<int>(OutAttributeName);
    progBar.PerformStep();

    /*
     * Create a new instance of the learning algorithm
     * and build the algorithm
     */
    var learner = new NaiveBayesLearning<IUnivariateFittableDistribution>()
    {
        // Tell the learner how to initialize the distributions
        Distribution = (classIndex, variableIndex) => attributList[variableIndex]
    };

    var alg = learner.Learn(inputs, outputs);

EDIT: After further experimentation, it seems that this error only occurs when I process a certain number of rows. If I process 60 rows or fewer, I am fine; if I process 500 rows or more, I am also fine. Anywhere in between, I get this error. Depending on the amount of data I choose, the index number in the error message can change; I have seen it range from 0 to 2.

All the data comes from the same SQL Server data source; the only thing I adjust is the Select Top ### portion of the query.


Answer 1:


You will receive this error in multi-class scenarios when a class label has been defined but the training data contains no samples for it. With a small data set, your random sampling may by chance exclude every observation with a given label.
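
One way to confirm this before training is to inspect the label array directly. The sketch below is only an illustration: `LabelDiagnostics`, `MissingLabels`, and `Compact` are hypothetical helper names, not part of Accord.NET, and the check assumes (based on the wording of the error message) that the learner expects labels to run from 0 up to the largest label with at least one sample each.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class; nothing here is an Accord.NET API.
public static class LabelDiagnostics
{
    // Returns every label in 0..max(outputs) that has no training sample.
    public static List<int> MissingLabels(int[] outputs)
    {
        var missing = new List<int>();
        if (outputs.Length == 0)
            return missing;
        if (outputs.Min() < 0)
            throw new ArgumentException("Labels are assumed to be non-negative.");

        var seen = new HashSet<int>(outputs);
        int max = outputs.Max();
        for (int label = 0; label <= max; label++)
        {
            if (!seen.Contains(label))
                missing.Add(label);
        }
        return missing;
    }

    // Renumbers the labels that are present to 0..K-1, in ascending order of
    // the original label. originalLabels[i] is the original label for new label i.
    public static int[] Compact(int[] outputs, out int[] originalLabels)
    {
        originalLabels = outputs.Distinct().OrderBy(x => x).ToArray();

        var map = new Dictionary<int, int>();
        for (int i = 0; i < originalLabels.Length; i++)
            map[originalLabels[i]] = i;

        return outputs.Select(x => map[x]).ToArray();
    }
}
```

Calling `LabelDiagnostics.MissingLabels(outputs)` just before `learner.Learn(inputs, outputs)` would list the labels that have no rows in the current selection. If gaps do appear, `Compact` is one possible workaround: train on the renumbered labels and translate each prediction back through `originalLabels` before looking it up in the codebook.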



Source: https://stackoverflow.com/questions/58403966/naive-bayes-no-samples-for-class-label-1
