Simple text classification using naive bayes (weka) in java

Submitted by 馋奶兔 on 2019-12-10 04:35:09

Question


I am trying to do text classification with Naive Bayes using the Weka library in my Java code, but I think the classification result is not correct, and I don't know what the problem is. I use an ARFF file for the input.

This is my training data:

@relation hamspam

@attribute text string
@attribute class {spam,ham}

@data
'good',ham
'good',ham
'very good',ham
'bad',spam
'very bad',spam
'very bad, very bad',spam
'good good bad',ham

This is my testing data:

@relation test

@attribute text string
@attribute class {spam,ham}

@data
'good bad very bad',?
'good bad very bad',?
'good',?
'good very good',?
'bad',?
'very good',?
'very very good',?

And this is my code:

public static void NaiveBayes(String training_file, String testing_file) throws FileNotFoundException, IOException, Exception{
    //filter
    StringToWordVector filter = new StringToWordVector();

    Classifier naive = new NaiveBayes();

    //training data
    Instances train = new Instances(new BufferedReader(new FileReader(training_file)));
    int lastIndex = train.numAttributes() - 1;
    train.setClassIndex(lastIndex);
    filter.setInputFormat(train);
    train = Filter.useFilter(train, filter);

    //testing data
    Instances test = new Instances(new BufferedReader(new FileReader(testing_file)));
    test.setClassIndex(lastIndex);
    filter.setInputFormat(test);
    Instances test2 = Filter.useFilter(test, filter);

    naive.buildClassifier(train);

    for(int i=0; i<test2.numInstances(); i++) {
        System.out.println(test.instance(i));
        double index = naive.classifyInstance(test2.instance(i));
        String className = train.attribute(0).value((int)index);
        System.out.println(className);
    }
}

The results show that instances that should be classified as spam are classified as ham, and instances that should be classified as ham are classified as spam. What is the problem? Please help.


Answer 1:


Your code seems fine, though I have two comments to make.

  • First, you set the filter's input format with filter.setInputFormat(train); so that the same filter makes the test and train data compatible. You should not set the format again with filter.setInputFormat(test); because that resets the filter and might create compatibility issues.
  • Also, instead of reading the first attribute with train.attribute(0).value((int)index); (which does not seem to me to be the class attribute), try train.classAttribute().value((int)index);
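
To illustrate the first point, here is a minimal plain-Java sketch of why the word dictionary should be learned once from the training data and then reused for the test data. It is a toy encoder, not Weka's StringToWordVector: the class VocabularyDemo and its fit and transform methods are hypothetical names, and numbering the columns in order of first appearance is an assumption of this sketch. Weka's own dictionary handling may differ, so treat this only as an analogy for the kind of mismatch a re-initialized filter can produce.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy bag-of-words encoder (hypothetical; NOT Weka's StringToWordVector). */
final class VocabularyDemo {

    /** Learns a word -> column index mapping, numbered by first appearance. */
    public static Map<String, Integer> fit(List<String> docs) {
        Map<String, Integer> vocab = new LinkedHashMap<>();
        for (String doc : docs) {
            for (String word : doc.toLowerCase().split("\\W+")) {
                if (!word.isEmpty()) {
                    vocab.putIfAbsent(word, vocab.size());
                }
            }
        }
        return vocab;
    }

    /** Encodes a document as word-presence flags using an already fitted vocabulary. */
    public static int[] transform(String doc, Map<String, Integer> vocab) {
        int[] row = new int[vocab.size()];
        for (String word : doc.toLowerCase().split("\\W+")) {
            Integer col = vocab.get(word);
            if (col != null) {
                row[col] = 1; // words that were never seen while fitting are ignored
            }
        }
        return row;
    }

    public static void main(String[] args) {
        // Texts copied from the question's ARFF files.
        List<String> train = Arrays.asList("good", "good", "very good", "bad",
                "very bad", "very bad, very bad", "good good bad");
        List<String> test = Arrays.asList("good bad very bad", "good bad very bad",
                "good", "good very good", "bad", "very good", "very very good");

        Map<String, Integer> trainVocab = fit(train); // fitted once, on training data
        Map<String, Integer> testVocab = fit(test);   // the mistake: refitting on test data

        System.out.println("train columns: " + trainVocab.keySet());
        System.out.println("test columns:  " + testVocab.keySet());
        System.out.println("reused vocab:  " + Arrays.toString(transform("very good", trainVocab)));
        System.out.println("refit vocab:   " + Arrays.toString(transform("very good", testVocab)));
    }
}
```

With the question's data, the training dictionary in this sketch is [good, very, bad], while a dictionary rebuilt from the test set is [good, bad, very]. The same text 'very good' is therefore encoded as [1, 1, 0] in one case and [1, 0, 1] in the other, and a model trained on the first layout would read the second vector as 'good bad'.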

P.S. See "Load naïve Bayes model in Java code using weka jar" for a complete workflow and explanation of a classification example (the material was once in SO Documentation). That example uses the LibLinear classifier, but the logic is the same.



Source: https://stackoverflow.com/questions/41935193/simple-text-classification-using-naive-bayes-weka-in-java
