Weka printing sparse arff file

烈酒焚心 posted on 2019-11-30 23:14:02

This is a well-known problem. The sparse ARFF format, by definition, does not store 0 values.

The Weka ARFF format page states this explicitly:

Warning: There is a known problem saving SparseInstance objects from datasets that have string attributes. In Weka, string and nominal data values are stored as numbers; these numbers act as indexes into an array of possible attribute values (this is very efficient). However, the first string value is assigned index 0: this means that, internally, this value is stored as a 0. When a SparseInstance is written, string instances with internal value 0 are not output, so their string value is lost (and when the arff file is read again, the default value 0 is the index of a different string value, so the attribute value appears to change). To get around this problem, add a dummy string value at index 0 that is never used whenever you declare string attributes that are likely to be used in SparseInstance objects and saved as Sparse ARFF files.

You have to put a dummy value in the first position (index 0) of the attribute's value list, so that no real value is ever stored internally as 0. Just modify your code to:

// Index 0 holds a placeholder that is never assigned to any instance.
attVals = new FastVector();
attVals.addElement("dummy");
attVals.addElement("A");
attVals.addElement("B");
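
To see why the placeholder matters, here is a minimal, self-contained sketch of the mechanism the warning describes. It does not use Weka at all: `SparseIndexDemo`, `writeSparse`, and the `{index value}` output layout are hypothetical names and a simplified model chosen for illustration, assuming each value is stored as its position in the declared list and that the sparse writer skips every entry whose stored number is 0.

```java
import java.util.Arrays;
import java.util.List;

// Simplified model of the behaviour described in the warning.
// This is NOT Weka's implementation; all names here are hypothetical.
class SparseIndexDemo {

    // Writes one row in a sparse style: each attribute value is first mapped
    // to its index in the declared value list, and entries whose index is 0
    // are skipped, as a sparse format does not store zeros.
    static String writeSparse(List<String> declaredValues, String[] row) {
        StringBuilder sb = new StringBuilder("{");
        for (int attr = 0; attr < row.length; attr++) {
            int internal = declaredValues.indexOf(row[attr]);
            if (internal < 0) {
                throw new IllegalArgumentException("Undeclared value: " + row[attr]);
            }
            if (internal == 0) {
                continue; // stored as 0, so it never reaches the output
            }
            if (sb.length() > 1) {
                sb.append(", ");
            }
            sb.append(attr).append(' ').append(row[attr]);
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        String[] row = {"A", "B"};
        // Without a placeholder, "A" sits at index 0 and is dropped.
        System.out.println(writeSparse(Arrays.asList("A", "B"), row));
        // With a placeholder at index 0, both real values survive.
        System.out.println(writeSparse(Arrays.asList("dummy", "A", "B"), row));
    }
}
```

In this model, a row holding `A` and `B` is written as `{1 B}` when the declared list is `A, B`, so `A` is silently lost. With the list `dummy, A, B`, the same row is written as `{0 A, 1 B}`.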

Let me know if you need any further help.
