weka stringToWordVector filter stringOptions

試著忘記壹切 submitted on 2019-12-21 03:01:09

Question


I'm trying to filter a dataset using Weka's Java API. I've successfully filtered the attributes I want with a StringToWordVector filter in Weka's GUI, but I can't seem to do the same in my Java code. I copied the auto-generated filter parameters from the GUI and pasted them into my code, but I keep getting errors. Currently, my code looks like this:

Instances newInsts = new Instances(this.instances);
StringToWordVector stringFilter = new StringToWordVector();
stringFilter.setOptions(
            weka.core.Utils.splitOptions("-R 1,2,3,4,8 -W 1000 
                                          -prune-rate -1.0 -N 0 -stemmer
                                           weka.core.stemmers.NullStemmer -M 1
                                          -tokenizer \"weka.core.tokenizers.WordTokenizer 
                                          -delimiters \" \\r\\n\\t.,;:\\\'\\\"()?!\""));
stringFilter.setInputFormat(newInsts);
newInsts = Filter.useFilter(newInsts, stringFilter);

But I keep getting this error in my Eclipse console: No value given for -delimiters option.

(I added extra spacing for readability in the above code. I suspect this has something to do with escaping characters/quotation marks...)

Thanks!


Answer 1:


You can actually omit most of the options, as they are the defaults for StringToWordVector. The delimiters you're trying to pass are the default delimiters in the default tokenizer, WordTokenizer, which are:

' \r\n\t.,;:'"()?!'
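To see what that delimiter set does to a piece of text, here is a minimal sketch that needs no Weka classes. It uses java.util.StringTokenizer as a stand-in for WordTokenizer, which is an assumption made for illustration and not Weka's own code; the class name, constant and method name are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Illustrative stand-in only: this is NOT Weka's WordTokenizer.
class DefaultDelimitersSketch {

    // The default delimiter set quoted above, written as a Java string literal:
    // space, CR, LF, tab, then . , ; : ' " ( ) ? !
    static final String DEFAULT_DELIMITERS = " \r\n\t.,;:'\"()?!";

    // Splits the text on any single delimiter character; empty tokens are skipped.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(text, DEFAULT_DELIMITERS);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [Hello, world, It, s, quoted]
        System.out.println(tokenize("Hello, world! (It's \"quoted\".)"));
    }
}
```

Because these delimiters are already the defaults, the whole -tokenizer part of the option string, along with its nested quoting, can be left out. Of the options in the question, the attribute range (-R 1,2,3,4,8) appears to be the only non-default one, and it can be set with the filter's setAttributeIndices method instead of through an option string.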


Source: https://stackoverflow.com/questions/4963210/weka-stringtowordvector-filter-stringoptions
