Train and test set are not compatible error in weka?

谁说胖子不能爱 提交于 2020-01-01 09:02:14

问题


I'm trying to test my model with new dataset. I have done the same preprocessing step as i have done for building my model. I have compared two files but there is no issues. I have all the attributes(train vs test dataset) in same order, same attribute names and data types. But still i'm not able to resolve the issue. Both of the files train and test seems to be similar but the weka explorer is giving me error saying Train and test set are not compatible. How to resolve this error? Is there any way to make test.arff file format as train.arff? Please somebody help me.


回答1:


The same with the comment that I left after problem statement:

All the three attributes are nominal attributes followed by all the possible values quoted by '{}'. One of my guess is that the possible values are not the same. For example, for RESOURCE attribute there is no 199 in test file, while it is in training-file.




回答2:


After struggling with the same problem for a day. I figured out two ways to make the trained model working on supplied test set.

Method 1. Use knowledge flow. For example something like below: CSVLoader(for train set) -> classAssigner -> TrainingSetMaker -->(classifier of your choice) -> ClassfierPerformanceEvaluator - TextViewer. CSVLoader(for test set) -> classAssigner -> TestgSetMaker -->(the same classifier instance above) -> PredictionAppender -> CSVSaver. Then load the data from the CSVLoader or arffLoder for the training set. The model will be trained. After that load data from the loader for the test set. It will evaluate the model(classifier, for example) on the supplied test set and you can see the result from the textviewer (connected to the ClassifierPerformanceEvaluator) and get the saved result from the CSVSaver or arffSaver connected to the PredictionAppender.An additional column, the "classfied as" will be added to the output file. In my case, I used "?" for the class column in the supplied test set if the class labels are not available.

Method 2. Combine the Training and Test set into one file. Then the exact same filter can be applied to both training and test set. Then you can separate training set and test set by applying instance filter. Since I use "?" as class label in the test set. It is not visible in the instance filter indices. Hence just select those indices that you can see in the attribute values to be removed when apply the instance filter. You will get the test data left only. Save it and load it in supply test set at the classifier page.This time it will work. I guess it is the class attribute that causes the NOT compatible train and test set issue. As many classfier requires nominal class attribute. The value of which is converted to the index to available values of the class attribute according to http://weka.wikispaces.com/Why+do+I+get+the+error+message+%27training+and+test+set+are+not+compatible%27%3F




回答3:


See following answer, your train.arff and test.arff should have same header. According to your comparison they are similar but not same.




回答4:


I just encountered the same problem and I found a bare-bones solution. The format of my file is .csv and I simply open my files(for training and testing,respectively) and use the save button on the Preprocess panel of WEKA to save them in .arff format. Then the problem is solved.




回答5:


Look there is a difference between similar and same, your train.arrf and test.arrf should have the same header and if not then you should copy the header of train.arrf and paste it in your test.arrf as a new header.



来源:https://stackoverflow.com/questions/17673288/train-and-test-set-are-not-compatible-error-in-weka

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!