How to apply Information Gain in RapidMiner with a separate test set?

梦想的初衷 submitted on 2019-12-12 01:43:37

Question


I am working on text classification in RapidMiner, and I have separate training and test splits. I applied Information Gain to a dataset using n-fold cross-validation, but I am confused about how to apply it to a separate test set. My process is shown in the attached image.

In the figure, I have connected the word list output of the first "Process Documents From Files" operator (used for training) to the second "Process Documents From Files" operator (used for testing). However, I want the second operator to use the reduced feature set, which should presumably be the one returned by the "Select By Weight" operator (reduced dimensions). But it returns weights, which I cannot feed into the second "Process Documents From Files". I searched a lot but didn't manage to find anything that meets my need.

Is it really possible in RapidMiner to have separate test/train splits and apply feature selection?

Is there any way to convert these weights into a word list? Please don't suggest writing them to the repository (I can't do this).

In a scenario where I have separate test/train splits and need to apply feature selection, how can I make sure that the test and train splits produce vectors of the same dimension?

I am really stuck on this; kindly help.


Answer 1:


Immediately after the lower Process Documents operator, insert a new Select By Weight operator before the Apply Model operator. Use a Multiply operator to copy the weights from the Weight By Information Gain operator, and connect a copy to the input of the new Select By Weight operator.
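The key idea in this answer is that the weights are learned once, on the training data only, and then the *same* weights drive feature selection on the test data, so both sides end up with identical columns. The sketch below illustrates that idea outside RapidMiner in plain Python; it is not RapidMiner's implementation. It assumes binary term occurrence (word present or absent) and a simple "keep the top k words" selection rule; the function names (`information_gain_weights`, `select_by_weight`, `vectorize`) and the toy data are hypothetical and only loosely mirror the operator names.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain_weights(docs, labels):
    """Hypothetical helper: weight each word by the information gain of its
    presence/absence with respect to the class label.
    docs is a list of sets of words (binary term occurrence)."""
    n = len(labels)
    base = entropy(labels)
    vocab = set().union(*docs)
    weights = {}
    for word in vocab:
        with_w = [y for d, y in zip(docs, labels) if word in d]
        without_w = [y for d, y in zip(docs, labels) if word not in d]
        conditional = (len(with_w) / n) * entropy(with_w) + (len(without_w) / n) * entropy(without_w)
        weights[word] = base - conditional
    return weights


def select_by_weight(weights, k):
    """Hypothetical helper: keep the k highest-weighted words (ties broken
    alphabetically) and return them as a sorted word list."""
    ranked = sorted(weights, key=lambda w: (-weights[w], w))
    return sorted(ranked[:k])


def vectorize(docs, word_list):
    """Project documents onto a fixed word list; words not in the list
    (including words never seen in training) are ignored."""
    return [[1 if w in d else 0 for w in word_list] for d in docs]


# Toy data (made up for illustration).
train_docs = [{"cheap", "pills", "now"}, {"cheap", "offer"},
              {"meeting", "agenda"}, {"agenda", "notes", "now"}]
train_labels = ["spam", "spam", "ham", "ham"]
test_docs = [{"cheap", "unseen"}, {"agenda"}]

weights = information_gain_weights(train_docs, train_labels)  # training data only
selected = select_by_weight(weights, k=2)                      # the reduced word list
X_train = vectorize(train_docs, selected)
X_test = vectorize(test_docs, selected)                        # same selection reused on test

print(selected)  # ['agenda', 'cheap']
print(X_test)    # [[0, 1], [1, 0]]
```

Because `selected` is computed from the training set and then passed unchanged to `vectorize` for the test set, the two matrices always have the same number of columns, which is the same guarantee the copied weights give the second Select By Weight operator. The test word `unseen` simply drops out rather than adding a dimension.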



Source: https://stackoverflow.com/questions/21853989/how-to-apply-informationgain-in-rapidminer-with-seperate-test-set
