Is a train/test split in unsupervised learning necessary/useful?

Question

In supervised learning I have the typical train/test split to train the algorithm, e.g. regression or classification. Regarding unsupervised learning, my question is: is a train/test split necessary and useful? If yes, why?

Answer 1:

Well, this depends on the problem, the form of the dataset, and the class of unsupervised algorithm used to solve the particular problem.

Roughly: dimensionality reduction techniques are usually evaluated by calculating the reconstruction error, so there we can use a k-fold cross-validation procedure.
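To make the reconstruction-error idea concrete, here is a minimal sketch using only NumPy. It assumes PCA as the dimensionality reduction technique and uses synthetic data; the helper names (`pca_reconstruction_error`, `kfold_reconstruction_error`) are made up for illustration and are not from any library.

```python
import numpy as np

def pca_reconstruction_error(train, test, n_components):
    """Fit PCA on `train`, return the mean squared reconstruction error on `test`."""
    mean = train.mean(axis=0)
    # Rows of vt are the principal directions of the centred training data.
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    components = vt[:n_components]
    centred = test - mean
    # Project onto the retained directions, then map back to the original space.
    reconstructed = centred @ components.T @ components
    return float(np.mean((centred - reconstructed) ** 2))

def kfold_reconstruction_error(data, n_components, n_folds=5, seed=0):
    """Average the held-out reconstruction error over `n_folds` folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(data)), n_folds)
    errors = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        errors.append(
            pca_reconstruction_error(data[train_idx], data[test_idx], n_components)
        )
    return float(np.mean(errors))

# Synthetic data (an assumption for the demo): a 3-D signal embedded in 10-D plus noise.
rng = np.random.default_rng(42)
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 10))
data = latent @ mixing + 0.1 * rng.normal(size=(200, 10))

errors_by_k = {k: kfold_reconstruction_error(data, k) for k in (1, 2, 3, 5)}
for k, err in errors_by_k.items():
    print(f"components={k}: held-out reconstruction MSE = {err:.4f}")
```

Note that for plain PCA this naive held-out error can only go down as components are added, so it is more useful for comparing methods or settings on unseen data than for picking the number of components on its own.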

For clustering algorithms, however, I would suggest statistical testing to assess performance. There is also a somewhat time-consuming trick: split the dataset, hand-label the test set with meaningful classes, and cross-validate against those labels.
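The hand-labelling trick could look roughly like the sketch below. It assumes a small hand-rolled k-means with a deterministic farthest-point initialisation, synthetic blobs whose generating labels stand in for the hand labels, and purity as the agreement score; all function names are hypothetical, and other scores would work just as well.

```python
import numpy as np

def nearest_centroid(points, centroids):
    """Assign each point to the index of its closest centroid."""
    distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return distances.argmin(axis=1)

def fit_kmeans(points, k, n_iter=20):
    """Plain Lloyd's k-means with farthest-point initialisation."""
    chosen = [points[0]]
    for _ in range(k - 1):
        gaps = np.linalg.norm(
            points[:, None, :] - np.array(chosen)[None, :, :], axis=2
        ).min(axis=1)
        chosen.append(points[gaps.argmax()])
    centroids = np.array(chosen, dtype=float)
    for _ in range(n_iter):
        labels = nearest_centroid(points, centroids)
        for c in range(k):
            members = points[labels == c]
            if len(members) > 0:
                centroids[c] = members.mean(axis=0)
    return centroids

def purity(cluster_ids, true_labels):
    """Fraction of points that carry the majority label of their cluster."""
    total = 0
    for c in np.unique(cluster_ids):
        total += np.bincount(true_labels[cluster_ids == c]).max()
    return total / len(true_labels)

# Synthetic stand-in for a dataset whose test part was labelled by hand.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [15.0, 0.0], [0.0, 15.0]])
true_labels = rng.integers(0, 3, size=300)
points = centers[true_labels] + rng.normal(size=(300, 2))

train, test = points[:200], points[200:]
test_labels = true_labels[200:]  # only the test set needs labels

centroids = fit_kmeans(train, k=3)  # the training step never sees labels
held_out_purity = purity(nearest_centroid(test, centroids), test_labels)
print(f"purity on the hand-labelled test set: {held_out_purity:.3f}")
```

Only the held-out part has to be labelled, which keeps the manual effort bounded while still giving an external check on whether the clusters line up with meaningful classes.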

In any case, if an unsupervised algorithm is used on supervised (labelled) data, it is always good to cross-validate.

Overall: it is not necessary to split the data into train and test sets, but if we can do it, it is always better.

Here is an article that explains how cross-validation is a good tool for unsupervised learning: http://udini.proquest.com/view/cross-validation-for-unsupervised-pqid:1904931481/ The full text is available here: http://arxiv.org/pdf/0909.3052.pdf

https://www.researchgate.net/post/Which_are_the_methods_to_validate_an_unsupervised_machine_learning_algorithm

Source: https://stackoverflow.com/questions/31673388/is-train-test-split-in-unsupervised-learning-necessary-useful

Tags

machine-learning

unsupervised-learning
