How to create source data file for training and testing file in libsvm?

自作多情 submitted on 2019-12-12 03:35:03

Question


I want to use a dataset to train a model. The dataset has three different types of physiological data: type 1, type 2 and type 3. The LIBSVM format is as follows: label index1:value1 index2:value2 ...

I have used the label 1 for type 1, 2 for type 2 and 3 for type 3, and all the values are saved as 1:(value). So my training and testing files look like this:

1 1:value
1 1:value
. . .
1 1:value
2 1:value
2 1:value
. . .
2 1:value
3 1:value
3 1:value
. . .
3 1:value

So I am training the SVM with this kind of source file and testing with a similar kind of source file. I want to make sure that I am using the SVM data format correctly. Thank you.


Answer 1:


The vector dataset format for LIBSVM is defined as

label feature_id1:feature_value1 feature_id2:feature_value2 ...

Thus, every feature (or value) needs its own unique identifier.

Example:

Imagine you have three different class labels 1, 2, 3 and a feature set consisting of a(id=1), b(id=2), c(id=3), which was obtained via a feature selection mechanism.

Say we have three datapoints d1, d2, d3 that we want to describe in our dataset. The file could look like this, for example:

2 1:0.5325 3:0.523

3 2:0.7853 3:0.6326

1 1:0.53265 2:0.5422

Meaning:

  • d1 contains features a(id=1) and c(id=3)
  • d2 contains features b(id=2) and c(id=3)
  • d3 contains features a(id=1) and b(id=2)
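
As a rough illustration, lines like the three above could be generated from Python data as sketched here. The helper name format_libsvm_line and the dictionary representation (feature id mapped to value) are assumptions made for this example, not part of LIBSVM itself. The sketch writes the features in ascending id order, which is what the LIBSVM README asks for.

```python
def format_libsvm_line(label, features):
    """Build one LIBSVM-format line from a label and a {feature_id: value} dict.

    Hypothetical helper for illustration; not a LIBSVM function.
    """
    # Each feature keeps its own id; pairs are emitted in ascending id order.
    pairs = [f"{fid}:{value}" for fid, value in sorted(features.items())]
    return " ".join([str(label)] + pairs)


# The three datapoints d1, d2, d3 from the example above.
datapoints = [
    (2, {1: 0.5325, 3: 0.523}),
    (3, {2: 0.7853, 3: 0.6326}),
    (1, {1: 0.53265, 2: 0.5422}),
]

lines = [format_libsvm_line(label, feats) for label, feats in datapoints]
for line in lines:
    print(line)
```

Joining the resulting lines with newlines and writing them to a text file should give a file in the format described above.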

Note that it is not necessary to provide a feature_id:feature_value pair for features that are not contained in the given datapoint.
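
To make the effect of omitting a feature concrete, here is a small sketch that reads one line back and expands it into a dense vector. In this sparse format an omitted feature is treated as zero. The helpers parse_libsvm_line and to_dense are hypothetical names for this example, and the sketch assumes integer class labels and 1-based feature ids.

```python
def parse_libsvm_line(line):
    """Split one LIBSVM-format line into (label, {feature_id: value})."""
    tokens = line.split()
    label = int(tokens[0])  # assumption: integer class labels
    features = {}
    for token in tokens[1:]:
        fid, value = token.split(":")
        features[int(fid)] = float(value)
    return label, features


def to_dense(features, num_features):
    """Expand a sparse feature dict into a list; omitted ids become 0.0."""
    return [features.get(fid, 0.0) for fid in range(1, num_features + 1)]


# d1 from the example: feature b(id=2) is not listed in the line.
label, feats = parse_libsvm_line("2 1:0.5325 3:0.523")
dense = to_dense(feats, 3)
print(label, dense)
```

Here the second position of the dense vector is 0.0, because feature 2 does not appear in the line.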



Source: https://stackoverflow.com/questions/39360296/how-to-create-source-data-file-for-training-and-testing-file-in-libsvm
