How to understand the LibSVM format in Spark MLlib?

Submitted by 依然范特西╮ on 2019-12-01 09:30:40

The LibSVM format is quite simple. Each row begins with the class label, in this case 0 or 1. The features follow as index:value pairs: the first number in each pair is the feature index (i.e. which feature it is) and the second is the actual value.

The feature indices start from 1 (there is no index 0) and are in ascending order. Any feature whose index does not appear on a row has the value 0 for that row.

In summary, each row looks like this:

<label> <index1>:<value1> <index2>:<value2> ... <indexN>:<valueN>
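To make the layout concrete, here is a minimal sketch in plain Python that reads one such row and expands it into a full feature list. The function names `parse_libsvm_line` and `to_dense`, the sample row, and the feature count of 6 are made up for illustration; this is not how Spark itself parses the file.

```python
def parse_libsvm_line(line):
    """Split one LibSVM row into (label, {index: value})."""
    parts = line.split()
    label = float(parts[0])          # first token is the class label
    features = {}
    for item in parts[1:]:           # remaining tokens are index:value pairs
        index, value = item.split(":")
        features[int(index)] = float(value)
    return label, features


def to_dense(features, num_features):
    """Expand sparse features into a full list; missing indices become 0.0."""
    # Indices in the file are 1-based, while Python lists are 0-based.
    return [features.get(i, 0.0) for i in range(1, num_features + 1)]


# Hypothetical sample row: label 1, feature 2 is 0.5, feature 5 is 3.0.
label, features = parse_libsvm_line("1 2:0.5 5:3.0")
dense = to_dense(features, 6)
print(label, features, dense)
```

In practice you would not parse the file by hand: Spark ships its own reader for this format (for example `spark.read.format("libsvm")`), and it converts the one-based file indices to zero-based vector indices when loading.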

This format is advantageous when the data is sparse and contains many zeros. Zero values are not stored at all, which makes the files both smaller and easier to read.
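As a rough illustration of the space saving, the sketch below turns a dense row into a LibSVM line by skipping every zero. The helper name `to_libsvm_line` and the sample values are hypothetical and chosen only for this example.

```python
def to_libsvm_line(label, dense_values):
    """Format a label and a dense feature list as one LibSVM row."""
    # enumerate(..., start=1) gives the 1-based indices the format expects;
    # zero-valued features are simply left out.
    pairs = [f"{i}:{v}" for i, v in enumerate(dense_values, start=1) if v != 0]
    return " ".join([str(label)] + pairs)


# Six features, only two of them non-zero.
row = [0.0, 0.5, 0.0, 0.0, 3.0, 0.0]
line = to_libsvm_line(1, row)
print(line)
```

Only the two non-zero features end up in the output line; the four zeros take no space at all, which is where the saving comes from on wide, sparse data.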
