R - convert a data frame to a data set formatted as featureName:featureValue [duplicate]

旧时模样 submitted on 2019-12-31 00:49:14

Question


It turns out the format I wanted is called "SVM-Light" and is described here http://svmlight.joachims.org/.


I have a data frame that I would like to convert to a text file in the following format:

output featureIndex:featureValue ... featureIndex:featureValue 

So for example:

t = structure(list(feature1 = c(3.28, 6.88), feature2 = c(0.61, 1.83
), output = c("1", "-1")), .Names = c("feature1", "feature2", 
"output"), row.names = c(NA, -2L), class = "data.frame")

t
#   feature1 feature2 output
# 1     3.28     0.61      1
# 2     6.88     1.83     -1

would become:

1 feature1:3.28 feature2:0.61
-1 feature1:6.88 feature2:1.83

My code so far:

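nvars = 2
l = array("row", nrow(t))
for(i in(1:nrow(t)))
{
    # start each line with the output value (assign to l[i], not to l itself)
    l[i] = t$output[i]

    for(n in (1:nvars))
    {
        # append "featureName:featureValue" for the n-th column
        thisFeatureString = paste(names(t)[n], t[[names(t)[n]]][i], sep=":")
        l[i] = paste(l[i], thisFeatureString)
    }
}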

but I am not sure how to complete and write the results to a text file. Also the code is probably not efficient.

Is there a library function that does this? This kind of output format seems common, for example with Vowpal Wabbit.


Answer 1:


I couldn't find a ready-made solution, although the SVM-Light data format seems to be widely used.

Here is a working solution (at least in my case):

############### CONVERT DATA TO SVM-LIGHT FORMAT ##################################
# data_frame MUST have a column 'target'
# target values are assumed to be -1 or 1
# all other columns are treated as features
###################################################################################
ConvertDataFrameTo_SVM_LIGHT_Format <- function(data_frame)
{
    # every column except 'target' is a feature
    feature_names = setdiff(names(data_frame), "target")

    l = array("row", nrow(data_frame)) # l for "lines"
    for(i in seq_len(nrow(data_frame)))
    {
        # we start each line with the target value
        l[i] = as.character(data_frame$target[i])

        # then append to the line each feature index (which is n) and its
        # feature value (data_frame[[feature_names[n]]][i])
        for(n in seq_along(feature_names))
        {
            thisFeatureString = paste(n, data_frame[[feature_names[n]]][i], sep=":")
            l[i] = paste(l[i], thisFeatureString)
        }
    }

    return (l)
}
###################################################################################
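
The question also asks how to write the result to a text file and whether the loop can be avoided. The following is only a minimal sketch of one way to do both, building the lines column by column with paste and saving them with writeLines. The helper name to_svmlight_lines, its target_col parameter, the sample data frame df, and the temporary output file are illustrative assumptions, not part of the original answer.

```r
# Hypothetical helper (not from the original answer): builds SVM-Light style
# lines column by column instead of row by row.
to_svmlight_lines <- function(data_frame, target_col = "target") {
    feature_cols <- setdiff(names(data_frame), target_col)

    # One character vector per feature, e.g. c("1:3.28", "1:6.88")
    parts <- lapply(seq_along(feature_cols), function(n) {
        paste(n, data_frame[[feature_cols[n]]], sep = ":")
    })

    # Prepend the target values and join everything with single spaces
    do.call(paste, c(list(as.character(data_frame[[target_col]])), parts))
}

# Sample data mirroring the question, with the label column named 'target'
df <- data.frame(feature1 = c(3.28, 6.88),
                 feature2 = c(0.61, 1.83),
                 target   = c("1", "-1"),
                 stringsAsFactors = FALSE)

svm_lines <- to_svmlight_lines(df)
print(svm_lines)
# [1] "1 1:3.28 2:0.61"  "-1 1:6.88 2:1.83"

# Write one line per row; a temporary file is used here as a placeholder path
out_file <- tempfile(fileext = ".txt")
writeLines(svm_lines, out_file)
```

Because paste is vectorized over rows, this avoids the nested loops, which may matter for large data frames. Replace out_file with a real path to keep the output.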



Answer 2:


If you don't mind not having the column names in the output, I think you could use a simple apply to do that:

apply(t, 1, function(x) paste(x, collapse=" "))
#[1] "3.28 0.61 1"  "6.88 1.83 -1"

To put the output column first, matching the order your function produces, reorder the columns before applying:

apply(t[c(3, 1, 2)], 1, function(x) paste(x, collapse=" "))
#[1] "1 3.28 0.61"  "-1 6.88 1.83"
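
If the feature names are wanted after all, the same apply idea can be extended. This is a rough sketch, assuming the label column is called output as in the question; the names df, feature_cols, and svm_named are made up for illustration.

```r
# Sample data mirroring the question's data frame
df <- data.frame(feature1 = c(3.28, 6.88),
                 feature2 = c(0.61, 1.83),
                 output   = c("1", "-1"),
                 stringsAsFactors = FALSE)

# All columns except the label are treated as features
feature_cols <- setdiff(names(df), "output")

# For each row, pair every feature name with its value: "feature1:3.28 feature2:0.61"
named_features <- apply(df[feature_cols], 1, function(x) {
    paste(names(x), x, sep = ":", collapse = " ")
})

# Put the label in front of each line
svm_named <- paste(df$output, named_features)
print(svm_named)
# [1] "1 feature1:3.28 feature2:0.61"  "-1 feature1:6.88 feature2:1.83"
```

Applying over only the numeric feature columns keeps the values numeric inside apply, so the label column does not force a conversion of the whole row to text before pasting.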


Source: https://stackoverflow.com/questions/24142467/r-convert-a-data-frame-to-a-data-set-formatted-as-featurenamefeaturevalue
