How do I read/write libsvm data into/from R?
The libsvm format stores sparse data, with each line holding a label followed by index:value pairs for the nonzero features.
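To make the format concrete, here is a small sketch that parses a single line in base R. The example line is made up for illustration, and the variable names are arbitrary:

```r
# A made-up libsvm line: label first, then index:value pairs for nonzero features
line <- "1 3:0.5 7:1.2"

# Split on single spaces; the first token is the label
tokens <- strsplit(line, " ", fixed = TRUE)[[1]]
label  <- as.numeric(tokens[1])

# Split each "index:value" token and stack the pieces into a two-column matrix
pairs <- do.call(rbind, strsplit(tokens[-1], ":", fixed = TRUE))
index <- as.integer(pairs[, 1])   # feature indices
value <- as.numeric(pairs[, 2])   # feature values
```

Every feature that is not listed on the line is implicitly zero, which is why a sparse matrix is a natural target.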
I have been running a job using zygmuntz's solution on a dataset with 25k observations (rows) for almost 5 hours now, and it has processed only about 3k rows. It was taking so long that I coded this up in the meantime (based on zygmuntz's code):
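library(Matrix)

read.libsvm = function( filename ) {
    content = readLines( filename )
    num_lines = length( content )
    # label rows: (line number, pseudo-index -1, label)
    # the label is the whole first token, so values like "-1" or "10" survive
    tomakemat = cbind(1:num_lines, -1, sub(' .*$', '', content))
    # build (line number, feature index, value) rows for every line
    makemat = rbind(tomakemat,
        do.call(rbind,
            lapply(1:num_lines, function(i){
                # split by spaces, then drop the label (first token)
                line = as.vector( strsplit( content[i], ' ' )[[1]])
                cbind(i, t(simplify2array(strsplit(line[-1], ':'))))
            })))
    # convert the character matrix to numeric (dimensions are kept)
    class(makemat) = "numeric"
    # index + 2 puts the label (index -1) in column 1 and feature k in column k + 2
    yx = sparseMatrix(i = makemat[,1],
                      j = makemat[,2]+2,
                      x = makemat[,3])
    return( yx )
}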
This ran in minutes on the same machine (zygmuntz's solution may also have had memory issues; I'm not sure). Hope this helps anyone with the same problem.
Remember, if you need to do big computations in R, VECTORIZE!
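As a rough illustration of pushing the vectorization further, here is a sketch that parses all lines without a per-line loop. The name read_libsvm_vectorized is made up, and the sketch assumes single-space separators, no empty lines, 1-based feature indices, and at least one feature in the file. Unlike the function above, it returns the labels and the features separately.

```r
library(Matrix)

# Hypothetical helper: parse a libsvm file without looping over lines.
# Assumes single spaces, no empty lines, and 1-based feature indices.
read_libsvm_vectorized <- function(filename) {
  content <- readLines(filename)
  tokens  <- strsplit(content, " ", fixed = TRUE)  # one character vector per line
  n_tok   <- lengths(tokens)
  all_tok <- unlist(tokens)

  # The label is the first token of each line
  first  <- cumsum(c(1L, head(n_tok, -1L)))
  labels <- as.numeric(all_tok[first])

  # Everything else is an "index:value" token; remember which row it came from
  feats <- all_tok[-first]
  row   <- rep(seq_along(tokens), n_tok - 1L)

  # Split index and value at the colon, for all tokens at once
  colon <- as.integer(regexpr(":", feats, fixed = TRUE))
  idx   <- as.integer(substr(feats, 1L, colon - 1L))
  val   <- as.numeric(substr(feats, colon + 1L, nchar(feats)))

  x <- sparseMatrix(i = row, j = idx, x = val,
                    dims = c(length(tokens), max(idx)))
  list(y = labels, x = x)
}

# Usage on a tiny made-up file
tmp <- tempfile(fileext = ".libsvm")
writeLines(c("1 1:0.5 3:2", "-1 2:1.5"), tmp)
res <- read_libsvm_vectorized(tmp)
res$y
res$x
```

I have not benchmarked this against the function above, so treat any speed difference as something to measure on your own data.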
EDIT: fixed an indexing error I found this morning.
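For the writing half of the question, here is a minimal sketch, not part of the code above. The name write_libsvm and its arguments are made up, and it assumes x is a numeric sparse matrix from the Matrix package, y is a label vector with one entry per row, and feature indices are written 1-based.

```r
library(Matrix)

# Hypothetical helper: write labels y and sparse features x in libsvm format.
# Assumes x is a numeric sparse Matrix and length(y) == nrow(x).
write_libsvm <- function(x, y, filename) {
  trip  <- summary(x)                        # triplets: columns i, j, x
  trip  <- trip[order(trip$i, trip$j), ]     # sort by row, then by column
  pairs <- paste0(trip$j, ":", trip$x)       # "index:value" tokens

  # Join the tokens of each row; rows without nonzeros stay empty
  feats  <- character(nrow(x))
  by_row <- tapply(pairs, trip$i, paste, collapse = " ")
  feats[as.integer(names(by_row))] <- by_row

  writeLines(trimws(paste(y, feats)), filename)
}

# Usage on a tiny made-up matrix
x <- sparseMatrix(i = c(1, 1, 2), j = c(1, 3, 2), x = c(0.5, 2, 1.5))
out <- tempfile(fileext = ".libsvm")
write_libsvm(x, c(1, -1), out)
written <- readLines(out)
```

Values are formatted with R's default number-to-string conversion, so check the precision if exact round-tripping matters.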