read/write data in libsvm format

慢半拍i 2020-11-30 11:05

How do I read/write libsvm data into/from R?

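The libsvm format stores sparse data as text: each line holds one observation, a label followed by space-separated index:value pairs for the nonzero features.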
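To make the format concrete, here is a minimal sketch that parses a single line into its label and index:value pairs. The two sample lines are made up for illustration and are not from any real dataset.

```r
# Hypothetical sample: two observations in libsvm format.
sample_lines = c("1 1:0.5 3:2", "-1 2:1.5")

# Split the first line on spaces: the first token is the label,
# the remaining tokens are index:value pairs.
tokens = strsplit(sample_lines[1], " ")[[1]]
label  = as.numeric(tokens[1])

# Split each pair on ':' and stack the results into a two-column matrix.
pairs = do.call(rbind, strsplit(tokens[-1], ":"))
index = as.integer(pairs[, 1])
value = as.numeric(pairs[, 2])
```

Features that do not appear on a line are implicitly zero, which is why a sparse matrix is the natural container on the R side.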

        
7 answers
  •  借酒劲吻你
    2020-11-30 11:46

    I have been running the zygmuntz solution on a dataset with 25k observations (rows) for almost 5 hours now, and it has processed only about 3k rows. It was taking so long that I wrote this in the meantime (based on zygmuntz's code):

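    library(Matrix)

    read.libsvm = function(filename) {
      content = readLines(filename)
      num_lines = length(content)

      # Label rows: (row, -1, label). The label is everything before the
      # first space, so multi-character labels such as "-1" or "10" survive.
      tomakemat = cbind(1:num_lines, -1, sub(" .*$", "", content))

      # Feature rows: (row, index, value), built one line at a time.
      makemat = rbind(tomakemat,
        do.call(rbind,
          lapply(1:num_lines, function(i) {
            # split on spaces, then drop the leading label token
            line = strsplit(content[i], " ")[[1]]
            cbind(i, t(simplify2array(strsplit(line[-1], ":"))))
          })))
      storage.mode(makemat) = "numeric"

      # Column 1 holds the label (index -1); feature k lands in column k + 2,
      # so a feature index of 0 maps to column 2.
      yx = sparseMatrix(i = makemat[, 1],
                        j = makemat[, 2] + 2,
                        x = makemat[, 3])
      return(yx)
    }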
    

    This ran in minutes on the same machine (the zygmuntz solution may also have had memory issues; I'm not sure). I hope this helps anyone with the same problem.

    Remember, if you need to do big computations in R, VECTORIZE!
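    As an illustration of that advice, here is a rough sketch of a reader that splits all index:value tokens in one flat vector instead of building a small matrix per line. The name `read_libsvm_vec` is hypothetical, the sketch assumes 1-based feature indices and at least one feature per file, and it returns the labels separately from the feature matrix rather than in the first column.

    ```r
    library(Matrix)

    # Hypothetical name; a sketch under the assumptions stated above.
    read_libsvm_vec = function(filename) {
      content = readLines(filename)
      content = content[nzchar(trimws(content))]    # skip blank lines
      tokens  = strsplit(trimws(content), "[ \t]+")

      # First token of every line is the label.
      labels = as.numeric(vapply(tokens, `[`, character(1), 1))
      n_feat = lengths(tokens) - 1                  # features per line

      # Collect every index:value token into one vector and split once.
      feats = unlist(lapply(tokens, `[`, -1))
      parts = strsplit(feats, ":", fixed = TRUE)
      idx   = as.integer(vapply(parts, `[`, character(1), 1))
      val   = as.numeric(vapply(parts, `[`, character(1), 2))

      # Row number of each token is its line number, repeated n_feat times.
      x = sparseMatrix(i = rep(seq_along(tokens), n_feat), j = idx, x = val,
                       dims = c(length(tokens), max(idx)))
      list(y = labels, x = x)
    }

    # Usage on a small made-up file.
    tmp = tempfile()
    writeLines(c("1 1:0.5 3:2", "-1 2:1.5"), tmp)
    res = read_libsvm_vec(tmp)
    ```

    Keeping the labels in `res$y` avoids the column offset needed when labels and features share one matrix.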

    EDIT: fixed an indexing error I found this morning.
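
    The code above only covers reading. For the writing half of the question, something along these lines might work. It is a sketch, not a tested library routine: `write_libsvm` is a hypothetical name, and it assumes the labels `y` and a sparse feature matrix `x` are kept separately and that feature indices should be written 1-based.

    ```r
    library(Matrix)

    # Hypothetical helper: write labels y and sparse matrix x as libsvm text.
    write_libsvm = function(y, x, filename) {
      # Triplet form exposes the nonzeros as (i, j, x) with 0-based i and j.
      xt = as(x, "TsparseMatrix")
      i = xt@i + 1
      j = xt@j + 1
      v = xt@x

      # Sort by row, then by column, and format each nonzero as index:value.
      o = order(i, j)
      pairs = paste(j[o], v[o], sep = ":")

      # Group the pairs by row; rows without nonzeros yield an empty string.
      by_row = split(pairs, factor(i[o], levels = seq_len(nrow(x))))
      feats  = vapply(by_row, paste, character(1), collapse = " ")

      writeLines(trimws(paste(y, feats)), filename)
    }

    # Usage on a small made-up matrix.
    x = sparseMatrix(i = c(1, 1, 2), j = c(1, 3, 2), x = c(0.5, 2, 1.5))
    out = tempfile()
    write_libsvm(c(1, -1), x, out)
    written = readLines(out)
    ```

    Note that `paste` formats numbers with up to 15 significant digits, so values needing more precision would have to be formatted explicitly.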
