read.csv is extremely slow when reading csv files with large numbers of columns

栀梦 2020-12-08 11:29

I have a .csv file, example.csv, with 8000 columns x 40000 rows. The csv file has a string header for each column, and all fields contain integer values between 0 and 10. When I read it in with read.csv, it is extremely slow.

5 Answers
  •  春和景丽
    2020-12-08 12:05

    If you'll read the file often, it might well be worth saving it from R in a binary format using the save function. Specifying compress=FALSE often results in faster load times.

    ...You can then load it in with the (surprise!) load function.

    # Example data: 1000 integer columns, written out as CSV
    d <- as.data.frame(matrix(1:1e6, ncol=1000))
    write.csv(d, "c:/foo.csv", row.names=FALSE)
    
    # Load file with read.csv
    system.time( a <- read.csv("c:/foo.csv") ) # 3.18 sec
    
    # Load file using scan: what=0L reads integers, skip=1 skips the header row
    system.time( b <- matrix(scan("c:/foo.csv", 0L, skip=1, sep=','), 
                             ncol=1000, byrow=TRUE) ) # 0.55 sec
    
    # Save as an (uncompressed) binary file, then load it back
    save(d, file="c:/foo.bin", compress=FALSE)
    system.time( load("c:/foo.bin") ) # 0.09 sec
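
    A closely related option, if your workflow allows it: saveRDS/readRDS serialize a single object and return it directly on read, so you don't need to know the variable name stored in the file. This is a minimal, untimed sketch; the file name foo.rds is just an example.

    # Save a single object, uncompressed for speed, then read it straight back
    saveRDS(d, file="c:/foo.rds", compress=FALSE)
    d2 <- readRDS("c:/foo.rds")
    identical(d, d2)  # TRUE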
    
