How to load data quickly into R?

天命终不由人 2020-12-02 07:23

I have some R scripts in which I have to load several data frames into R as quickly as possible. This matters because reading the data is the slowest part of the procedure.

4 answers
  •  被撕碎了的回忆 2020-12-02 08:16

    I am pretty happy with RMySQL. I am not sure whether I understood your question correctly, but labels should not be a problem. There are several convenience functions that simply use the default SQL table and row names, but of course you can also write your own SQL statements.
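    For example (a minimal sketch, not from the original answer; it assumes an open connection `con` and a table named "mytable"), the convenience function dbReadTable and an explicit query fetch the same data:

    library(RMySQL)

    # Convenience function: read the whole table by its default name
    mytable <- dbReadTable(con, "mytable")

    # Equivalent explicit SQL statement
    mytable <- dbGetQuery(con, "SELECT * FROM mytable")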

    I would say that, apart from large datasets that justify the hassle, one of the main reasons to use RMySQL is being more familiar with SQL syntax than with R's data-juggling functions. Personally, I prefer GROUP BY over aggregate. Note that calling stored procedures from inside R does not work particularly well.
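    To illustrate the GROUP BY point (a hypothetical sketch; `grp` and `val` are placeholder column names, not from the original data):

    # Aggregation pushed down to MySQL ...
    means_sql <- dbGetQuery(con,
      "SELECT grp, AVG(val) AS mean_val FROM mytable GROUP BY grp")

    # ... versus the same aggregation done in R after loading the table
    means_r <- aggregate(val ~ grp, data = mydata, FUN = mean)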

    Bottom line: setting up a local MySQL server is not too much effort – give it a try! I cannot say anything precise about speed, but I have a feeling there is a chance it is faster. I will try it and report back here.
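    For context, the lib/connect.R sourced in the test below could look roughly like this sketch (host, database name, and credentials are placeholders, not the original configuration):

    # Hypothetical lib/connect.R -- all connection details are placeholders
    library(RMySQL)
    con <- dbConnect(MySQL(),
                     host     = "localhost",
                     dbname   = "mydb",
                     user     = "ruser",
                     password = "secret")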

    EDIT: here's the test... and the winner is: spacedman

    # Open the MySQL connection (creates `con`)
    source("lib/connect.R")

    # Time a full read of the table over the MySQL connection
    dbQuery <- "SELECT * FROM mytable"
    mydata  <- dbGetQuery(con, dbQuery)
    system.time(dbGetQuery(con, dbQuery))
    # returns
    #  user  system elapsed
    # 0.999   0.213   1.715

    # Time loading the same data from a saved R workspace
    save.image(file = "speedtest.Rdata")
    system.time(load("speedtest.Rdata"))
    #  user  system elapsed
    # 0.348   0.006   0.358
    

    The file size was only about 1 MB here. Setup: MacBook Pro, 4 GB RAM, 2.4 GHz Intel Core Duo, Mac OS X 10.6.4, MySQL 5.0.41. I had simply never tried this before, because I usually work with bigger datasets and loading is not the issue there; processing is, if there are timing issues at all. +1 for the question!
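    As a small follow-up (not part of the original test): save.image() writes the entire workspace, so saving only the data frames that are actually needed should keep the file, and therefore load(), even leaner.

    # Hypothetical variant: persist just the query result instead of the whole workspace
    save(mydata, file = "mydata.Rdata")
    system.time(load("mydata.Rdata"))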
