How to load data quickly into R?

天命终不由人 2020-12-02 07:23

I have some R scripts in which I have to load several data frames into R as quickly as possible. This matters because reading the data is the slowest part of the procedure.

4 answers
  •  被撕碎了的回忆 2020-12-02 08:16

    I am pretty happy with RMySQL. I am not sure whether I understood your question correctly, but labels should not be a problem. There are several convenience functions that simply use the default SQL table and row names, but of course you can also write your own SQL statements.
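    For example (a minimal sketch, not from the original answer; it assumes an open connection `con` and a table named "mytable"), the convenience function dbReadTable and an explicit query fetch the same data:

    library(RMySQL)

    # Convenience function: read the whole table by its default name
    mytable <- dbReadTable(con, "mytable")

    # Equivalent explicit SQL statement
    mytable <- dbGetQuery(con, "SELECT * FROM mytable")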

    I would say that, apart from large datasets that justify the hassle, one of the main reasons to use RMySQL is being more familiar with SQL syntax than with R's data-juggling functions. Personally, I prefer GROUP BY over aggregate. Note that calling stored procedures from inside R does not work particularly well.
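    To illustrate the GROUP BY point (a hypothetical sketch; `grp` and `val` are placeholder column names, not from the original data):

    # Aggregation pushed down to MySQL ...
    means_sql <- dbGetQuery(con,
      "SELECT grp, AVG(val) AS mean_val FROM mytable GROUP BY grp")

    # ... versus the same aggregation done in R after loading the table
    means_r <- aggregate(val ~ grp, data = mydata, FUN = mean)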

    Bottom line: setting up a local MySQL server is not too much effort – give it a try! I cannot say anything precise about speed, but I have a feeling there is a chance it is faster. I will try it and report back here.
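    For context, the lib/connect.R sourced in the test below could look roughly like this sketch (host, database name, and credentials are placeholders, not the original configuration):

    # Hypothetical lib/connect.R -- all connection details are placeholders
    library(RMySQL)
    con <- dbConnect(MySQL(),
                     host     = "localhost",
                     dbname   = "mydb",
                     user     = "ruser",
                     password = "secret")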

    EDIT: here's the test... and the winner is: spacedman

    # Open the MySQL connection (creates `con`)
    source("lib/connect.R")

    # Time a full read of the table over the MySQL connection
    dbQuery <- "SELECT * FROM mytable"
    mydata  <- dbGetQuery(con, dbQuery)
    system.time(dbGetQuery(con, dbQuery))
    # returns
    #  user  system elapsed
    # 0.999   0.213   1.715

    # Time loading the same data from a saved R workspace
    save.image(file = "speedtest.Rdata")
    system.time(load("speedtest.Rdata"))
    #  user  system elapsed
    # 0.348   0.006   0.358
    

    The file size was only about 1 MB here. Setup: MacBook Pro, 4 GB RAM, 2.4 GHz Intel Core Duo, Mac OS X 10.6.4, MySQL 5.0.41. I had simply never tried this before, because I usually work with bigger datasets and loading is not the issue there; processing is, if there are timing issues at all. +1 for the question!
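    As a small follow-up (not part of the original test): save.image() writes the entire workspace, so saving only the data frames that are actually needed should keep the file, and therefore load(), even leaner.

    # Hypothetical variant: persist just the query result instead of the whole workspace
    save(mydata, file = "mydata.Rdata")
    system.time(load("mydata.Rdata"))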
