What are the advantages of placing data in a new.env in R?

Posted by 有些话、适合烂在心里 on 2019-12-10 00:12:42

Question


What are the advantages of placing data in a new environment (created with new.env()) in R, such as speed?

For data such as time series, is a new environment analogous to a database?

My question stems from downloading asset prices in R, where it was suggested that I place them in a new environment. Why is that? Thank you:

library(TTR)
library(quantmod)             # getSymbols()
library(FinancialInstrument)  # currency(), stock()
# join() and extract.table.from.webpage() are not in the packages above; they
# appear to come from the Systematic Investor Toolbox (SIT), loaded separately.

url <- 'http://www.nasdaq.com/markets/indices/nasdaq-100.aspx'
txt <- join(readLines(url))

# extract tables from this page
temp <- extract.table.from.webpage(txt, 'Symbol', hasHeader = TRUE)
temp[, 2]

# Symbols
symbols <- c(temp[, 2])[2:101]

currency("USD")
stock(symbols, currency = "USD", multiplier = 1)

# create new environment to store symbols
symEnv <- new.env()

# getSymbols and assign the symbols to the symEnv environment
getSymbols(symbols, from = '2002-09-01', to = '2013-10-17', env = symEnv)

Answer 1:


There are advantages to this if your data is large and you have to modify it by passing it through functions. When you send data.frames or vectors to functions that modify them, R will make a copy of the data before making changes to it. You'd then return the modified data from the function and overwrite the old data to complete the modification step.

If your data is large, copying it for each function call may add an undesirable amount of overhead. Using environments provides a way around this overhead, because functions handle environments differently. If you pass an environment to a function and modify its contents, R operates directly on that environment without making a copy of it. So by putting your data in an environment and passing the environment to the function, instead of passing the data directly, you can avoid copying the large dataset.

# here I create a data.frame inside an environment and pass the environment
# to a function that modifies the data.
e <- new.env()
e$k <- data.frame(a=1:3)
f <- function(e) {e$k[1,1] <- 10}
f(e)
# you can see that the original data was changed.
e$k
   a
1 10
2  2
3  3

# alternatively, if I pass just the data.frame, the manipulations do not affect the 
# original data.
k <- data.frame(a=1:3)
f2 <- function(k) {k[1,1] <- 10}
f2(k)
k
  a
1 1
2 2
3 3
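
The same reference behaviour is what makes an environment usable as a named container for many series, as with symEnv in the question. The following is only a minimal sketch in base R: the names prices, AAA, BBB and CCC and all of the numbers are made up for illustration and are not real price data.

```r
# Hypothetical stand-ins for downloaded price series (made-up numbers).
prices <- new.env()
assign("AAA", c(10.0, 10.5, 11.0), envir = prices)
assign("BBB", c(20.0, 19.5, 19.0), envir = prices)

# List the stored names and fetch one object by name.
stored <- ls(prices)
aaa <- get("AAA", envir = prices)

# Environments have reference semantics: `alias` is the same environment,
# not a copy, so an assignment through one name is visible through the other.
alias <- prices
alias$CCC <- c(5, 6, 7)
same_env <- identical(alias, prices)
has_ccc <- exists("CCC", envir = prices, inherits = FALSE)

# Apply a function to every stored object (here: take the last value).
last_values <- eapply(prices, function(x) x[length(x)])
```

Keeping the series in their own environment also keeps them out of the global workspace, so functions such as ls() and eapply() see only the data and nothing else.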



Answer 2:


Let's compare two cases. With a new environment:

e <- new.env()
e$k <- data.frame(a=1:1000000)
f <- function(e) {e$k[1,1] <- 10}
system.time({
    for(i in 1:1000) f(e)
})
head(e$k) 

  user  system elapsed 
  5.32    6.35   11.67 

Without a new environment:

k <- data.frame(a=1:1000000)
f <- function(e) {e[1,1] <- 10;return(e);}
system.time({
    for(i in 1:1000) k <- f(k)
}) 
  user  system elapsed 
  5.07    6.82   11.89

Not much of a difference...
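
One possible way to investigate why the two timings are so close is to ask R whether the data.frame itself gets duplicated when it is modified through the environment. The sketch below uses base R's tracemem(), which only works when R was built with memory profiling, hence the capabilities("profmem") guard. The helper name modify_in_env is hypothetical, and what tracemem reports may differ between R versions, so no output is shown.

```r
# Sketch: check whether modifying a data.frame held in an environment
# duplicates the data.frame itself.
e <- new.env()
e$k <- data.frame(a = 1:3)

# Hypothetical helper: modifies the data.frame stored in `env`.
modify_in_env <- function(env) {
  env$k[1, 1] <- 10
  invisible(NULL)
}

if (isTRUE(capabilities("profmem"))) {
  tracemem(e$k)      # later duplications of this object print a tracemem[...] line
  modify_in_env(e)
  untracemem(e$k)
} else {
  modify_in_env(e)   # memory profiling unavailable: run without tracing
}
```

If tracemem lines appear during the call, the data.frame was duplicated even though the environment was passed by reference, which would be consistent with the timings above.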



Source: https://stackoverflow.com/questions/19772091/what-are-the-advantages-of-placing-data-in-a-new-env-in-r
