Do you use attach() or call variables by name or slicing?

Posted by 我的梦境 on 2019-12-17 06:36:46

Question


Many introductory R books and guides start off by attaching a data.frame so that you can call its variables by name. I have always found it preferable to call variables with $ notation or square-bracket slicing such as [,2]. That way I can work with multiple data.frames without confusing them, and I can use iteration to call columns of interest one after another. I noticed that Google recently posted coding guidelines for R which included the line

1) attach: avoid using it

How do people feel about this practice?


Answer 1:


I never use attach. with and within are your friends.

Example code:

> N <- 3
> df <- data.frame(x1=rnorm(N),x2=runif(N))
> df$y <- with(df,{
   x1+x2
 })
> df
          x1         x2          y
1 -0.8943125 0.24298534 -0.6513271
2 -0.9384312 0.01460008 -0.9238312
3 -0.7159518 0.34618060 -0.3697712
> 
> df <- within(df,{
   x1.sq <- x1^2
   x2.sq <- x2^2
   y <- x1.sq+x2.sq
   x1 <- x2 <- NULL
 })
> df
          y        x2.sq     x1.sq
1 0.8588367 0.0590418774 0.7997948
2 0.8808663 0.0002131623 0.8806532
3 0.6324280 0.1198410071 0.5125870

Edit: hadley mentions transform in the comments. Here is some code (the output below comes from a different random sample, so the values do not match those above):

 > transform(df, xtot=x1.sq+x2.sq, y=NULL)
       x2.sq       x1.sq       xtot
1 0.41557079 0.021393571 0.43696436
2 0.57716487 0.266325959 0.84349083
3 0.04935442 0.004226069 0.05358049



Answer 2:


I much prefer to use with to get the equivalent of attach for a single command:

 with(someDataFrame,  someFunction(...))

This also leads naturally to a form where subset is the first argument:

 with(subset(someDataFrame,  someVar > someValue),
      someFunction(...))

which makes it pretty clear that we operate on a selection of the data. And while many modelling functions have both data and subset arguments, the form above is more consistent because it also applies to functions that do not have data and subset arguments.
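As a concrete illustration of the with(subset(...)) form, here is a minimal sketch. It assumes the built-in mtcars data set as a stand-in for someDataFrame and mean() as a stand-in for someFunction; neither comes from the original answer.

```r
# mtcars ships with R; it stands in for someDataFrame (an assumption).
# mean() stands in for someFunction and has no data or subset argument.
mpg_4cyl <- with(subset(mtcars, cyl == 4), mean(mpg))

# The same value written with explicit $ indexing, for comparison.
mpg_4cyl_explicit <- mean(mtcars$mpg[mtcars$cyl == 4])
```

Both expressions compute the mean over the same rows; the with() form just states the selection once, up front.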




Answer 3:


The main problem with attach is that it can result in unwanted behaviour. Suppose you have an object named xyz in your workspace. Now you attach a dataframe abc which has a column named xyz. If your code refers to xyz, can you tell at a glance whether it refers to the object or to the dataframe column? If you don't use attach, it is easy: xyz alone refers to the object, and abc$xyz refers to the column of the dataframe.
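The name clash described here can be sketched as follows. The values assigned to xyz and abc are made up for illustration. With a default attach(), the data frame goes on the search path behind the workspace, so the bare name picks up the workspace object rather than the column.

```r
xyz <- "workspace object"        # illustrative value, not from the answer
abc <- data.frame(xyz = 1:3)     # illustrative data frame with a clashing column

# attach() normally prints a message that xyz is masked by the workspace;
# warn.conflicts = FALSE silences it here, which makes the clash easy to miss.
attach(abc, warn.conflicts = FALSE)
bare_name <- xyz                 # resolves to the workspace object, not the column
column <- abc$xyz                # unambiguous: the column of abc
detach(abc)
```

Anyone reading the code has to know the whole search path to predict what a bare xyz means, whereas abc$xyz says it directly.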

One of the main reasons that attach is used frequently in textbooks is that it shortens the code.




Answer 4:


"Attach" is an evil temptation. The only place where it works well is the classroom setting, where one is given a single dataframe and expected to write lines of code to do the analysis on that one dataframe. The user is unlikely to ever use that data again once the assignment is done and handed in.

However, in the real world, more data frames can be added to the collection of data in a particular project. Furthermore one often copies and pastes blocks of code to be used for something similar. Often one is borrowing from something one did a few months ago and cannot remember the nuances of what was being called from where. In these circumstances one gets drowned by the previous use of "attach."




Answer 5:


Just like Leoni said, with and within are perfect substitutes for attach, but I wouldn't completely dismiss it. I use it sometimes, when I'm working directly at the R prompt and want to test some commands before writing them in a script. Especially when testing multiple commands, attach can be a more convenient and even harmless alternative to with and within, since after you run attach, the command prompt is clear for you to write inputs and see outputs.

Just make sure to detach your data after you're done!




Answer 6:


I prefer not to use attach(), as it is far too easy to run a batch of code several times, calling attach() each time. The data frame is added to the search path on every call, extending it unnecessarily. Of course, good programming practice is to also detach() at the end of the block of code, but that is often forgotten.
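A small sketch of that effect, assuming a toy data frame xxx with a single column y (the contents are made up). Each call to attach() adds another entry to the search path, and each entry needs its own detach().

```r
xxx <- data.frame(y = 1:3)       # toy data, for illustration only

n_before <- length(search())

# Simulate re-running the same block three times without detaching.
for (i in 1:3) attach(xxx, warn.conflicts = FALSE)
n_after_attach <- length(search())   # three entries longer than before

# One detach() per attach() is needed to restore the search path.
while ("xxx" %in% search()) detach(xxx)
n_after_detach <- length(search())
```

Calling search() at the prompt after a few reruns is a quick way to check whether stale copies have piled up.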

Instead, I use xxx$y or xxx[,"y"]. It's more transparent.

Another possibility is to use the data argument available in many functions which allows individual variables to be referenced within the data frame. e.g., lm(z ~ y, data=xxx).
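For example, here is a minimal sketch of the data argument next to explicit $ references. The columns y and z are simulated, and the seed and coefficients are arbitrary choices for illustration.

```r
set.seed(1)                              # arbitrary seed so the sketch is repeatable
xxx <- data.frame(y = 1:10)
xxx$z <- 2 * xxx$y + rnorm(10)           # simulated response, illustration only

# Variables in the formula are looked up inside xxx; no attach() needed.
fit_data <- lm(z ~ y, data = xxx)

# Equivalent fit with explicit $ references; only the coefficient names differ.
fit_dollar <- lm(xxx$z ~ xxx$y)
```

The two fits give the same estimates, but the data = xxx form keeps the formula short and labels the coefficient y rather than xxx$y.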




Answer 7:


While I, too, prefer not to use attach(), it does have its place when you need to persist an object (in this case, a data.frame) through the life of your program when you have several functions using it. Instead of passing the object into every R function that uses it, I think it is more convenient to keep it in one place and call its elements as needed.

That said, I would only use it if I know how much memory I have available and only if I make sure that I detach() this data.frame once it is out of scope.
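One way to make the detach() step hard to forget is to pair attach() with on.exit() inside a function. This is only a sketch under stated assumptions: the helper name mean_mpg_attached, the search-path label "tmp_attached", and the use of the built-in mtcars data are all hypothetical and not part of the original answer.

```r
# Hypothetical helper: the data frame is attached only while the function runs.
mean_mpg_attached <- function(data) {
  attach(data, name = "tmp_attached", warn.conflicts = FALSE)
  # on.exit() runs even if the body fails, so the search path is always restored.
  on.exit(detach("tmp_attached", character.only = TRUE), add = TRUE)
  mean(mpg)                      # mpg is found through the attached copy
}

path_before <- search()
result <- mean_mpg_attached(mtcars)   # mtcars ships with R
path_after <- search()                # same as path_before
```

This keeps the convenience of bare column names inside the function while leaving the search path as it was found.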

Am I making sense?



Source: https://stackoverflow.com/questions/1310247/do-you-use-attach-or-call-variables-by-name-or-slicing
