In R, what is the difference between df["x"] and df$x?

Posted by 浪尽此生 on 2019-11-30 08:15:31

If I'm not mistaken, df$x is the same as df[['x']]. [[ selects a single element, whereas [ returns a list of the selected elements (for a data frame, another data frame). See also the language reference. I usually see [[ used for lists, [ for arrays, and $ for getting a single column or element. If the column name comes from a variable or an expression, use the [ or [[ notation (for example df[[name]] or df[,name]), because $ takes the name literally. The [ notation is also used to select multiple columns, for example df[,c('name1', 'name2')]. I don't think there is a single best practice for this.

Another difference is how a column that does not exist is handled: with your example data frame, df$w and df[['w']] return NULL, whereas df['w'] and df[,'w'] give an error.
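As a quick check of how a missing column is handled, here is a minimal sketch. The data frame `df` and its columns are made up for illustration, and `tryCatch` is used only so that the script keeps running after the error.

```r
# Hypothetical data frame with columns x and y, but no column w
df <- data.frame(x = 1:3, y = c("a", "b", "c"))

# `$` and `[[` quietly return NULL for a column that does not exist
is.null(df$w)
is.null(df[["w"]])

# `[` signals an error instead; capture its message so it can be inspected
msg <- tryCatch(df["w"], error = function(e) conditionMessage(e))
msg
```

The silent NULL from `$` is easy to miss, so a typo in a column name may only surface later in the analysis.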

jverzani

In addition to the indexing page in the manual, you can find this succinct description on the help page ?"$":

Indexing by ‘[’ is similar to atomic vectors and selects a list of the specified element(s).

Both ‘[[’ and ‘$’ select a single element of the list. The main difference is that ‘$’ does not allow computed indices, whereas ‘[[’ does. ‘x$name’ is equivalent to ‘x[["name", exact = FALSE]]’. Also, the partial matching behavior of ‘[[’ can be controlled using the ‘exact’ argument.
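The quoted rules can be tried out on a plain list. This is only a sketch: the list `x` and the variable `nm` are hypothetical names chosen for the example.

```r
# Hypothetical list with a single element named "value"
x <- list(value = 42)

x$val                      # `$` partially matches the name: returns 42
x[["val"]]                 # `[[` matches exactly by default: returns NULL
x[["val", exact = FALSE]]  # partial matching switched on: returns 42

# `[[` evaluates its index, `$` takes the name literally
nm <- "value"
x[[nm]]                    # looks up "value": returns 42
x$nm                       # looks for an element called "nm": returns NULL
```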

The function calls are, of course, different. See get("[.data.frame") versus get("[[.data.frame") versus get("$")

In this instance, for most uses, I'd avoid subsetting altogether rather than trying to remember what $, [ and [[ do with a data frame. I would just use with():

> df <- data.frame(x = 1:20, y = letters[1:20], z = 20:1, stringsAsFactors = TRUE)
> with(df, y)
 [1] a b c d e f g h i j k l m n o p q r s t
Levels: a b c d e f g h i j k l m n o p q r s t

That is a lot clearer than any of the sub-setting methods in most cases (IMHO).
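with() becomes more useful once the expression involves several columns. The following sketch reuses the same data frame; the names `total` and `total_explicit` are just illustrative.

```r
df <- data.frame(x = 1:20, y = letters[1:20], z = 20:1)

# Inside with(), the columns are visible as ordinary variables
total <- with(df, x + z)

# The same computation written with explicit subsetting
total_explicit <- df$x + df[["z"]]

identical(total, total_explicit)
```

This is mainly a convenience for interactive work; inside functions, explicit subsetting with [[ is usually the safer choice.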

Sharpie

One thing I haven't seen explained explicitly is that [ and [[ can select based on the value of a variable or expression, while $ cannot. For example:

> example_frame <- data.frame(Var1 = c(1,2), Var2 = c('a', 'b'), stringsAsFactors = TRUE)
> x <- 'Var1'

> example_frame$x
NULL  # Not what you wanted

> example_frame[x]
  Var1
1    1
2    2

> example_frame[[x]]
[1] 1 2

> example_frame[[ paste(c("V","a","r",2), collapse='') ]]
[1] a b
Levels: a b

The differences between [ and [[ have been well covered by other posts and other questions.
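This matters most inside functions and loops, where the column name is only known at run time. A minimal sketch, assuming a hypothetical helper called `column_class`:

```r
example_frame <- data.frame(Var1 = c(1, 2), Var2 = c("a", "b"))

# Hypothetical helper: the column name arrives as a string in `col`
column_class <- function(data, col) {
  class(data[[col]])  # data$col would look for a column literally named "col"
}

column_class(example_frame, "Var1")

# Apply the helper to every column name in turn
classes <- vapply(names(example_frame),
                  function(nm) column_class(example_frame, nm),
                  character(1))
classes
```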

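If you use df[,"x"] instead of df["x"], you will get the same result as df$x. The comma leaves the row index empty (all rows) and makes "x" the column index; because a single selected column is dropped to a vector by default (drop = TRUE), the result is the column itself rather than a one-column data frame.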
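The effect of the comma and of the drop argument can be checked directly. This is a small sketch with a made-up data frame:

```r
# Hypothetical data frame for illustration
df <- data.frame(x = 1:3, y = c("a", "b", "c"))

identical(df[, "x"], df$x)              # TRUE: both are the plain vector
is.data.frame(df["x"])                  # TRUE: a one-column data frame
is.data.frame(df[, "x", drop = FALSE])  # TRUE: drop = FALSE keeps the data frame
is.data.frame(df[, c("x", "y")])        # TRUE: several columns stay a data frame
```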

df$x and df[["x"]] do the same thing.

Let's assume that you have a data set named one, and that one of its variables is a factor variable, Region. Using one$Region selects that variable. Consider the following:

one <- read.csv("IED.csv")
one$Region

Running the following code also isolates that variable:

one[["Region"]]

Both expressions produce the following output:

> one$Region
    [1] RC SOUTH      RC SOUTH      RC SOUTH      RC EAST       RC EAST      
    [6] RC EAST       RC EAST       RC EAST       RC EAST       RC EAST      
   [11] RC SOUTH      RC SOUTH      RC EAST       RC EAST       RC EAST      
   [16] RC EAST       RC EAST       RC SOUTH      RC SOUTH      RC EAST      
   [21] RC SOUTH      RC EAST       RC CAPITAL    RC EAST       RC EAST 


> one[["Region"]]
    [1] RC SOUTH      RC SOUTH      RC SOUTH      RC EAST       RC EAST      
    [6] RC EAST       RC EAST       RC EAST       RC EAST       RC EAST      
   [11] RC SOUTH      RC SOUTH      RC EAST       RC EAST       RC EAST      
   [16] RC EAST       RC EAST       RC SOUTH      RC SOUTH      RC EAST      
   [21] RC SOUTH      RC EAST       RC CAPITAL    RC EAST       RC EAST

"They both return the "same" results, but not necessarily in the same format." - I didn't notice any differences. Each command produced the same output in the same format. Perhaps it's your data.

Hope that helps.

EDIT:

Misread the original question. df["x"] produces the following:

> one["Region"]
             Region
1          RC SOUTH
2          RC SOUTH
3          RC SOUTH
4           RC EAST
5           RC EAST
6           RC EAST
7           RC EAST
8           RC EAST
9           RC EAST
10          RC EAST

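The difference occurs because [ with a single index treats the data frame as a list of columns and returns a one-column data frame, whereas $ and [[ extract the column itself as a vector (here, a factor).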
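To see the difference in return type concretely, here is a minimal sketch. Since IED.csv is not available, it uses a small stand-in for `one` with made-up values.

```r
# Stand-in for the data set `one`; the values are made up for illustration
one <- data.frame(Region = c("RC SOUTH", "RC EAST", "RC CAPITAL"),
                  stringsAsFactors = TRUE)

class(one$Region)       # "factor"
class(one[["Region"]])  # "factor"
class(one["Region"])    # "data.frame"

# Pulling the single column back out of the data frame gives the same vector
identical(one["Region"][[1]], one$Region)
```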
