Subsetting data frame using variable with same name as column

Submitted by 北城余情 on 2019-12-11 01:48:28

Question


I have a data frame and I'm trying to run a subset on it. In my data frame, I have a column called "start" and I'm trying to do this:

sub <- subset(data,data$start==14)

and I correctly get a subset of all the rows where start=14.

But, when I do this:

for(start in seq(1,20,by=1)) {
   sub <- subset(data,data$start==start)
   print(sub)
}

it does not correctly find the subsets. It just prints the entire data frame.

Why is this and how do I fix it?


Answer 1:


You can also specify the environment you're working with:

x <- data.frame(
  start = sample(3, 20, replace = TRUE),
  someValue = runif(20))

env <- environment()  # the environment this code runs in (the global one at top level)
start <- 3
cat("\nDefault scope:\n")
print(subset(x, start == start))  # all rows: both starts refer to the column, so start == start is always TRUE

cat("\nSpecific environment:\n")
print(subset(x, start == get('start', env)))  # get() takes the second start from env (value 3), so this is equivalent to subset(x, start == 3)
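The same idea also works inside a function, where environment() returns that call's own frame (which holds the function's arguments) rather than the global environment. A minimal sketch, assuming a hypothetical helper rows_with_start() and a small made-up data frame:

```r
# Hypothetical helper: return the rows of df whose start column equals the argument start
rows_with_start <- function(df, start) {
  env <- environment()  # this call's frame, which holds the argument `start`
  # Inside subset(), a bare `start` is the column; get() fetches the argument from env
  subset(df, start == get("start", envir = env))
}

# Made-up example data
x <- data.frame(start = c(1, 3, 2, 3), someValue = c(0.1, 0.2, 0.3, 0.4))
print(rows_with_start(x, 3))  # keeps rows 2 and 4 (start == 3)
```

This relies on no column being named env or get; if one were, subset() would find the column first, which is the same scoping trap as in the question.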



Answer 2:


Fixing it is easy. Just rename either your for loop counter or your data frame column to something other than start.

The problem occurs because subset evaluates the expression data$start == start inside the data frame data. When it looks up start, it finds the column start first and stops there, never seeing the other variable start you defined in the for loop.
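Concretely, the data-frame method of subset() evaluates its condition with eval(expr, x, parent.frame()), so columns of x are looked up before variables in the calling environment. The sketch below reproduces that lookup by hand on a small made-up data frame, with a variable start playing the role of the loop counter:

```r
# Made-up example data
data <- data.frame(start = c(1, 2, 3, 2))
start <- 2  # stands in for the loop variable

cond <- quote(start == start)

# Columns of `data` are searched first, so both `start`s are the column
print(eval(cond, data, environment()))  # TRUE TRUE TRUE TRUE

# A right-hand name that is not a column falls through to the calling environment
s <- start
print(eval(quote(start == s), data, environment()))  # FALSE TRUE FALSE TRUE
```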

A clearer way to see why R gets confused is to remember that with subset you usually don't need the data$ prefix at all, because column names can be used directly. So imagine telling R:

subset(data,start == start)

R evaluates both occurrences of start inside data, so the condition compares the column with itself and returns TRUE for every row where start is not NA.
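A minimal sketch of the rename fix from the start of this answer, with the loop counter renamed to s (any name that is not a column of data would do). The small data frame and the counts vector are made up for illustration:

```r
# Made-up example data standing in for the asker's data frame
data <- data.frame(start = c(1, 14, 14, 3), value = c(10, 20, 30, 40))

counts <- integer(20)  # number of matching rows for each value 1..20
for (s in seq(1, 20, by = 1)) {
  # `start` resolves to the column; `s` is not a column, so subset()
  # finds the loop variable in the calling environment
  sub <- subset(data, start == s)
  counts[s] <- nrow(sub)
  if (nrow(sub) > 0) print(sub)  # only print values that actually occur
}
```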




Answer 3:


Another approach is to use bracket subsetting rather than the subset function.

for(start in seq(1,20,by=1)) {
   sub <- data[data$start==start,]
   print(sub)
}

subset uses non-standard evaluation, which leads to the scoping problem you are seeing (which start are you referring to?). If data$start contains (or may contain) NAs, you probably need

sub <- data[!is.na(data$start) & data$start==start,]
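A small sketch of why the !is.na() guard matters, using a made-up data frame with one missing start value: a logical index that contains NA produces an extra row of NAs. The which() variant is another base R way to keep only the TRUE positions; it is shown for comparison and is not part of the original answer.

```r
# Made-up example data with a missing start value
data <- data.frame(start = c(3, NA, 3, 1), value = c(10, 20, 30, 40))
start <- 3

# NA == 3 is NA, so plain bracket indexing keeps an extra all-NA row
naive <- data[data$start == start, ]

# The guard from the answer drops rows where start is NA
guarded <- data[!is.na(data$start) & data$start == start, ]

# which() returns only the positions that are TRUE, so NAs are dropped too
via_which <- data[which(data$start == start), ]

print(naive)
print(guarded)
```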

Note this warning from the subset help page:

This is a convenience function intended for use interactively. For programming it is better to use the standard subsetting functions like [, and in particular the non-standard evaluation of argument subset can have unanticipated consequences.



Source: https://stackoverflow.com/questions/7572400/subsetting-data-frame-using-variable-with-same-name-as-column
