Subsetting data frame using variable with same name as column

Question

I have a data frame with a column called "start", and I'm trying to subset it like this:

sub <- subset(data,data$start==14)

and I correctly get a subset of all the rows where start=14.

But, when I do this:

for(start in seq(1,20,by=1)) {
   sub <- subset(data,data$start==start)
   print(sub)
}

it does not correctly find the subsets. It just prints the entire data frame.

Why is this and how do I fix it?

Answer 1:

You can also specify the environment you're working with:

x<-data.frame(
  start=sample(3,20,replace=TRUE),
  someValue=runif(20))

env<-environment()
start<-3
cat("\nDefault scope:")
print(subset(x,start==start)) # all rows: both start's resolve to the column, so the test is always TRUE

cat("\nSpecific environment:")
print(subset(x,start==get('start',env)))  # the second start is looked up in env, so this is equivalent to subset(x,start==3)

Answer 2:

Fixing it is easy. Just rename either your for loop counter or your data frame column to something other than start.

This happens because subset evaluates the expression data$start == start inside the data frame data. It finds the column start first and stops there, so it never sees the other variable start you defined in the for loop.

Perhaps a better insight into why R gets confused here is to note that when using subset you don't in general need to refer to variables using data$. So imagine telling R:

subset(data,start == start)

R is just going to evaluate both of those start's inside data, compare the column with itself, and get back a vector that is TRUE for every non-missing value.
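As a minimal sketch of the point above, the following compares the clashing name with a renamed one. The data frame d and the variable target are hypothetical names chosen for illustration, not taken from the question.

```r
# Hypothetical example data; `d` and `target` are illustrative names.
d <- data.frame(start = c(1, 2, 3, 2), value = c(10, 20, 30, 40))

start <- 2
# Both `start`s resolve to the column of d, so every row is kept.
all_rows <- subset(d, start == start)

# Renaming the outer variable removes the clash: `target` is not a
# column of d, so subset looks it up in the calling environment.
target <- 2
matching <- subset(d, start == target)

print(nrow(all_rows))
print(matching)
```

The same renaming works for a for loop counter, which is the fix suggested in this answer.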

Answer 3:

Another approach is to use bracket subsetting rather than the subset function.

for(start in seq(1,20,by=1)) {
   sub <- data[data$start==start,]
   print(sub)
}

subset uses non-standard evaluation, which leads to the scoping problem you are seeing (which start are you referring to?). If there are (or may be) NAs in data$start, you probably need

sub <- data[!is.na(data$start) & data$start==start,]
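To see why the NA guard matters, here is a small sketch on made-up data (the data frame d is an assumption for illustration). A comparison against NA yields NA, and indexing rows with NA produces an extra row filled with NA. Wrapping the comparison in which() is another common way to drop those rows.

```r
# Hypothetical data with one missing value in `start`.
d <- data.frame(start = c(1, NA, 2, 1), value = c(10, 20, 30, 40))
start <- 1

# Unguarded: the NA comparison produces an extra all-NA row.
with_na <- d[d$start == start, ]

# Guarded: rows with a missing `start` are excluded explicitly.
without_na <- d[!is.na(d$start) & d$start == start, ]

# Alternative: which() returns only the positions that are TRUE.
alt <- d[which(d$start == start), ]

print(with_na)
print(without_na)
```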

Note this warning from the subset help page:

This is a convenience function intended for use interactively. For programming it is better to use the standard subsetting functions like [, and in particular the non-standard evaluation of argument subset can have unanticipated consequences.
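In the spirit of that warning, one way to make the filter reusable is to wrap bracket subsetting in a small function. This is only a sketch: filter_by_start is a hypothetical helper, not part of base R, and it assumes the data frame has a column named start.

```r
# Hypothetical helper (not part of base R): keep rows whose `start`
# column equals the requested value, ignoring missing values.
filter_by_start <- function(df, start) {
  # df$start is always the column; the bare `start` is always the argument.
  keep <- !is.na(df$start) & df$start == start
  df[keep, , drop = FALSE]
}

d <- data.frame(start = c(1, 2, NA, 2), value = c(10, 20, 30, 40))
result <- filter_by_start(d, 2)
print(result)
```

Because standard evaluation is used throughout, it does not matter that the argument shares its name with the column.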

Source: https://stackoverflow.com/questions/7572400/subsetting-data-frame-using-variable-with-same-name-as-column

Tags

for-loop

scope

dataframe

subset