Different results for 2 subset data methods in R

Posted by 半世苍凉 on 2019-12-13 08:42:36

Question


I'm subsetting my data, and I'm getting different results from the following two lines of code:

subset(df, x==1)
df[df$x==1,]

x is of type integer.

Am I doing something wrong? Thank you in advance.


Answer 1:


Without example data, it is difficult to say what your problem is. However, my hunch is that the following probably explains your problem:

df <- data.frame(quantity=c(1:3, NA), item=c("Coffee", "Americano", "Espresso", "Decaf"))
df
quantity      item
       1    Coffee
       2 Americano
       3  Espresso
      NA     Decaf

Let's subset with [

df[df$quantity == 2,]
 quantity      item
        2 Americano
       NA      <NA>

Now let's subset with subset:

subset(df, quantity == 2)
quantity      item
       2 Americano

The two approaches differ in how they treat NA values. I think of it this way: with subset, you are explicitly asking for the rows for which the condition is verifiably true. df$quantity == 2 produces a vector of TRUE/FALSE values, but where quantity is missing, it is impossible to assign TRUE or FALSE. This is why we get the following output with an NA at the end:

df$quantity==2
[1] FALSE  TRUE FALSE    NA

The function [ takes this vector, and for an NA index it returns a row filled with NA values instead of either dropping or keeping the original row, which is why instead of NA Decaf we get NA <NA>. If you prefer using [, you could use the following instead:

df[which(df$quantity == 2),]
quantity      item
       2 Americano

This translates the logical condition df$quantity == 2 into a vector of row numbers where the logical condition is "verifiably" satisfied.
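To check that the approaches agree once NA is handled, here is a small sketch using the same example data. The variable names (cond, by_bracket, by_subset and so on) are only illustrative, and the is.na() and %in% variants are alternatives I am adding here that do the same job as which() for this case:

```r
# Same example data as above
df <- data.frame(quantity = c(1:3, NA),
                 item = c("Coffee", "Americano", "Espresso", "Decaf"))

# The comparison yields NA where quantity is missing
cond <- df$quantity == 2

by_bracket <- df[cond, ]                  # NA index produces an all-NA row
by_subset  <- subset(df, quantity == 2)   # treats NA as "not selected"
by_which   <- df[which(cond), ]           # which() keeps only TRUE positions
by_notna   <- df[!is.na(cond) & cond, ]   # explicit NA check (illustrative alternative)
by_in      <- df[df$quantity %in% 2, ]    # %in% returns FALSE, not NA, for missing values

# Compare how many rows each approach returns
sapply(list(bracket = by_bracket, subset = by_subset, which = by_which,
            notna = by_notna, `in` = by_in), nrow)
```

Only the plain [ version should return the extra NA row; the other four should each return the single Americano row.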



Source: https://stackoverflow.com/questions/43782875/different-results-for-2-subset-data-methods-in-r
