complete.obs in the cor() function

梦想与她 submitted on 2019-12-04 18:56:43

Question


I am building a correlation matrix for my data, which looks like this:

df <- structure(list(V1 = c(56, 123, 546, 26, 62, 6, NA, NA, NA, 15
), V2 = c(21, 231, 5, 5, 32, NA, 1, 231, 5, 200), V3 = c(NA, 
NA, 24, 51, 53, 231, NA, 153, 6, 700), V4 = c(2, 10, NA, 20, 
56, 1, 1, 53, 40, 5000)), .Names = c("V1", "V2", "V3", "V4"), row.names = c(NA, 
10L), class = "data.frame")

This gives the following data frame:

        V1  V2  V3   V4
    1   56  21  NA    2
    2  123 231  NA   10
    3  546   5  24   NA
    4   26   5  51   20
    5   62  32  53   56
    6    6  NA 231    1
    7   NA   1  NA    1
    8   NA 231 153   53
    9   NA   5   6   40
    10  15 200 700 5000

I normally build my correlation matrix with the use = "complete.obs" option, using this command:

crm <- cor(df, use="complete.obs", method="pearson") 

My question is: how does complete.obs treat the data? Does it drop every row that has an NA value, build an NA-free table, and compute the whole correlation matrix from it at once, like this?

df2 <- structure(list(V1 = c(26, 62, 15), V2 = c(5, 32, 200), V3 = c(51, 
53, 700), V4 = c(20, 56, 5000)), .Names = c("V1", "V2", "V3", 
"V4"), row.names = c(NA, 3L), class = "data.frame")

Or does it omit NA values pairwise? For example, when calculating the correlation between V1 and V2, are rows that contain an NA only in V3 (such as rows 1 and 2 in my example) omitted too?

If this is the case, I would like a command that preserves as much of the data as possible by omitting NA values pairwise.

Many thanks,


Answer 1:


Look at the help file for cor, i.e. ?cor. In particular,

If ‘use’ is ‘"everything"’, ‘NA’s will propagate conceptually, i.e., a resulting value will be ‘NA’ whenever one of its contributing observations is ‘NA’.

If ‘use’ is ‘"all.obs"’, then the presence of missing observations will produce an error. If ‘use’ is ‘"complete.obs"’ then missing values are handled by casewise deletion (and if there are no complete cases, that gives an error).
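A minimal sketch of the behaviours quoted above, run on the data from the question (rebuilt with data.frame() instead of the dput() output so the snippet is self-contained). The two-column no_complete data frame is a hypothetical toy example in which no row is complete, used only to trigger the complete.obs error:

```r
# The example data from the question (same values as the dput() output)
df <- data.frame(
  V1 = c(56, 123, 546, 26, 62, 6, NA, NA, NA, 15),
  V2 = c(21, 231, 5, 5, 32, NA, 1, 231, 5, 200),
  V3 = c(NA, NA, 24, 51, 53, 231, NA, 153, 6, 700),
  V4 = c(2, 10, NA, 20, 56, 1, 1, 53, 40, 5000)
)

# use = "everything" (the default): NAs propagate. Every column of df has at
# least one NA, so every off-diagonal correlation comes out as NA.
m_everything <- cor(df, use = "everything")
print(m_everything[upper.tri(m_everything)])

# use = "all.obs": any missing value makes cor() stop with an error.
all_obs_fails <- tryCatch({
  cor(df, use = "all.obs")
  FALSE
}, error = function(e) TRUE)

# use = "complete.obs": works as long as at least one row is complete...
m_complete <- cor(df, use = "complete.obs")

# ...but is an error when no row is complete (hypothetical toy data where
# every row has an NA somewhere).
no_complete <- data.frame(a = c(1, NA, 3), b = c(NA, 2, NA))
complete_fails <- tryCatch({
  cor(no_complete, use = "complete.obs")
  FALSE
}, error = function(e) TRUE)

print(c(all_obs_fails = all_obs_fails, complete_fails = complete_fails))
```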

To get a better feel for what is going on, create an (even) simpler example:

df1 = df[1:5,1:3]
cor(df1, use="pairwise.complete.obs", method="pearson") 
cor(df1, use="complete.obs", method="pearson") 
cor(df1[3:5,], method="pearson") 
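The last two calls can be turned into an explicit check. A minimal sketch, assuming base R only; it also applies the same check to the full data, where complete.cases() picks out exactly the rows that make up df2 in the question:

```r
df <- data.frame(
  V1 = c(56, 123, 546, 26, 62, 6, NA, NA, NA, 15),
  V2 = c(21, 231, 5, 5, 32, NA, 1, 231, 5, 200),
  V3 = c(NA, NA, 24, 51, 53, 231, NA, 153, 6, 700),
  V4 = c(2, 10, NA, 20, 56, 1, 1, 53, 40, 5000)
)
df1 <- df[1:5, 1:3]

# complete.obs is casewise deletion: rows 1 and 2 (NA in V3) are dropped,
# so the result matches correlating rows 3:5 directly.
same_small <- isTRUE(all.equal(cor(df1, use = "complete.obs"),
                               cor(df1[3:5, ])))

# Same check on the full data: the complete rows are 4, 5 and 10,
# i.e. the df2 table from the question.
complete_rows <- which(complete.cases(df))
same_full <- isTRUE(all.equal(cor(df, use = "complete.obs"),
                              cor(df[complete_rows, ])))

print(complete_rows)
print(c(same_small = same_small, same_full = same_full))
```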

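So when we use complete.obs, the entire row is discarded if it contains an NA. In my example, this means rows 1 and 2 are discarded (they have an NA in V3), even when computing the correlation between V1 and V2. By contrast, pairwise.complete.obs uses every row where both V1 and V2 are non-NA when calculating their correlation, so it keeps as much of the data as possible.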
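To make "pairwise" concrete, here is a minimal sketch that recomputes pairwise.complete.obs by hand. pairwise_cor() is a hypothetical helper written only for illustration: for each pair of columns it keeps the rows where both are observed and calls cor() on those. The sketch also counts how many rows each pairwise correlation is based on, compared with the 3 rows that complete.obs uses for every pair:

```r
df <- data.frame(
  V1 = c(56, 123, 546, 26, 62, 6, NA, NA, NA, 15),
  V2 = c(21, 231, 5, 5, 32, NA, 1, 231, 5, 200),
  V3 = c(NA, NA, 24, 51, 53, 231, NA, 153, 6, 700),
  V4 = c(2, 10, NA, 20, 56, 1, 1, 53, 40, 5000)
)

# Hypothetical helper: correlate each pair of columns using only the rows
# where both columns are non-NA.
pairwise_cor <- function(d) {
  p <- ncol(d)
  out <- matrix(NA_real_, p, p, dimnames = list(names(d), names(d)))
  for (i in seq_len(p)) {
    for (j in seq_len(p)) {
      ok <- !is.na(d[[i]]) & !is.na(d[[j]])
      out[i, j] <- cor(d[[i]][ok], d[[j]][ok])
    }
  }
  out
}

manual  <- pairwise_cor(df)
builtin <- cor(df, use = "pairwise.complete.obs")
print(all.equal(manual, builtin))

# Rows used for each pair under pairwise deletion (entry [i, j] counts rows
# where both column i and column j are observed)...
obs <- 1 * !is.na(df)
n_pairs <- crossprod(obs)
print(n_pairs)

# ...versus the single set of complete rows used by complete.obs
n_casewise <- sum(complete.cases(df))
print(n_casewise)
```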



Source: https://stackoverflow.com/questions/18892051/complete-obs-of-cor-function
