R ttest inside ddply gives error “grouping factor must have exactly 2 levels”

Submitted by 孤街浪徒 on 2019-12-10 09:40:59

Question


I have a data frame with several factors and two phenotypes:

freq sampleID status score snpsincluded
0.5 0001 case 100 all 
0.2 0001 case 30 all 
0.5 0002 control 110 all 
0.5 0003 case 100 del 
etc

I would like to do a t.test comparing cases and controls for each set of relevant factors. I have tried the following:

o2 <- ddply(df, c("freq","snpsincluded"), summarise, pval=t.test(score, status)$p.value)

but it complains that "grouping factor must have exactly 2 levels".

I have no missing values (NAs), and I've checked:

levels(df$status)
[1] "case"    "control"

Am I missing something stupid? Thanks!


Answer 1:


You get the error because, for at least one subgroup, status takes a single unique value for all scores.

This reproduces the error: status has a single value (1) for every score.

dx = read.table(text='   score status
1 1 1 
2 2 1 
3 3 1 ')

t.test(score ~ status, data = dx) 
Error in t.test.formula(score ~ status, data = dx) : 
  grouping factor must have exactly 2 levels
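
Note that levels(df$status) reports the levels of the whole data frame, so it says nothing about whether each freq/snpsincluded subgroup contains both. As a rough sketch of how one might find the offending subgroups, the snippet below cross-tabulates subgroup against status. The data frame here is made up for illustration and only mimics the column names in the question.

```r
# Hypothetical data shaped like the question's data frame.
df <- data.frame(
  freq         = c(0.5, 0.5, 0.5, 0.5, 0.2, 0.2),
  snpsincluded = c("all", "all", "all", "all", "all", "all"),
  status       = factor(c("case", "case", "control", "control", "case", "case"),
                        levels = c("case", "control")),
  score        = c(100, 90, 110, 120, 30, 40)
)

# One row per freq/snpsincluded combination, one column per status level.
counts <- table(interaction(df$freq, df$snpsincluded, drop = TRUE), df$status)
print(counts)

# Subgroups in which at least one status level never occurs.
bad <- rownames(counts)[apply(counts == 0, 1, any)]
print(bad)
```

Any subgroup listed in bad would make t.test(score ~ status) fail with the "grouping factor" error.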

This fixes that problem but exposes another known requirement of t.test: each group needs enough observations (at least 2, I think):

dx = read.table(text='   score status
1 1 1 
2 2 1 
3 3 2 ')

t.test(score ~ status, data = dx) 
Error in t.test.default(x = 1:2, y = 3L) : not enough 'y' observations

Finally, this fixes both problems:

dx = read.table(text='   score status
1 1 1 
2 2 1 
3 3 2 
4 4 2')

t.test(score ~ status, data = dx) 

Welch Two Sample t-test

data:  score by status
t = -2.8284, df = 2, p-value = 0.1056
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -5.042435  1.042435
sample estimates:
mean in group 1 mean in group 2 
            1.5             3.5 

EDIT: I explained the problem without giving a solution because you didn't provide a reproducible example.

One solution is to run the test only for subgroups that contain both status values:

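  ddply(df, c("freq", "snpsincluded"), function(x) {
    # Run the test only when both status values occur in this subgroup;
    # other subgroups return NULL and are dropped from the result.
    if (length(unique(x$status)) == 2)
      data.frame(pval = t.test(score ~ status, data = x)$p.value)
  })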
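
Checking only for two status values still leaves the "not enough observations" error when one status has a single score. A slightly more defensive variant, as a sketch only: safe_pval and its min_n argument are names invented here, the sample data is made up, and base R's split/lapply stands in for ddply so the example runs without plyr.

```r
# Return a one-row data frame with the p-value for one subgroup, or NA
# when a status is missing or has fewer than `min_n` scores.
safe_pval <- function(x, min_n = 2) {
  n  <- table(factor(x$status))          # factor() drops unused levels
  ok <- length(n) == 2 && all(n >= min_n)
  data.frame(pval = if (ok) t.test(score ~ status, data = x)$p.value else NA_real_)
}

# Hypothetical data: one testable subgroup, one with a single status,
# and one where "control" has only one score.
df <- data.frame(
  freq         = c(0.5, 0.5, 0.5, 0.5, 0.2, 0.2, 0.1, 0.1, 0.1),
  snpsincluded = "all",
  status       = factor(c("case", "case", "control", "control",
                          "case", "case",
                          "case", "case", "control"),
                        levels = c("case", "control")),
  score        = c(1, 2, 3, 4, 30, 40, 5, 6, 7)
)

# Base R equivalent of the ddply call: split by subgroup, apply, bind rows.
groups <- split(df, list(df$freq, df$snpsincluded), drop = TRUE)
res <- do.call(rbind, lapply(groups, safe_pval))
print(res)
```

With plyr loaded, the same helper could be passed directly: ddply(df, c("freq", "snpsincluded"), safe_pval). Returning NA rather than NULL keeps skipped subgroups visible in the result.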


Source: https://stackoverflow.com/questions/21148658/r-ttest-inside-ddply-gives-error-grouping-factor-must-have-exactly-2-levels
