Using Kolmogorov Smirnov Test in R

做~自己de王妃 提交于 2019-12-30 10:31:43

问题


I designed 3000 experiments, so that in one experiment there are 4 groups (treatment), in each group there are 50 individuals (subjects). For each experiment I do a standard one way ANOVA and proof if their p.values has a uni probability function under the null-hypothesis, but ks.test rejects this assumption and I cant see why?

subject<-50
treatment<-4
experiment<-list()
R<-3000
seed<-split(1:(R*subject),1:R)
for(i in 1:R){
  e<-c()
  for(j in 1:subject){
    set.seed(seed[[i]][j]) 
    e<-c(e,rmvnorm(mean=rep(0,treatment),sigma=diag(3,4),n=1,method="chol"))
   }
  experiment<-c(experiment,list(matrix(e,subject,treatment,byrow=T)))
 }

 p.values<-c()
for(e in experiment){
  d<-data.frame(response=c(e),treatment=factor(rep(1:treatment,each=subject)))
  p.values<-c(p.values,anova(lm(response~treatment,d))[1,"Pr(>F)"])
 }

 ks.test(p.values, punif,alternative = "two.sided")

回答1:


I commented out the lines in your code that change the random seed, and got a P-value of .34. That was with an unknown seed, so for reproducibility, I did set.seed(1) and ran it again. This time, I got a P-value of 0.98.

As to why this makes a difference, I'm not an expert in PRNGs, but any decent generator will ensure successive draws are statistically independent for all practical purposes. The best ones will ensure the same for greater lags, eg the Mersenne Twister which is R's default PRNG guarantees it for lags up to 623 (IIRC). In fact, meddling with the seed is likely to impair the statistical properties of the draws.

Your code is also doing things in a really inefficient way. You're creating a list for the experiments, and adding one item for each experiment. Within each experiment, you also create a matrix, and add a row for each observation. Then you do something very similar for the P-values. I'll see if I can fix that up.

This is how I'd replace your code. Strictly speaking I could make it even tighter, by avoiding formulas, creating the bare model matrix, and calling lm.fit directly. But that would mean having to manually code up the ANOVA test rather than simply calling anova, which is more trouble than it's worth.

set.seed(1) # or any other number you like

x <- factor(rep(seq_len(treatment), each=subject))
p.values <- sapply(seq_len(R), function(r) {
    y <- rnorm(subject * treatment, s=3)
    anova(lm(y ~ x))[1,"Pr(>F)"]
})
ks.test(p.values, punif,alternative = "two.sided")


        One-sample Kolmogorov-Smirnov test

data:  p.values
D = 0.0121, p-value = 0.772
alternative hypothesis: two-sided


来源:https://stackoverflow.com/questions/17381214/using-kolmogorov-smirnov-test-in-r

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!