dataset limitation in R package “nparcomp”

社会主义新天地 submitted on 2019-12-13 16:25:17

Question


I have recently been using the R package nparcomp to test whether my response variable differs significantly between categories.

I found that the nparcomp function cannot handle large datasets (more than about 5,000 rows). For example, here is my code:

library(nparcomp)

a <- nparcomp(oc20_kgm2 ~ decade, data = dat, asy.method = "mult.t",
              type = "Tukey", alternative = "two.sided",
              plot.simci = TRUE, info = FALSE)

summary(a)

where oc20_kgm2 is my response variable, decade is my factor (with 10 levels), and dat is my dataset. My original dataset has about 15,000 rows (samples). When I run the code above, I get this error:

Error in checkmvArgs(lower = lower, upper = upper, mean = delta, corr = corr,  : 
  ‘lower’ not specified or contains NA
In addition: There were 49 warnings (use warnings() to see them)

To diagnose the problem, I randomly selected 5,000 samples from the original dat and ran the same code, and it worked. With 5,500 or 10,000 samples, however, it fails.
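The diagnostic subsampling step can be sketched as follows. This is a minimal illustration, assuming a synthetic stand-in for dat (random values and made-up decade labels, not the real data); sub_dat is a hypothetical name. Printing the per-decade counts is useful here because the group sizes, not the total row count alone, are what end up being multiplied inside the function.

```r
# Synthetic stand-in for the real `dat` (assumption: 15,000 rows, 10 decades).
set.seed(1)
dat <- data.frame(
  oc20_kgm2 = rnorm(15000),
  decade = factor(sample(paste0("d", 1:10), 15000, replace = TRUE))
)

# Draw 5,000 rows at random without replacement.
sub_dat <- dat[sample(nrow(dat), 5000), ]

# Group sizes per decade in the subsample.
print(table(sub_dat$decade))

# The original call would then be rerun on the subsample, e.g.:
# a <- nparcomp(oc20_kgm2 ~ decade, data = sub_dat, asy.method = "mult.t",
#               type = "Tukey", alternative = "two.sided",
#               plot.simci = TRUE, info = FALSE)
```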

My questions are: is there a sample-size limit for this function? And is there another test function or package I can use in R?


Edit, after reading the comments:

> traceback()

4: stop(sQuote("lower"), " not specified or contains NA")
3: checkmvArgs(lower = lower, upper = upper, mean = delta, corr = corr, 
       sigma = sigma)
2: pmvt(lower = -abs(T[pp]), abs(T[pp]), corr = rho.bf, df = df.sw, 
       delta = rep(0, nc))
1: nparcomp(oc20_kgm2 ~ decade, data = dat2, asy.method = "mult.t", 
       type = "Tukey", alternative = "two.sided", plot.simci = TRUE, 
       info = FALSE)

> warnings()
Warning messages:
1: In n[j] * n[w] * n[i] : NAs produced by integer overflow
2: In n[i] * n[w] * n[j] : NAs produced by integer overflow
3: In n[i] * n[v] * n[j] : NAs produced by integer overflow
4: In cov2cor(cov.bf) :
  diag(.) had 0 or NA entries; non-finite result is doubtful
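The integer-overflow warnings above can be reproduced in isolation. The sketch below assumes three hypothetical group sizes of 1,500, 1,600 and 1,400 stored as integers; in R, multiplying two integers yields an integer, so a product past .Machine$integer.max becomes NA with the same "NAs produced by integer overflow" warning. The helper may_overflow is a hypothetical, deliberately conservative pre-check (largest group size cubed), not part of nparcomp.

```r
# Hypothetical group sizes, stored as integers like sapply(samples, length) returns.
n <- c(1500L, 1600L, 1400L)
print(typeof(n))             # "integer"
print(.Machine$integer.max)  # 2147483647

# integer * integer stays integer; 1500 * 1600 * 1400 exceeds the range,
# so R returns NA and warns about integer overflow.
p_int <- n[1] * n[2] * n[3]
print(p_int)                 # NA

# Converting to double first keeps the exact value 3360000000.
p_dbl <- as.numeric(n[1]) * n[2] * n[3]
print(p_dbl)

# Hypothetical conservative check: TRUE if the largest group size cubed
# would exceed the integer range.
may_overflow <- function(groups) {
  sizes <- as.numeric(table(groups))
  max(sizes)^3 > .Machine$integer.max
}
```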

Answer 1:


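This error occurs because n, the vector of sample sizes for each factor level, is an integer vector, so products such as n[i] * n[w] * n[j] overflow R's integer range (.Machine$integer.max, 2147483647) when the groups are large. The resulting NAs propagate into the covariance matrix (hence the cov2cor warning) and finally into the limits passed to pmvt, which raises the error. To fix it, modify the source code of nparcomp from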

n <- sapply(samples, length)

to

n <- as.numeric(sapply(samples, length))

To view the source code, type nparcomp (without parentheses) at the R prompt.
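Instead of editing the installed package, one option is to build a patched copy of the function in the current session. This is a sketch under assumptions, not part of nparcomp: the helper name patch_function is hypothetical, and it relies on the quoted line appearing verbatim in deparse() output. It is demonstrated on a small toy function that computes group sizes the same way and reproduces the overflow.

```r
# Hypothetical helper: returns a copy of `f` whose deparsed source has `old`
# replaced by `new` (first match on each line), keeping f's environment so
# that package-internal helpers still resolve.
patch_function <- function(f, old, new) {
  src <- deparse(f)
  if (!any(grepl(old, src, fixed = TRUE))) {
    stop("pattern not found in deparsed source")
  }
  src <- sub(old, new, src, fixed = TRUE)
  g <- eval(parse(text = src))
  environment(g) <- environment(f)
  g
}

# Toy stand-in: computes group sizes like the quoted nparcomp line,
# then multiplies three of them.
toy <- function(samples) {
  n <- sapply(samples, length)
  n[1] * n[2] * n[3]
}

samples <- list(rep(0, 1500), rep(0, 1600), rep(0, 1400))
toy_fixed <- patch_function(
  toy,
  "n <- sapply(samples, length)",
  "n <- as.numeric(sapply(samples, length))"
)

print(suppressWarnings(toy(samples)))  # NA: integer overflow
print(toy_fixed(samples))              # 3360000000, computed as double
```

Applied to the real package, the call would be nparcomp_fixed <- patch_function(nparcomp::nparcomp, "n <- sapply(samples, length)", "n <- as.numeric(sapply(samples, length))"), and nparcomp_fixed would then take the same arguments as nparcomp. This assumes the installed nparcomp version contains that exact line; the helper stops with an error if it does not.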



Source: https://stackoverflow.com/questions/24047659/dataset-limitation-in-r-package-nparcomp
