Here is some sample data for my problem:
mydf <- data.frame(A = rnorm(20, 1, 5),
                   B = rnorm(20, 2, 5),
                   C = rnorm(20, 3, 5),
                   D = rnorm(20, 4, 5),
                   E = rnorm(20, 5, 5))
Now I'd like to run a one-sample t-test on each column of the data.frame to test whether its mean differs significantly from zero, like t.test(mydf$A), and then store the mean of each column, the t-value and the p-value in a new data.frame. So the result should look something like this:
A B C D E
mean x x x x x
t x x x x x
p x x x x x
I could definitely think of some tedious ways to do this, like looping through mydf, calculating the parameters, and then looping through the new data.frame and inserting the values.
But with packages like plyr at hand, shouldn't there be a more concise and elegant way to do this?
Any ideas are highly appreciated.
Try something like this and then extract the results you want from the resulting table:
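# run t.test on every column; each result is an "htest" list
results <- lapply(mydf, t.test)
# bind the results column-wise: one column per variable, one row per htest component
resultsmatrix <- do.call(cbind, results)
# keep only the t-value, the mean and the p-value
resultsmatrix[c("statistic","estimate","p.value"),]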
Gives you:
          A         B          C            D           E
statistic 1.401338  2.762266   5.406704     3.409422    5.024222
estimate  1.677863  2.936304   5.418812     4.231458    5.577681
p.value   0.1772363 0.01240057 3.231568e-05 0.002941106 7.531614e-05
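Note that do.call(cbind, results) produces a matrix whose cells are list elements rather than plain numbers. If a numeric table in the mean / t / p layout from the question is wanted, one possible sketch uses sapply and returns a named numeric vector per column. The names summary_mat and summary_df and the fixed seed are assumptions made for illustration, not part of the original answer:

```r
set.seed(1)  # assumption: fixed seed so the sketch is reproducible
mydf <- data.frame(A = rnorm(20, 1, 5),
                   B = rnorm(20, 2, 5),
                   C = rnorm(20, 3, 5),
                   D = rnorm(20, 4, 5),
                   E = rnorm(20, 5, 5))

# One-sample t-test per column; keep three numbers from each result.
# sapply simplifies the equal-length named vectors into a numeric matrix
# with rows mean, t, p and one column per variable.
summary_mat <- sapply(mydf, function(x) {
  tt <- t.test(x)
  c(mean = unname(tt$estimate),
    t    = unname(tt$statistic),
    p    = tt$p.value)
})

# Convert to a data.frame, as asked for in the question.
summary_df <- as.data.frame(summary_mat)
print(summary_df)
```

Because every cell is numeric here, the result can be rounded or filtered directly, for example summary_df["p", ] < 0.05.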
A data.table solution (the three output rows are the p-value, the 95% confidence interval, and the mean):
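library(data.table)
DT <- as.data.table(mydf)
# .SD holds every column; run t.test on each and keep three rounded components
DT[, lapply(.SD, function(x) {
  y <- t.test(x)
  list(p  = round(y$p.value, 2),   # p-value
       h  = round(y$conf.int, 2),  # 95% confidence interval
       mm = round(y$estimate, 2))  # sample mean
})]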
A B C D E
1: 0.2 0.42 0.01 0 0
2: -0.91,3.98 -1.15,2.62 1.19,6.15 2.82,6.33 2.68,6.46
3: 1.54 0.74 3.67 4.57 4.57
Source: https://stackoverflow.com/questions/17384282/compute-one-sample-t-test-for-each-column-of-a-data-frame-and-summarize-results