Question
I'm trying to create an lapply function to run multiple t.tests for multiple levels of grouping. I came across the question "Kruskal-Wallis test: create lapply function to subset data.frame?", but there they were only grouping by one variable (phase). I would like to add another grouping level, color, where my IV is distance and my DV is val, grouped by color and then by phase.
# create data
val<-runif(60, min = 0, max = 100)
distance<-floor(runif(60, min=1, max=3))
phase<-rep(c("a", "b", "c"), 20)
color<-rep(c("red", "blue","green","yellow","purple"), 12)
df<-data.frame(val, distance, phase, color)
Their answer for grouping by phase was:
lapply(split(df, df$phase), function(d) { kruskal.test(val ~ distance, data=d) })
However, it doesn't account for a second grouping level (color). I might be approaching this wrong, so I appreciate any help.
Answer 1:
Simply pass a list() of the needed columns to split. However, with your sample data this will raise an error, since in some groups all observations share the same distance value.
lapply(split(df, list(df$color, df$phase)), function(d) {
  kruskal.test(val ~ distance, data = d)
})
Error in kruskal.test.default(c(76.6759299905971, 3.11371604911983, 17.6471394719556, : all observations are in the same group
Consequently, consider wrapping the call in tryCatch to return NA or any other object for those problem groups:
lapply(split(df, list(df$color, df$phase)), function(d) {
  tryCatch({ kruskal.test(val ~ distance, data = d) },
           error = function(e) NA)
})
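The NA placeholders can then be filtered out when collecting results. The following is only a minimal sketch built on the sample data from the question; the seed and the names results, ok, and pvals are illustrative assumptions, not part of the original post:

```r
# Recreate the sample data (the seed is an assumption, added for reproducibility)
set.seed(1)
df <- data.frame(
  val      = runif(60, min = 0, max = 100),
  distance = floor(runif(60, min = 1, max = 3)),
  phase    = rep(c("a", "b", "c"), 20),
  color    = rep(c("red", "blue", "green", "yellow", "purple"), 12)
)

# One test per color/phase combination; groups that raise an error yield NA
results <- lapply(split(df, list(df$color, df$phase)), function(d) {
  tryCatch(kruskal.test(val ~ distance, data = d),
           error = function(e) NA)
})

# TRUE for groups where the test actually ran (an "htest" object came back)
ok <- vapply(results, inherits, logical(1), what = "htest")

# Named numeric vector of p-values, one per successful group
pvals <- vapply(results[ok], function(r) r$p.value, numeric(1))
```

The names of pvals follow the color.phase pattern that split produces (for example blue.a), so each p-value can be traced back to its group.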
By the way, consider by (an object-oriented wrapper for tapply and an often overlooked member of the apply family) instead of nesting split inside lapply:
by(df, df[c("color", "phase")], function(d) {
  tryCatch({ kruskal.test(val ~ distance, data = d) },
           error = function(e) NA)
})
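The question title mentions t.test, while the code here uses kruskal.test. The same pattern should carry over, since t.test also accepts a formula with a two-level grouping variable. This is a hedged sketch on the question's sample data; the seed and the name t_results are assumptions made for illustration:

```r
# Recreate the sample data (the seed is an assumption, added for reproducibility)
set.seed(1)
df <- data.frame(
  val      = runif(60, min = 0, max = 100),
  distance = floor(runif(60, min = 1, max = 3)),
  phase    = rep(c("a", "b", "c"), 20),
  color    = rep(c("red", "blue", "green", "yellow", "purple"), 12)
)

# Welch two-sample t-test of val by distance within each color/phase cell.
# Cells with a single distance level, or too few observations, return NA.
t_results <- by(df, df[c("color", "phase")], function(d) {
  tryCatch(t.test(val ~ distance, data = d),
           error = function(e) NA)
})

# Look up one cell's result by its color and phase labels
t_results[["red", "a"]]
```

Because by keeps the two grouping variables as separate dimensions, a single cell can be looked up by color and phase rather than by a pasted name such as red.a.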
Source: https://stackoverflow.com/questions/55249057/t-test-create-lapply-function-for-multiple-grouping-levels