Summary statistics using ddply

杀马特。学长 韩版系。学妹 提交于 2019-12-21 04:58:42

问题


I like to write a function using ddply that outputs the summary statistics based on the name of two columns of data.frame mat.

  • mat is a big data.frame with the name of columns "metric", "length", "species", "tree", ...,"index"

  • index is factor with 2 levels "Short", "Long"

  • "metric", "length", "species", "tree" and others are all continuous variables

Function:

summary1 <- function(arg1,arg2) {
    ...

    ss <- ddply(mat, .(index), function(X) data.frame(
        arg1 = as.list(summary(X$arg1)),
        arg2 = as.list(summary(X$arg2)),
        .parallel = FALSE)

    ss
}

I expect the output to look like this after calling summary1("metric","length")

Short metric.Min. metric.1st.Qu. metric.Median metric.Mean metric.3rd.Qu. metric.Max. length.Min. length.1st.Qu. length
.Median length.Mean length.3rd.Qu. length.Max. 

....

Long metric.Min. metric.1st.Qu. metric.Median metric.Mean metric.3rd.Qu. metric.Max. length.Min. length.1st.Qu. length
.Median length.Mean length.3rd.Qu. length.Max.

....

At the moment the function does not produce the desired output? What modification should be made here?

Thanks for your help.


Here is a toy example

mat <- data.frame(
    metric = rpois(10,10), length = rpois(10,10), species = rpois(10,10),
    tree = rpois(10,10), index = c(rep("Short",5),rep("Long",5))
)

回答1:


As Nick wrote in his answer you can't use $ to reference variable passed as character name. When you wrote X$arg1 then R search for column named "arg1" in data.frame X. You can reference to it either by X[,arg1] or X[[arg1]].

And if you want nicely named output I propose below solution:

summary1 <- function(arg1, arg2) {

    ss <- ddply(mat, .(index), function(X) data.frame(
        setNames(
            list(as.list(summary(X[[arg1]])), as.list(summary(X[[arg2]]))),
            c(arg1,arg2)
            )), .parallel = FALSE)

    ss
}
summary1("metric","length")

Output for toy data is:

  index metric.Min. metric.1st.Qu. metric.Median metric.Mean metric.3rd.Qu.
1  Long           5              7            10         8.6             10
2 Short           7              7             9         8.8             10
  metric.Max. length.Min. length.1st.Qu. length.Median length.Mean length.3rd.Qu.
1          11           9             10            11        10.8             12
2          11           4              9             9         9.0             11
  length.Max.
1          12
2          12



回答2:


Is this more like what you want?

summary1 <- function(arg1,arg2) {
ss <- ddply(mat, .(index), function(X){ data.frame(
    arg1 = as.list(summary(X[,arg1])),
    arg2 = as.list(summary(X[,arg2])),
    .parallel = FALSE)})
ss
}


来源:https://stackoverflow.com/questions/5714658/summary-statistics-using-ddply

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!