Summarize data from CSV using R

Submitted by 左心房为你撑大大i on 2019-12-06 19:15:33
Chase

I can't manage to read your example data in, but I think I've made something that generally represents it, so give this a whirl. This answer builds on Greg's suggestion to look at plyr, using the function ddply to group by segments of your data.frame and numcolwise to calculate your statistics of interest.

#Sample data
set.seed(1)
dat <- data.frame(sname = rep(letters[1:3],2), plot = rep(letters[1:3],2), 
                  CAP = rnorm(6), 
                  H = rlnorm(6), 
                  VOLUME = runif(6),
                  BASALAREA = rlnorm(6)
                  )


#Calculate mean for all numeric columns, grouping by sname and plot
library(plyr)
ddply(dat, c("sname", "plot"), numcolwise(mean))
#-----
  sname plot        CAP        H    VOLUME BASALAREA
1     a    a  0.4844135 1.182481 0.3248043  1.614668
2     b    b  0.2565755 3.313614 0.6279025  1.397490
3     c    c -0.8280485 1.627634 0.1768697  2.538273
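For comparison, here is a minimal base-R sketch that needs no extra package. It swaps in stats::aggregate() in place of plyr's ddply() plus numcolwise(mean). Selecting the numeric columns with vapply(is.numeric) is my own choice to mimic what numcolwise() does; it is not part of the original answer.

```r
# Same made-up sample data as above
set.seed(1)
dat <- data.frame(sname = rep(letters[1:3], 2), plot = rep(letters[1:3], 2),
                  CAP = rnorm(6), H = rlnorm(6),
                  VOLUME = runif(6), BASALAREA = rlnorm(6))

# Keep only the numeric columns (roughly what numcolwise() operates on)
num_cols <- names(dat)[vapply(dat, is.numeric, logical(1))]

# Mean of every numeric column within each sname/plot group
means <- aggregate(dat[num_cols],
                   by = list(sname = dat$sname, plot = dat$plot),
                   FUN = mean)
print(means)
```

The CAP column should match the ddply output above, because both group the same six rows the same way.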

EDIT - response to updated question

OK, now that your question is more or less reproducible, here's how I'd approach it. First of all, you can take advantage of the fact that R is vectorized, meaning that you can calculate ALL of the values for VOLUME and BASALAREA in one pass, without looping through each row. For that bit, I recommend the transform function:

# treeVolume() and treeBasalArea() are the functions defined in your question
dat <- transform(dat, VOLUME = treeVolume(CAP, H), BASALAREA = treeBasalArea(CAP))
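The question's own treeVolume() and treeBasalArea() are not reproduced in this thread, so the sketch below uses HYPOTHETICAL stand-ins (basal area from circumference, volume as basal area times height times an assumed form factor) and a few made-up rows. Its only purpose is to show that one vectorized transform() call gives the same result as a row-by-row loop.

```r
# HYPOTHETICAL stand-in: assumes CAP is a circumference, so area = CAP^2 / (4*pi)
treeBasalArea <- function(CAP) {
  CAP^2 / (4 * pi)
}

# HYPOTHETICAL stand-in: basal area * height * an assumed form factor of 0.5
treeVolume <- function(CAP, H, form = 0.5) {
  treeBasalArea(CAP) * H * form
}

# Made-up illustration data
dat <- data.frame(sname = c("a", "a", "b"), plot = c("x", "x", "y"),
                  CAP = c(0.6, 0.8, 1.0), H = c(10, 12, 15))

# Vectorized: each function is called once on whole columns
vec <- transform(dat,
                 VOLUME = treeVolume(CAP, H),
                 BASALAREA = treeBasalArea(CAP))

# Row-by-row loop, only to confirm it gives the same VOLUME values
loop_vol <- numeric(nrow(dat))
for (i in seq_len(nrow(dat))) {
  loop_vol[i] <- treeVolume(dat$CAP[i], dat$H[i])
}

print(vec)
```

Because the arithmetic operators in R work element-wise on vectors, functions built from them need no explicit loop.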

Secondly, since you want different statistics for CAP & H (means) than for VOLUME & BASALAREA (sums), I recommend using the summarize function, like this:

ddply(dat, c("sname", "plot"), summarize,
  meanCAP = mean(CAP),
  meanH = mean(H),
  sumVOLUME = sum(VOLUME),
  sumBASAL = sum(BASALAREA)
  )
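If plyr is not available, a rough base-R equivalent of that summarize call is to aggregate the two sets of columns separately and merge the results on the grouping columns. This is a sketch using stats::aggregate() and merge() instead of plyr, run on the made-up sample data from the first part of this answer (so VOLUME and BASALAREA here are random values, not the outputs of treeVolume() and treeBasalArea()).

```r
# Made-up sample data from earlier in the answer
set.seed(1)
dat <- data.frame(sname = rep(letters[1:3], 2), plot = rep(letters[1:3], 2),
                  CAP = rnorm(6), H = rlnorm(6),
                  VOLUME = runif(6), BASALAREA = rlnorm(6))

grp <- list(sname = dat$sname, plot = dat$plot)

# Means for CAP and H, sums for VOLUME and BASALAREA, within each group
m <- aggregate(dat[c("CAP", "H")], by = grp, FUN = mean)
s <- aggregate(dat[c("VOLUME", "BASALAREA")], by = grp, FUN = sum)

# Rename to match the column names used in the summarize call
names(m)[3:4] <- c("meanCAP", "meanH")
names(s)[3:4] <- c("sumVOLUME", "sumBASAL")

# Join the two summaries on the grouping columns
res <- merge(m, s, by = c("sname", "plot"))
print(res)
```

The summarize version is more compact when the statistics differ column by column, which is why the answer recommends it.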

This gives output that looks like:

  sname plot   meanCAP     meanH    sumVOLUME     sumBASAL
1     a    a 0.5868582 0.5032308 9.650184e-06 7.031954e-05
2     b    b 0.2869029 0.4333862 9.219770e-06 1.407055e-05
3     c    c 0.7356215 0.4028354 2.482775e-05 8.916350e-05

The help pages for ?ddply, ?transform and ?summarize should be insightful.

Look at the plyr package. It will split the data by the SNAME variable for you; then you give it code to do the set of summaries that you want (mixing mean, sum and whatever else), and it will put the pieces back together for you. You probably want either the ddply or the daply function in that package.
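The split / summarize / recombine pattern described above can also be written by hand in base R, which shows what ddply does for you. This is a minimal sketch using split(), lapply() and do.call(rbind, ...) on the made-up sample data from the first answer; it is not plyr's actual implementation.

```r
# Made-up sample data from the first answer
set.seed(1)
dat <- data.frame(sname = rep(letters[1:3], 2), plot = rep(letters[1:3], 2),
                  CAP = rnorm(6), H = rlnorm(6),
                  VOLUME = runif(6), BASALAREA = rlnorm(6))

# Split: one data frame per sname/plot combination that actually occurs
pieces <- split(dat, list(dat$sname, dat$plot), drop = TRUE)

# Apply: any mix of summaries (here mean and sum) on each piece
summaries <- lapply(pieces, function(d) {
  data.frame(sname = d$sname[1], plot = d$plot[1],
             meanCAP = mean(d$CAP), meanH = mean(d$H),
             sumVOLUME = sum(d$VOLUME), sumBASAL = sum(d$BASALAREA))
})

# Combine: stack the one-row summaries back into a single data frame
res <- do.call(rbind, summaries)
rownames(res) <- NULL
print(res)
```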
