Geometric Mean: is there a built-in?

问题

I tried to find a built-in for geometric mean but couldn\'t.

(Obviously a built-in isn\'t going to save me any time while working in the shell, nor do I suspect there\'s any difference in accuracy; for scripts I try to use built-ins as often as possible, where the (cumulative) performance gain is often noticeable.

In case there isn\'t one (which I doubt is the case) here\'s mine.

gm_mean = function(a){prod(a)^(1/length(a))}

回答1:

Here is a vectorized, zero- and NA-tolerant function for calculating geometric mean in R. The verbose mean calculation involving length(x) is necessary for the cases where x contains non-positive values.

gm_mean = function(x, na.rm=TRUE){
  exp(sum(log(x[x > 0]), na.rm=na.rm) / length(x))
}

Thanks to @ben-bolker for noting the na.rm pass-through and @Gregor for making sure it works correctly.

I think some of the comments are related to a false-equivalency of NA values in the data and zeros. In the application I had in mind they are the same, but of course this is not generally true. Thus, if you want to include optional propagation of zeros, and treat the length(x) differently in the case of NA removal, the following is a slightly longer alternative to the function above.

gm_mean = function(x, na.rm=TRUE, zero.propagate = FALSE){
  if(any(x < 0, na.rm = TRUE)){
    return(NaN)
  }
  if(zero.propagate){
    if(any(x == 0, na.rm = TRUE)){
      return(0)
    }
    exp(mean(log(x), na.rm = na.rm))
  } else {
    exp(sum(log(x[x > 0]), na.rm=na.rm) / length(x))
  }
}

Note that it also checks for any negative values, and returns a more informative and appropriate NaN respecting that geometric mean is not defined for negative values (but is for zeros). Thanks to commenters who stayed on my case about this.

回答2:

No, but there are a few people who have written one, such as here.

Another possibility is to use this:

exp(mean(log(x)))

回答3:

you can use psych package and call geometric.mean function in that.

回答4:

The

exp(mean(log(x)))

will work unless there is a 0 in x. If so, the log will produce -Inf (-Infinite) which always results in a geometric mean of 0.

One solution is to remove the -Inf value before calculating the mean:

geo_mean <- function(data) {
    log_data <- log(data)
    gm <- exp(mean(log_data[is.finite(log_data)]))
    return(gm)
}

You can use a one-liner to do this but it means calculating the log twice which is inefficient.

exp(mean(log(i[is.finite(log(i))])))

回答5:

I use exactly what Mark says. This way, even with tapply, you can use the built-in mean function, no need to define yours! For example, to compute per-group geometric means of data$value:

exp(tapply(log(data$value), data$group, mean))

回答6:

In case there is missing values in your data, this is not a rare case. you need to add one more argument. You may try following codes.

exp(mean(log(i[is.finite(log(i))]),na.rm=T))

回答7:

The EnvStats package has a function for geoMean and geoSd

回答8:

Here is my version. It has the following features that set it apart from the currently accepted answer by Paul McMurdie:

When na.rm == TRUE, NA values are ignored in the denominator - hence the use of non-missing values count variable values.count in the denominator instead of length(x).
It optionally distinguishes between NaN and generic NA values, with a .rm parameter for each. By default, NaNs are "bad", just like negative numbers are bad, so NaN is returned. Having two parameters for handling missing values is obviously not ideal, but the way I set the defaults for these parameters and arranged the cases in the case_when statement should (hopefully) obviate the possibility of unexpected behavior.
My version includes another optional parameter eta that handles zeroes. eta defaults to NA_real_, in which case zeros are counted in the denominator but not propagated (analogous to the zero.propagate = FALSE optional parameter in the accepted answer). When a positive number is passed, eta functions as an artificial constant to be added to x (but only in the event that x contains zeroes). When any other number is passed (presumably 0), zeroes are propagated, just as when zero.propagate is set equal to TRUE in the accepted answer.

I'm sure tweaks may be called for (for instance, it may be best to add eta (given that eta is a positive number) regardless of whether there are or are not zeroes). I thought about even having the function dynamically choose a value for eta based on x but opted against adding any further complexity.

suppressMessages(library(dplyr))

geomean <- function(x, na.rm = TRUE, nan.rm = FALSE, eta = NA_real_) {
  nan.count <- is.nan(x) %>%
    sum()
  na.count <- is.na(x) %>%
    sum()
  value.count <- !is.na(x) %>%
    sum()
  case_when(
    #Handle cases when there are negative values, all values are missing, or
    #missing values are not tolerated.
    (nan.count > 0 & !nan.rm) | any(x < 0, na.rm = TRUE) ~ NaN,
    (na.count > 0 & !na.rm) | value.count == 0 ~ NA_real_,

    #Handle cases when non-missing values are either all positive or all zero.
    #In these cases the eta parameter is irrelevant and therefore ignored.
    all(x > 0, na.rm = TRUE) ~ exp(mean(log(x), na.rm = TRUE)),
    all(x == 0, na.rm = TRUE) ~ 0,

    #All remaining cases are cases when there are a mix of positive and zero values.
    #By default, we do not use an artificial constant or propagate zeros.
    is.na(eta) ~ exp(sum(log(x[x > 0]), na.rm = TRUE) / value.count),
    eta > 0 ~ exp(mean(log(x + eta), na.rm = TRUE)) - eta,
    TRUE ~ 0 #only propagate zeroes when eta is set to 0 (or less than 0)
  )
}

来源：https://stackoverflow.com/questions/2602583/geometric-mean-is-there-a-built-in

标签

statistics

built-in

geometric-mean