Geometric Mean: is there a built-in?

走了就别回头了 2020-11-28 20:45

I tried to find a built-in for geometric mean but couldn't.

(Obviously a built-in isn't going to save me any time while working in the shell, nor do I suspect ther

9 answers

伪装坚强ぢ (original poster)

2020-11-28 21:03

This version provides more options than the other answers.

It lets the user distinguish between results that are not (real) numbers and results that are not available. If negative numbers are present, the answer won't be a real number, so NaN is returned. If all values are NA, the function returns NA_real_ instead, to reflect that a real value is literally not available. This is a subtle difference, but one that might yield (slightly) more robust results.
The first optional parameter, zero.rm, lets zeros affect the output without forcing it to zero. If zero.rm is set to FALSE and eta is left at NA_real_ (its default value), zeros shrink the result towards one: they are counted in the denominator of the mean of logs but contribute nothing to the sum. I don't have any theoretical justification for this; it just seems to make more sense not to ignore the zeros but to "do something" that doesn't automatically make the result zero.
eta is a way of handling zeros that was inspired by the following discussion: https://support.bioconductor.org/p/64014/. When eta is positive, it is added to every value before the logs are taken and subtracted again from the result.
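
The NaN versus NA distinction described above rests on how base R treats the two values. The following is only a small illustrative sketch (the variable name vals is made up for the example): is.na() is TRUE for both, so is.nan() is what separates them.

```r
# NaN ("not a number") and NA ("not available") are distinct values in R,
# but is.na() reports TRUE for both of them.
vals <- c(NaN, NA_real_, 1)

na_flags  <- is.na(vals)   # TRUE  TRUE  FALSE
nan_flags <- is.nan(vals)  # TRUE  FALSE FALSE

# The log of a negative number is NaN (R also emits a warning),
# which is why negative inputs cannot yield a real geometric mean via logs.
neg_log <- suppressWarnings(log(-1))

print(na_flags)
print(nan_flags)
print(neg_log)
```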

geomean <- function(x,
                    zero.rm = TRUE,
                    na.rm = TRUE,
                    nan.rm = TRUE,
                    eta = NA_real_) {
    nan.count <- sum(is.nan(x))
     na.count <- sum(is.na(x))
  value.count <- if(zero.rm) sum(x[!is.na(x)] > 0) else sum(!is.na(x))

  # Handle cases when there are negative values, all values are missing, or
  # missing values are not tolerated.
  if ((nan.count > 0 && !nan.rm) || any(x < 0, na.rm = TRUE)) {
    return(NaN)
  }
  if ((na.count > 0 && !na.rm) || value.count == 0) {
    return(NA_real_)
  }

  # Handle cases when non-missing values are either all positive or all zero.
  # In these cases the eta parameter is irrelevant and therefore ignored.
  if (all(x > 0, na.rm = TRUE)) {
    return(exp(mean(log(x), na.rm = TRUE)))
  }
  if (all(x == 0, na.rm = TRUE)) {
    return(0)
  }

  # All remaining cases have a mix of positive and zero values.
  # By default, we do not use an artificial constant or propagate zeros.
  if (is.na(eta)) {
    return(exp(sum(log(x[x > 0]), na.rm = TRUE) / value.count))
  }
  if (eta > 0) {
    return(exp(mean(log(x + eta), na.rm = TRUE)) - eta)
  }
  return(0) # only propagate zeros when eta is set to 0 (or less than 0)
}
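
To make the zero handling concrete, here is a rough worked example that repeats, by hand, the arithmetic of the three mixed-input branches on the vector c(0, 2, 8). It is a sketch that does not call geomean itself, and the names pos, gm_drop, gm_shrink and gm_eta are invented for the illustration.

```r
# A mix of positive and zero values.
x   <- c(0, 2, 8)
pos <- x[x > 0]

# zero.rm = TRUE, eta = NA (the defaults): zeros are dropped entirely.
gm_drop <- exp(mean(log(pos)))               # sqrt(2 * 8) = 4

# zero.rm = FALSE, eta = NA: zeros count in the denominator only,
# which pulls the result towards one.
gm_shrink <- exp(sum(log(pos)) / length(x))  # 16^(1/3), about 2.52

# eta = 1: shift every value by eta, average the logs, shift back.
eta    <- 1
gm_eta <- exp(mean(log(x + eta))) - eta      # (1 * 3 * 9)^(1/3) - 1 = 2

print(c(drop = gm_drop, shrink = gm_shrink, eta = gm_eta))
```

In the function above, the eta branch is reached only when eta is not NA, and it does not depend on zero.rm; setting eta to 0 makes the function return 0 whenever a zero is present.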
