Compute the mean of two columns in a dataframe

Question

I have a dataframe storing different values. Sample:

a$open  a$high  a$low   a$close

1.08648 1.08707 1.08476 1.08551
1.08552 1.08623 1.08426 1.08542
1.08542 1.08572 1.08453 1.08465
1.08468 1.08566 1.08402 1.08554
1.08552 1.08565 1.08436 1.08464
1.08463 1.08543 1.08452 1.08475
1.08475 1.08504 1.08427 1.08436
1.08433 1.08438 1.08275 1.08285
1.08275 1.08353 1.08275 1.08325
1.08325 1.08431 1.08315 1.08378
1.08379 1.08383 1.08275 1.08294
1.08292 1.08338 1.08271 1.08325

What I want to do is create a new column a$mean storing the mean of a$high and a$low for each row.

Here is how I achieved that:

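highlowmean <- function(highs, lows){
  # Preallocate the result instead of growing it inside the loop
  m <- numeric(length(highs))
  for (i in seq_along(highs)){
    # mean() takes one vector; a second argument would be read as `trim`
    m[i] <- mean(c(highs[i], lows[i]))
  }
  return(m)
}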

a$mean <- highlowmean(a$high, a$low)

However, I'm fairly new to R and to functional languages in general, so I'm pretty sure there is a simpler and more efficient way to do this.

What is the smartest way to achieve that?

Answer 1:

We can use rowMeans

 a$mean <- rowMeans(a[c('high', 'low')], na.rm=TRUE)
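As a quick check, here is a minimal sketch that applies this call to the first three rows of the sample data from the question. The data frame is rebuilt by hand purely for illustration, so the way `a` is constructed here is an assumption, not the asker's actual code.

```r
# First three rows of the sample data, rebuilt by hand for illustration
a <- data.frame(
  open  = c(1.08648, 1.08552, 1.08542),
  high  = c(1.08707, 1.08623, 1.08572),
  low   = c(1.08476, 1.08426, 1.08453),
  close = c(1.08551, 1.08542, 1.08465)
)

# Row-wise mean of the two selected columns only
a$mean <- rowMeans(a[c("high", "low")], na.rm = TRUE)
print(a)
```

Selecting the columns by name keeps open and close out of the average.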

NOTE: If there are NA values, it is better to use rowMeans, because na.rm=TRUE averages only the non-missing values in each row.

For example

 a <- data.frame(High= c(NA, 3, 2), low= c(3, NA, 0))
 rowMeans(a, na.rm=TRUE)    
 #[1] 3 3 1

and using + (after replacing the NA values with 0, since + would otherwise return NA)

 a1 <- replace(a, is.na(a), 0)
 (a1[1] + a1[2])/2
#  High
#1  1.5
#2  1.5
#3  1.0

NOTE: This is in no way an attempt to tarnish the other answer. It works in most cases and is fast as well.

Answer 2:

For the mean of two numbers you don't really need any special functions:

a$mean = (a$high + a$low) / 2

For such an easy case, this avoids the conversion to a matrix that apply or rowMeans would perform.
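To show how the arithmetic approach compares with rowMeans, here is a small sketch on made-up values (the data frame `d` is hypothetical). The two agree on complete rows, but plain arithmetic propagates NA, whereas rowMeans with na.rm=TRUE averages whatever values are present.

```r
# Hypothetical data with one missing value in the third row
d <- data.frame(high = c(3, 4, NA), low = c(1, 2, 5))

# Vectorized arithmetic: any NA operand makes the result NA
arith <- (d$high + d$low) / 2

# rowMeans with na.rm = TRUE: averages only the non-missing values
rmean <- rowMeans(d[c("high", "low")], na.rm = TRUE)

print(arith)
print(rmean)
```

Which behavior is preferable depends on whether a row with a missing high or low should still get a mean at all.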

Source: https://stackoverflow.com/questions/33981527/compute-the-mean-of-two-columns-in-a-dataframe

Tags

dataframe

semantics