R Apply() function on specific dataframe columns

霸气de小男生 提交于 2019-11-28 03:14:58

Using an example data.frame and example function (just +1 to all values)

A <- function(x) x + 1
wifi <- data.frame(replicate(9,1:4))
wifi

#  X1 X2 X3 X4 X5 X6 X7 X8 X9
#1  1  1  1  1  1  1  1  1  1
#2  2  2  2  2  2  2  2  2  2
#3  3  3  3  3  3  3  3  3  3
#4  4  4  4  4  4  4  4  4  4

data.frame(wifi[1:3], apply(wifi[4:9],2, A) )
#or
cbind(wifi[1:3], apply(wifi[4:9],2, A) )

#  X1 X2 X3 X4 X5 X6 X7 X8 X9
#1  1  1  1  2  2  2  2  2  2
#2  2  2  2  3  3  3  3  3  3
#3  3  3  3  4  4  4  4  4  4
#4  4  4  4  5  5  5  5  5  5

Or even:

data.frame(wifi[1:3], lapply(wifi[4:9], A) )
#or
cbind(wifi[1:3], lapply(wifi[4:9], A) )

#  X1 X2 X3 X4 X5 X6 X7 X8 X9
#1  1  1  1  2  2  2  2  2  2
#2  2  2  2  3  3  3  3  3  3
#3  3  3  3  4  4  4  4  4  4
#4  4  4  4  5  5  5  5  5  5

lapply is probably a better choice than apply here, as apply first coerces your data.frame to an array which means all the columns must have the same type. Depending on your context, this could have unintended consequences.

The pattern is:

df[cols] <- lapply(df[cols], FUN)

The 'cols' vector can be variable names or indices. I prefer to use names whenever possible (it's robust to column reordering). So in your case this might be:

wifi[4:9] <- lapply(wifi[4:9], A)

An example of using column names:

wifi <- data.frame(A=1:4, B=runif(4), C=5:9)
wifi[c("B", "C")] <- lapply(wifi[c("B", "C")], function(x) -1 * x)

As mentioned, you simply want the standard R apply function applied to columns (MARGIN=2):

wifi[,4:9] <- apply(wifi[,4:9], MARGIN=2, FUN=A)

Or, for short:

wifi[,4:9] <- apply(wifi[,4:9], 2, A)

This updates columns 4:9 in-place using the A() function. Now, let's assume that na.rm is an argument to A(), which it probably should be. We can pass na.rm=T to remove NA values from the computation like so:

wifi[,4:9] <- apply(wifi[,4:9], MARGIN=2, FUN=A, na.rm=T)

The same is true for any other arguments you want to pass to your custom function.

I think what you want is mapply. You could apply the function to all columns, and then just drop the columns you don't want. However, if you are applying different functions to different columns, it seems likely what you want is mutate, from the dplyr package.

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!