Apply FUN row-wise on data frame with integer and character variables

Submitted by 白昼怎懂夜的黑 on 2021-01-28 06:07:51

Question


A completely basic question - and forgive me if it is a duplicate.

set.seed(1)
df <- 
  data.frame(id=c('a', 'a', 'b', 'b', 'a'),
             a=sample(1:10, size=5, replace=TRUE),
             b=sample(1:10, size=5, replace=TRUE),
             c=sample(1:10, size=5, replace=TRUE))

Then,

> df
  id  a  b c
1  a  3  9 3
2  a  4 10 2
3  b  6  7 7
4  b 10  7 4
5  a  3  1 8

For each row, I want the name of the column (a, b or c) with the largest value; if that column name equals the row's id, I want the name of the column with the second-largest value instead. I use the function below.

FUN <- function(r) {
  top <- names(r[,c('a', 'b', 'c')])[order(r[,c('a', 'b', 'c')], decreasing=T)]
  ifelse(top[1] == r[['id']], top[2], top[1])
}

I can do:

FUN(df[1,]) #[1] "b"

and for all rows:

res <- NULL
for(i in 1:nrow(df)) {
  res <- c(res, FUN(df[i,]))
}

And get

> res
[1] "b" "b" "c" "a" "c"

But how can I apply this to every row without a loop? For example, this does not work:

apply(df, 1, FUN)

I suspect the trouble is that FUN assumes a one-row data frame, whereas apply passes a named character vector, like this for the first row:

 id   a   b   c 
"a" "3" "9" "3"

From ?apply:

If X is not an array but an object of a class with a non-null dim value (such as a data frame), apply attempts to coerce it to an array via as.matrix if it is two-dimensional (e.g., a data frame) or via as.array.
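
The coercion described in that passage can be checked directly. The following is a minimal sketch; df is rebuilt here from the printed values above rather than from sample(), an assumption made so that the result does not depend on the random number generator of the R version in use.

```r
# df hard-coded from the printed output above (assumption: avoids RNG differences)
df <- data.frame(id = c("a", "a", "b", "b", "a"),
                 a  = c(3L, 4L, 6L, 10L, 3L),
                 b  = c(9L, 10L, 7L, 7L, 1L),
                 c  = c(3L, 2L, 7L, 4L, 8L))

m <- as.matrix(df)      # the coercion apply() performs on a data frame
is.character(m)         # TRUE: the non-numeric id column forces a character matrix

row1 <- m[1, ]          # what apply(df, 1, FUN) hands to FUN for the first row
is.character(row1)      # TRUE: a named character vector
is.data.frame(df[1, ])  # TRUE: what FUN was written to receive
```

So a call such as r[, c('a', 'b', 'c')] inside FUN has no second dimension to index when r is a plain vector.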


Answer 1:


If you must use your function as it is, you can do:

sapply(split(df, 1:nrow(df)), f1)
#  1   2   3   4   5 
#"b" "b" "c" "a" "c" 

Note: I renamed your FUN to f1, because FUN is the name that many R functions (apply and sapply among them) use for their function argument.
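
A minimal sketch of why the split approach works: split turns the data frame into a list of one-row data frames, so each call still receives the kind of object the function expects. Two things here are assumptions rather than part of the answer: df is hard-coded from the printed values, and f1 adds an unlist() call so that order() receives a plain named vector instead of a one-row data frame. vapply is swapped in for sapply only to make the return type explicit.

```r
# df hard-coded from the printed output above (assumption: avoids RNG differences)
df <- data.frame(id = c("a", "a", "b", "b", "a"),
                 a  = c(3L, 4L, 6L, 10L, 3L),
                 b  = c(9L, 10L, 7L, 7L, 1L),
                 c  = c(3L, 2L, 7L, 4L, 8L))

# f1: the question's FUN under its new name; unlist() is an added assumption
f1 <- function(r) {
  vals <- unlist(r[, c("a", "b", "c")])            # named vector a, b, c
  top  <- names(vals)[order(vals, decreasing = TRUE)]
  # fall back to the runner-up when the top column name equals the row's id
  if (top[1] == as.character(r[["id"]])) top[2] else top[1]
}

rows <- split(df, seq_len(nrow(df)))   # list of five one-row data frames
res  <- vapply(rows, f1, character(1)) # one string per row, named "1" to "5"
unname(res)
# [1] "b" "b" "c" "a" "c"
```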




Answer 2:


Another option is to make a minor modification to your FUN. The issue you ran into is that apply coerces the data frame to a matrix and passes each row to FUN as a vector. Because your id column is not numeric, the whole matrix becomes character, so the a/b/c values arrive as strings. Knowing this, we can modify FUN slightly to convert them back to numeric before ordering:

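FUN <- function(r) {
  # r is a named character vector here, so convert a/b/c back to numeric before ordering
  top <- c('a', 'b', 'c')[order(as.numeric(r[c('a', 'b', 'c')]), decreasing=TRUE)]
  # if the top column name equals the row's id, return the second-highest instead
  ifelse(top[1] == as.character(r['id']), top[2], top[1])
}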

apply(df, 1, FUN)
#[1] "b" "b" "c" "a" "c"

To see how this works in a little more detail, you can run the code below and see that apply passes each row to the function as a named character vector.

apply(df, 1, function(x) {print(x); print(class(x)); return(NULL)})
#  id    a    b    c 
# "a" " 3" " 9"  "3" 
#[1] "character"
#  id    a    b    c 
# "a" " 4" "10"  "2" 
#[1] "character"
#  id    a    b    c 
# "b" " 6" " 7"  "7" 
#[1] "character"
#  id    a    b    c 
# "b" "10" " 7"  "4" 
#[1] "character"
#  id    a    b    c 
# "a" " 3" " 1"  "8" 
#[1] "character"
#NULL
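
Since the character coercion comes from the id column, one alternative is to keep id out of the matrix altogether. This is a sketch of a different technique from the answer above, not a restatement of it; df is hard-coded from the printed values, and the names cols, num, ranked, first and second are introduced here for illustration only.

```r
# df hard-coded from the printed output above (assumption: avoids RNG differences)
df <- data.frame(id = c("a", "a", "b", "b", "a"),
                 a  = c(3L, 4L, 6L, 10L, 3L),
                 b  = c(9L, 10L, 7L, 7L, 1L),
                 c  = c(3L, 2L, 7L, 4L, 8L))

cols <- c("a", "b", "c")
num  <- as.matrix(df[, cols])   # only numeric columns, so the matrix stays numeric

# Column i of `ranked` holds the column names of row i, sorted by decreasing value
ranked <- apply(num, 1, function(v) cols[order(v, decreasing = TRUE)])

first  <- ranked[1, ]           # name of the largest column in each row
second <- ranked[2, ]           # name of the runner-up in each row
res <- unname(ifelse(first == as.character(df$id), second, first))
res
# [1] "b" "b" "c" "a" "c"
```

The comparison with id is then done once, vectorised over all rows, instead of inside the function passed to apply.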


Source: https://stackoverflow.com/questions/44591238/apply-fun-row-wise-on-data-frame-with-integer-and-character-variables
