Question
Suppose I have a data frame:
mydf <- data.frame(colA = c(1,20), colB = c("a", "ab"), colC = c(T, F))
Now suppose I want to apply a function to each row of the data frame. This function uses the boolean value of column C. When using apply, every non-string is converted to a string of the maximum length present in the column:
> apply(mydf, 1, '[', 3)
[1] " TRUE" "FALSE"
The string " TRUE" is no longer interpretable as a logical.
> ifelse(apply(mydf, 1, '[', 3), 1, 2)
[1] NA 2
I could solve this with a gsub(" ", "", x), but I'd bet there is a better way. Why does apply have this behavior when it could just directly convert the logicals to strings? Is there an apply-like function which does not have the above behavior?
Answer 1:
When you called apply, your data frame was converted to a character matrix. The spaces appear because each element is padded to the width of the widest element in its column.
You can do it with a for loop-like sapply call:
> ( s <- sapply(seq(nrow(mydf)), function(i) mydf[i, 3]) )
# [1] TRUE FALSE
> class(s)
# [1] "logical"
A workaround to what you are doing with apply would be
> as.logical(gsub("\\s+", "", apply(mydf, 1, `[`, 3)))
# [1] TRUE FALSE
But note that these are both exactly the same as
> mydf[,3]
# [1] TRUE FALSE
Answer 2:
apply does not work directly with data.frames; it works with matrices, and in a matrix all elements must be the same atomic type. If you pass in a data.frame, apply() will coerce it to a matrix. Since character values can't be stored in a more "simple" datatype, everything is converted up to a character value.
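The coercion described above can be checked directly. This is a minimal sketch that calls as.matrix() on the question's data frame, which is the conversion apply() performs on a data frame. Whether the logical column comes out padded (" TRUE") depends on the R version, so the sketch only checks the resulting type, not the exact strings.

```r
# Same data frame as in the question
mydf <- data.frame(colA = c(1, 20), colB = c("a", "ab"), colC = c(TRUE, FALSE))

# apply() converts a data frame with as.matrix() before iterating
m <- as.matrix(mydf)

# One character column forces the whole matrix to character
is_char_matrix <- is.character(m)

# The original data frame still holds a logical third column
is_lgl_column <- is.logical(mydf[, 3])
```

Both is_char_matrix and is_lgl_column are TRUE: the type is lost in the matrix, not in the data frame itself.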
Normally you don't have to think about applying functions to rows of a data.frame one at a time. Most of the time, what you want to accomplish can be done using the basic vectorized functions across the columns of a data.frame. If you wanted
ifelse(apply(mydf, 1, '[', 3), 1, 2)
try
ifelse(mydf[, 3], 1, 2)
instead.
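If a function really does need several values from each row, one option that avoids the matrix coercion is to iterate over the columns in parallel with mapply(), so each value keeps its original type. The following is only a sketch: row_fun is a hypothetical function made up for illustration and is not part of the question.

```r
# Same data frame as in the question
mydf <- data.frame(colA = c(1, 20), colB = c("a", "ab"), colC = c(TRUE, FALSE))

# Hypothetical row-wise function: colC arrives as a real logical,
# colA as a real numeric, so no string parsing is needed
row_fun <- function(colA, colB, colC) if (colC) colA else -colA

# mapply() passes the i-th element of each column to row_fun
res <- mapply(row_fun, mydf$colA, mydf$colB, mydf$colC)
res
# [1]   1 -20
```

For this particular function the vectorized form ifelse(mydf$colC, mydf$colA, -mydf$colA) gives the same result and is usually the simpler choice.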
Source: https://stackoverflow.com/questions/25854139/why-does-apply-convert-logicals-in-data-frames-to-strings-of-5-characters