How to avoid implicit character conversion when using apply on dataframe

回眸只為那壹抹淺笑 submitted on 2019-12-30 02:59:26

Question


When using apply on a data.frame, the arguments are (implicitly) converted to character. An example:

df <- data.frame(v=1:10, t=1:10)
df <- transform(df, t2 = as.POSIXlt(t, origin = "2013-08-13"))
class(df$t2[1])
## [1] "POSIXct" "POSIXt" (correct)

but:

 apply(df, 1, function(y) class(y["t2"]))
 ## [1] "character" "character" "character" "character" "character" "character"
 ## [7] "character" "character" "character" "character"

Is there any way to avoid this conversion? Or do I always have to convert back through as.POSIXlt(y["t2"])?

edit
My df has 2 timestamps (say, t2 and t3) and some other fields (say, v1, v2). For each row with a given t2, I want to find the k (e.g. 3) rows whose t3 is closest to, but lower than, t2 (and which have the same v1), and return a statistic over v2 from these rows (e.g. an average). I wrote a function f(t2, v1, df) and just wanted to apply it to all rows using apply(df, 1, function(y) f(y["t2"], y["v1"], df)). Is there any better way to do such things in R?


Answer 1:


Let's wrap up multiple comments into an explanation.

  1. apply converts a data.frame to a matrix (via as.matrix). A matrix can hold only one type, so every column is coerced to the most general type needed to represent them all. Because t2 is a date-time column, that type is character.
  2. You're supplying 1 to apply's MARGIN argument, which applies the function by row. Each row is handed to your function as a character vector, so the original column classes are already gone by the time y["t2"] is evaluated. apply is designed for matrices, not for row-wise work on a data.frame with mixed column types, so it is not the right tool for the job.
  3. In this case I'd use lapply or sapply, as rmk points out, to grab the classes of the single t2 column as seen below:

Code:

df <- data.frame(v=1:10, t=1:10)
df <- transform(df, t2 = as.POSIXlt(t, origin = "2013-08-13"))

sapply(df[, "t2"], class)
lapply(df[, "t2"], class)

## [[1]]
## [1] "POSIXct" "POSIXt" 
## 
## [[2]]
## [1] "POSIXct" "POSIXt" 
## 
## [[3]]
## [1] "POSIXct" "POSIXt" 
## 
## .
## .
## . 
## 
## [[9]]
## [1] "POSIXct" "POSIXt" 
## 
## [[10]]
## [1] "POSIXct" "POSIXt" 
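
The coercion described in points 1 and 2 can be checked directly. The following is a minimal sketch that rebuilds the example data (using as.POSIXct with an explicit tz = "UTC", which is an assumption made here for reproducibility) and inspects the matrix that apply works on internally:

```r
df <- data.frame(v = 1:10, t = 1:10)
df$t2 <- as.POSIXct(df$t, origin = "2013-08-13", tz = "UTC")

# apply() converts the data.frame with as.matrix() before iterating
m <- as.matrix(df)
typeof(m)           # "character"
class(m[1, "t2"])   # "character"

# The data.frame itself still holds typed columns
class(df$v)         # "integer"
class(df$t2)        # "POSIXct" "POSIXt"
```

So the classes are lost in the conversion to a matrix, before the function passed to apply is ever called.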

In general, choose the member of the apply family that fits the job. I often use lapply or a for loop to act on specific columns, or subset the columns I want using indexing ([, ]) and then proceed with apply. The answer to this problem really boils down to determining what you want to accomplish, asking whether apply is the most appropriate tool, and proceeding from there.
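
For the row-wise task described in the question's edit, one option is to iterate over row indices instead of rows, so that each column is indexed directly and keeps its class. The sketch below is only an illustration: the sample data, the default k = 3, and the body of f are assumptions, since the original f was not shown.

```r
# Hypothetical data with the columns named in the question (t2, t3, v1, v2)
set.seed(1)
origin <- as.POSIXct("2013-08-13", tz = "UTC")
df <- data.frame(
  t2 = origin + sample(0:100, 20),
  t3 = origin + sample(0:100, 20),
  v1 = rep(c("a", "b"), each = 10),
  v2 = rnorm(20)
)

# Assumed implementation of f: mean of v2 over the k rows with the same v1
# whose t3 is closest to, but strictly lower than, t2
f <- function(t2, v1, df, k = 3) {
  cand <- df[df$v1 == v1 & df$t3 < t2, ]
  if (nrow(cand) == 0) return(NA_real_)
  cand <- cand[order(cand$t3, decreasing = TRUE), ]
  mean(head(cand$v2, k))
}

# Iterate over row indices; df$t2[i] is still POSIXct, so no conversion back is needed
res <- vapply(seq_len(nrow(df)),
              function(i) f(df$t2[i], df$v1[i], df),
              numeric(1))
```

Because the data.frame is never turned into a matrix, the comparison df$t3 < t2 inside f operates on date-time values rather than strings.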

May I offer this blog post as an excellent tutorial on what the different functions in the apply family do.




Answer 2:


Try:

sapply(df, class)  # iterates over columns, so each column keeps its own class

$v
[1] "integer"

$t
[1] "integer"

$t2
[1] "POSIXct" "POSIXt"
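
This works because a data.frame is a list of columns, and sapply visits each column without converting anything to a matrix. The result is a list rather than a vector because class returns two values for the date-time column. If a plain named character vector is preferred, one possible variant (a sketch, not part of the original answer) is to keep only the first class of each column with vapply:

```r
df <- data.frame(v = 1:10, t = 1:10)
df$t2 <- as.POSIXct(df$t, origin = "2013-08-13", tz = "UTC")

# sapply() cannot simplify results of different lengths, so it returns a list
classes <- sapply(df, class)
is.list(classes)   # TRUE

# vapply() enforces one string per column; only the first class is kept
first_class <- vapply(df, function(col) class(col)[1], character(1))
first_class        # v = "integer", t = "integer", t2 = "POSIXct"
```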


Source: https://stackoverflow.com/questions/18214431/how-to-avoid-implicit-character-conversion-when-using-apply-on-dataframe
