Question
I have a data frame containing many time columns. For each time column, I want to add columns for the year, month, day, etc.
Here is what I have so far:
library(dplyr)
library(lubridate)
times <- c(133456789, 143456789, 144456789 )
train2 <- data.frame(sent_time = times, open_time = times)
time_col_names <- c("sent_time", "open_time")
dt_part_names <- c("year", "month", "hour", "wday", "day")
train3 <- as.data.frame(train2)
dummy <- lapply(time_col_names, function(col_name) {
  pct_times <- as.POSIXct(train3[, col_name], origin = "1970-01-01", tz = "GMT")
  lapply(dt_part_names, function(part_name) {
    part_col_name <- paste(col_name, part_name, sep = "_")
    train3[, part_col_name] <- rep(NA, nrow(train3))
    train3[, part_col_name] <- factor(get(part_name)(pct_times))
  })
})
Everything seems to work, except the columns never get created or assigned. The components do get extracted, and the assignment succeeds without error, but train3 does not have any new columns.
I have checked that the assignment works when I call it outside the nested lapply context:
train3[, "x"] <- rep(NA, nrow(train3))
In this case, column x does get created.
Answer 1:
It is often believed that the apply family provides an advantage in terms of performance compared to a for loop. But the most important difference between a for loop and a loop from the *apply() family is that the latter is designed to have no side effects.
The absence of side effects favors the development of clean, well-structured, and concise code. A problem occurs if one wishes to have side effects, which is usually a symptom of a flawed code design.
Here is a simple example to illustrate this:
myvector <- 10:1
sapply(myvector, prod, 2)
# [1] 20 18 16 14 12 10 8 6 4 2
It looks correct, right? The sapply() loop has seemingly multiplied the entries of myvector by two (granted, this result could have been achieved more easily, but this is just a simple example to discuss the functioning of *apply()).
Upon inspection, however, one realizes that this operation has not changed myvector at all:
> myvector
# [1] 10 9 8 7 6 5 4 3 2 1
That is because sapply() did not have the side effect of modifying myvector. In this example the sapply() loop is equivalent to the command print(myvector * 2), and not to myvector <- myvector * 2. The *apply() loops return an object, but they don't modify the original one.
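The underlying reason is that an assignment inside a function acts on a local copy. The following minimal sketch illustrates this with a small data frame; the names df and add_col are hypothetical and only serve the illustration:

```r
# Hypothetical example data; `df` and `add_col` are illustrative names only
df <- data.frame(a = 1:3)

add_col <- function() {
  # This creates a modified copy of df that is local to the function
  df[, "x"] <- NA
  names(df)
}

inside <- add_col()   # column names as seen inside the function
outside <- names(df)  # column names of the global df afterwards

print(inside)   # the local copy has the columns "a" and "x"
print(outside)  # the global df still has only "a"
```

The same mechanism applies to the anonymous functions passed to lapply() or sapply().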
If one really wants to change the object within the loop, the superassignment operator <<- is necessary to modify the object outside the scope of the loop. This should almost never be done, and things become quite ugly in this case. For example, the following loop does change myvector:
sapply(seq_along(myvector), function(x) myvector[x] <<- myvector[x]*2)
> myvector
# [1] 20 18 16 14 12 10 8 6 4 2
Coding in R should not look like this. Note that also in this more convoluted case, if the normal assignment operator <- is used instead of <<-, then myvector remains unchanged. The correct approach is to assign the object returned by *apply instead of modifying it within the loop.
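As a minimal sketch of that approach, the value returned by sapply() can simply be assigned back to the same name:

```r
myvector <- 10:1

# Assign the returned object instead of modifying myvector inside the loop
myvector <- sapply(myvector, prod, 2)

print(myvector)  # every entry is now twice its original value
```

In practice myvector <- myvector * 2 does the same thing more directly; the sapply() call is kept here only to mirror the example above.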
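In the specific case described by the OP, the variable dummy may contain the desired output if the commands in the loop are correct. But one cannot expect the object train3 to be modified within the loop: each assignment to train3 inside the anonymous function acts on a local copy that is discarded when the function returns. Modifying the outer train3 from within the loop would require the <<- operator.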
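One possible way to apply this to the OP's data without <<- is to let the nested lapply() calls return the new columns and bind them to the data frame afterwards. This is only a sketch under a few assumptions: the names dt_parts and new_cols are introduced here for illustration, and the lubridate accessors are passed as a named list of functions rather than looked up with get():

```r
library(lubridate)

times <- c(133456789, 143456789, 144456789)
train2 <- data.frame(sent_time = times, open_time = times)
time_col_names <- c("sent_time", "open_time")

# Assumption: a named list of lubridate accessor functions replaces get(part_name)
dt_parts <- list(year = year, month = month, hour = hour, wday = wday, day = day)

# Each outer iteration returns a data frame holding the parts of one time column
new_cols <- lapply(time_col_names, function(col_name) {
  pct_times <- as.POSIXct(train2[[col_name]], origin = "1970-01-01", tz = "GMT")
  parts <- lapply(dt_parts, function(part_fun) factor(part_fun(pct_times)))
  names(parts) <- paste(col_name, names(dt_parts), sep = "_")
  as.data.frame(parts)
})

# Bind the returned columns to the original data; nothing is modified inside the loops
train3 <- cbind(train2, do.call(cbind, new_cols))
print(names(train3))
```

Here train3 is created once from the returned values, so it ends up with the two original columns plus five new columns per time column.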
A quote mentioned in fortunes::fortune(212) possibly summarizes the problem:
Basically R is reluctant to let you shoot yourself in the foot unless you are really determined to do so. -- Bill Venables
Source: https://stackoverflow.com/questions/39314145/how-to-split-epochs-into-year-month-etc