Question
I need to create a "flag" column within my main data frame that flags rows where the date is within a specific time range. That time range comes from a second data frame. I think I'm stuck on the ifelse (or if) statement, because I end up with NA values in the flag column. Maybe ifelse isn't the way to go. Here's some sample data:
# main data frame
date <- seq(as.Date("2014-07-21"), as.Date("2014-09-11"), by = "day")
group <- letters[1:4]
datereps <- rep(date, length(group))
groupreps <- rep(group, each = length(date))
value <- rnorm(length(datereps))
df <- data.frame(Date = datereps, Group = groupreps, Value = value)
# flag time period data frame
flag <- data.frame(Group = c("b", "d"),
                   start = c("2014-08-01", "2014-08-26"),
                   end = c("2014-08-11", "2014-09-01"))
# Merge flag dates into main data frame
df2 <- merge(df, flag, by = "Group", all.x = TRUE)
# Execute ifelse statement on each row
df2$flag <- "something"
df2$flag <- ifelse(df2$Date >= as.Date(df2$start) & df2$Date <= as.Date(df2$end), "flag", "other")
The result is that in rows where a "start" and "end" date are specified, "flag" and "other" are labeled, but where "start" and "end" are NA, I get NA values for df2$flag. This happens even when I initialize df2$flag with "something". I want "other" for all values that are not defined as "flag". Look at rows 50:68.
df2[50:68,]
Answer 1:
If I were doing this, I'd skip the intermediate data frame (df2) and the merge step and use ifelse with |, which means OR.
date <- seq(as.Date("2014-07-21"), as.Date("2014-09-11"), by = "day")
group <- letters[1:4]
datereps <- rep(date, length(group))
groupreps <- rep(group, each = length(date))
value <- rnorm(length(datereps))
df <- data.frame(DateTime = datereps, Group = groupreps, Value = value)
This applies flag to the criteria you specified:
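# Each range applies only to its own group (b and d in the question's flag table)
df$flag <- ifelse((df$Group == "b" & df$DateTime >= as.Date("2014-08-01") & df$DateTime <= as.Date("2014-08-11")) |
                  (df$Group == "d" & df$DateTime >= as.Date("2014-08-26") & df$DateTime <= as.Date("2014-09-01")),
                  "flag", "other")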
Then you can have a look:
df[df$flag=="flag",]
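If the ranges should still come from the second data frame rather than being typed into the condition, one possible approach is to loop over the rows of flag and mark the matching rows of df directly. This is only a sketch: it rebuilds the sample data so it runs on its own, stores start and end as Date up front, and uses stringsAsFactors = FALSE so the group comparison behaves the same on older R versions. The name hit is introduced here for illustration.

```r
# Sample data, as in the answer above (values are random)
date <- seq(as.Date("2014-07-21"), as.Date("2014-09-11"), by = "day")
group <- letters[1:4]
df <- data.frame(DateTime = rep(date, length(group)),
                 Group = rep(group, each = length(date)),
                 Value = rnorm(length(date) * length(group)),
                 stringsAsFactors = FALSE)

# Flag periods, with the dates stored as Date rather than character
flag <- data.frame(Group = c("b", "d"),
                   start = as.Date(c("2014-08-01", "2014-08-26")),
                   end = as.Date(c("2014-08-11", "2014-09-01")),
                   stringsAsFactors = FALSE)

# Start with "other" everywhere, then overwrite the rows inside each period
df$flag <- "other"
for (i in seq_len(nrow(flag))) {
  hit <- df$Group == flag$Group[i] &
    df$DateTime >= flag$start[i] &
    df$DateTime <= flag$end[i]
  df$flag[hit] <- "flag"
}

df[df$flag == "flag", ]
```

The loop runs once per flag period, not once per row of df, so each pass is still a vectorized comparison. No merge is involved, so no NA dates are ever compared.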
Answer 2:
Change your last line to:
for (i in seq_len(nrow(df2))) {
  if (is.na(df2$start[i])) {
    # No flag period defined for this row's group
    df2$flag[i] <- "other"
  } else if (df2$Date[i] >= as.Date(df2$start[i]) && df2$Date[i] <= as.Date(df2$end[i])) {
    df2$flag[i] <- "flag"
  } else {
    df2$flag[i] <- "other"
  }
}
It's ugly, but it does the job. This code is not vectorized, so it's fine for a data frame of this size but would be slow on larger ones.
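For larger data, the same logic can be written without a loop. The NA values in the question arise because a comparison against an NA date yields NA, and ifelse() returns NA wherever its test is NA. A minimal sketch, assuming the merged df2 from the question: test !is.na(df2$start) first, so rows without a flag period evaluate to FALSE (FALSE & NA is FALSE in R). The sample data is rebuilt here so the block runs on its own, with start and end stored as Date; in_range is a name introduced for illustration.

```r
# Rebuild the sample data from the question (values are random)
date <- seq(as.Date("2014-07-21"), as.Date("2014-09-11"), by = "day")
group <- letters[1:4]
df <- data.frame(Date = rep(date, length(group)),
                 Group = rep(group, each = length(date)),
                 Value = rnorm(length(date) * length(group)))
flag <- data.frame(Group = c("b", "d"),
                   start = as.Date(c("2014-08-01", "2014-08-26")),
                   end = as.Date(c("2014-08-11", "2014-09-01")))

# Left join: groups without a flag period get NA for start and end
df2 <- merge(df, flag, by = "Group", all.x = TRUE)

# Rows with no period are FALSE here, never NA
in_range <- !is.na(df2$start) & df2$Date >= df2$start & df2$Date <= df2$end
df2$flag <- ifelse(in_range, "flag", "other")

# Cross-tabulate to check that only groups b and d contain "flag"
table(df2$Group, df2$flag)
```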
Source: https://stackoverflow.com/questions/42516632/create-column-to-flag-rows-within-a-date-period-in-r