Add missing months for a range of dates in R

Posted by 左心房为你撑大大i on 2021-02-11 12:01:22

Question


Say I have a data.frame as follows, where each month has at most one entry:

df <- read.table(text="date,gmsl
2009-01-17,58.4
2009-02-17,59.1
2009-04-16,60.9
2009-06-16,62.3
2009-09-16,64.6
2009-12-16,68.3",sep=",",header=TRUE)

##  > df
##         date gmsl
## 1 2009-01-17 58.4
## 2 2009-02-17 59.1
## 3 2009-04-16 60.9
## 4 2009-06-16 62.3
## 5 2009-09-16 64.6
## 6 2009-12-16 68.3

How can I fill in the missing months, with gmsl set to NaN, for the date range from 2009-01 to 2009-12?

I have already extracted the year and month from the date column with df$Month_Yr <- format(as.Date(df$date), "%Y-%m").


Answer 1:


Here's a way to do this with tidyr::complete:

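library(dplyr)

df %>%
  # Parse the dates and map each one to the first day of its month
  mutate(date = as.Date(date), 
         first_date = as.Date(format(date, "%Y-%m-01"))) %>%
  # Add a row for every month between the earliest and the latest month
  tidyr::complete(first_date = seq(min(first_date), max(first_date), "1 month"))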


# A tibble: 12 x 3
#   first_date date        gmsl
#   <date>     <date>     <dbl>
# 1 2009-01-01 2009-01-17  58.4
# 2 2009-02-01 2009-02-17  59.1
# 3 2009-03-01 NA          NA  
# 4 2009-04-01 2009-04-16  60.9
# 5 2009-05-01 NA          NA  
# 6 2009-06-01 2009-06-16  62.3
# 7 2009-07-01 NA          NA  
# 8 2009-08-01 NA          NA  
# 9 2009-09-01 2009-09-16  64.6
#10 2009-10-01 NA          NA  
#11 2009-11-01 NA          NA  
#12 2009-12-01 2009-12-16  68.3

You can then decide which column to keep, first_date or date, or combine the two.
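One possible way to combine the two columns is sketched below. It keeps the observed date where there is one and falls back to the first day of the month otherwise. The use of dplyr::coalesce and the name `filled` are choices made for this illustration, not part of the original answer.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  date = c("2009-01-17", "2009-02-17", "2009-04-16",
           "2009-06-16", "2009-09-16", "2009-12-16"),
  gmsl = c(58.4, 59.1, 60.9, 62.3, 64.6, 68.3)
)

filled <- df %>%
  mutate(date = as.Date(date),
         first_date = as.Date(format(date, "%Y-%m-01"))) %>%
  complete(first_date = seq(min(first_date), max(first_date), by = "1 month")) %>%
  # Use the observed date if present, otherwise the first day of the month
  mutate(date = coalesce(date, first_date)) %>%
  select(date, gmsl)

filled
```

The result has one row per month, with gmsl left as NA for the six months that had no observation.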

data

df <- structure(list(date = structure(1:6, .Label = c("2009-01-17", 
"2009-02-17", "2009-04-16", "2009-06-16", "2009-09-16", "2009-12-16"
), class = "factor"), gmsl = c(58.4, 59.1, 60.9, 62.3, 64.6, 
68.3)), class = "data.frame", row.names = c(NA, -6L))



Answer 2:


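In base R you could build a monthly date sequence and use %in% to compare its year-month substrings with those of the existing dates; the months that do not match are the ones to add.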

# First day of every month in 2009
dt.match <- seq(as.Date("2009-01-01"), as.Date("2009-12-01"), by = "month")
# TRUE for months ("YYYY-MM") that have no entry in dat
is.missing <- !substr(dt.match, 1, 7) %in% substr(dat$date, 1, 7)
# Build the filler rows as a data.frame so that gmsl stays numeric
sub <- data.frame(date = as.character(dt.match[is.missing]), gmsl = NA_real_)
merge(dat, sub, all = TRUE)
#          date gmsl
# 1  2009-01-17 58.4
# 2  2009-02-17 59.1
# 3  2009-03-01   NA
# 4  2009-04-16 60.9
# 5  2009-05-01   NA
# 6  2009-06-16 62.3
# 7  2009-07-01   NA
# 8  2009-08-01   NA
# 9  2009-09-16 64.6
# 10 2009-10-01   NA
# 11 2009-11-01   NA
# 12 2009-12-16 68.3

Data

dat <- structure(list(date = c("2009-01-17", "2009-02-17", "2009-04-16", 
"2009-06-16", "2009-09-16", "2009-12-16"), gmsl = c(58.4, 59.1, 
60.9, 62.3, 64.6, 68.3)), row.names = c(NA, -6L), class = "data.frame")
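The question asked for NaN rather than NA and mentioned a Month_Yr column. A minimal base R sketch along those lines follows, assuming the range is fixed at 2009-01 to 2009-12. The names `all.months` and `res` are illustrative only.

```r
dat <- data.frame(
  date = c("2009-01-17", "2009-02-17", "2009-04-16",
           "2009-06-16", "2009-09-16", "2009-12-16"),
  gmsl = c(58.4, 59.1, 60.9, 62.3, 64.6, 68.3)
)

# Year-month key, as extracted in the question
dat$Month_Yr <- format(as.Date(dat$date), "%Y-%m")

# Every month of the target range as a "YYYY-MM" string
all.months <- format(
  seq(as.Date("2009-01-01"), as.Date("2009-12-01"), by = "month"),
  "%Y-%m"
)

# Left join on Month_Yr: months without data get NA in date and gmsl
res <- merge(data.frame(Month_Yr = all.months), dat, all.x = TRUE)

# Replace the missing gmsl values with NaN, as the question requested
res$gmsl[is.na(res$gmsl)] <- NaN

res
```

Note that is.na() is also TRUE for NaN, so code that later tests for missing values with is.na() treats both the same way.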


Source: https://stackoverflow.com/questions/61133814/add-missing-months-for-a-range-of-date-in-r
