Carry Last Observation Forward by ID in R

Submitted by 放荡痞女 on 2019-12-23 12:20:04

Question


I have daily observations with lots of missing values and am trying to propagate the first non-missing value through a vector for each individual.

While searching, I discovered the na.locf function in the zoo package; however, I now need to apply it separately for each value of the id variable in my data frame. Is ddply the right function for this? If so, can someone please help me figure out how to store the output in a new variable called result in the same data frame?

This is what I have so far:

# Load required libraries
library(zoo)
library(plyr)

# Create the data
data <- structure(list(id = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 
              2, 2, 2), day = c(0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 7, 
              8), value = c("NA", "1", "NA", "NA", "NA", "NA", "NA", "NA", 
              "NA", "NA", "1", "NA", "NA", "NA", "NA", "NA")), .Names = c("id", 
              "day", "value"), row.names = c(NA, -16L), class = "data.frame")

# Propagate the value of the first non-missing observation in data$value forward for each id
data$result <- na.locf(data$value, na.rm = FALSE)

Any thoughts on how to run the na.locf function by each id would be greatly appreciated. Thanks!


Answer 1:


1) First, note that the value column is a character column containing the string "NA", not real NA values, so let's fix that first in the line marked ##. Then create a wrapper function na.locf.na, which calls na.locf from the zoo package and is identical except that it defaults to na.rm = FALSE. Finally, use ave to apply na.locf.na by id:

library(zoo)

data2 <- transform(data, value = as.numeric(value)) ##

na.locf.na <- function(x, na.rm = FALSE, ...) na.locf(x, na.rm = na.rm, ...)
transform(data2, value = ave(value, id, FUN = na.locf.na))
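
To see why the ## line matters, here is a minimal base R sketch using a toy vector (value_chr and value_num are illustrative names, not part of the original data). It shows that the string "NA" is not treated as missing until the column is converted:

```r
# Toy vector mirroring the question's value column: "NA" here is a string
value_chr <- c("NA", "1", "NA")

# No element counts as missing, so a fill-forward would have nothing to fill
is.na(value_chr)

# After conversion, the former "NA" strings become real missing values
value_num <- as.numeric(value_chr)
is.na(value_num)
```

Running na.locf on the unconverted column would therefore leave every "NA" string in place.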

2) Alternatively, use fn from the gsubfn package to express na.locf.na inline more compactly:

library(zoo)
library(gsubfn)

transform(data2, value = fn$ave(value, id, FUN = ~ na.locf(x, na.rm = FALSE)))

In either of these two cases the result is:

   id day value
1   1   0    NA
2   1   1     1
3   1   2     1
4   1   3     1
5   1   4     1
6   1   5     1
7   1   6     1
8   2   0    NA
9   2   1    NA
10  2   2    NA
11  2   3     1
12  2   4     1
13  2   5     1
14  2   6     1
15  2   7     1
16  2   8     1
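
The question asked for the filled values in a new variable called result rather than overwriting value. The following is a minimal sketch of one way to do that with ave and na.locf, assuming a small stand-in data frame (the rows here are illustrative, not the question's full data) whose value column already holds real NA values:

```r
library(zoo)

# Small stand-in for the question's data: numeric column with real NAs
data2 <- data.frame(
  id    = c(1, 1, 1, 2, 2, 2, 2),
  day   = c(0, 1, 2, 0, 1, 2, 3),
  value = c(NA, 1, NA, NA, NA, 1, NA)
)

# Fill forward within each id; na.rm = FALSE keeps leading NAs so that
# each group's output has the same length as its input
data2$result <- ave(data2$value, data2$id,
                    FUN = function(x) na.locf(x, na.rm = FALSE))

data2
```

Assigning to data2$result leaves the original value column untouched, which makes it easy to compare the two side by side.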

3) Alternatively, we could use dplyr together with zoo, reusing na.locf.na from above:

library(zoo)
library(dplyr)

data2 <- data %>% mutate(value = as.numeric(value)) # fix value column
data2 %>% group_by(id) %>% mutate(value = na.locf.na(value))

If the dplyr from CRAN does not work here, try the one from GitHub:

library(devtools)
install_github("hadley/dplyr")
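
If installing extra packages is not an option, the same per-id fill can be approximated in base R alone. The sketch below is not from the original answer: locf_base is a hypothetical helper written for illustration, and the data frame is a small stand-in with real NA values.

```r
# Hypothetical helper (not part of zoo): carry the last non-NA value forward
locf_base <- function(x) {
  seen <- cumsum(!is.na(x))        # count of non-NA values so far (0 = none yet)
  c(NA, x[!is.na(x)])[seen + 1]    # slot 1 holds NA for any leading gap
}

# Small stand-in data frame, numeric column with real NAs
data2 <- data.frame(
  id    = c(1, 1, 1, 2, 2, 2, 2),
  value = c(NA, 1, NA, NA, NA, 1, NA)
)

# Apply the helper within each id and store the result in a new column
data2$result <- ave(data2$value, data2$id, FUN = locf_base)

data2
```

Like na.locf with na.rm = FALSE, this keeps leading NA values in place, so it can be passed to ave in the same way.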

REVISIONS: Reorganized presentation and added alternatives.



Source: https://stackoverflow.com/questions/23818493/carry-last-observation-forward-by-id-in-r
