Using na.locf to carry last value forward ignoring first rows when first observation is na

Submitted by 允我心安 on 2019-12-11 07:29:15

Question


I would like to use na.locf to carry forward non-missing values in data frames where the first observation may be NA.

Problem

dta <- data.frame(A = c(NA, NA, 1, 2, 4, 5, NA, NA, NA),
                  B = c(NA, 5, 4, 5, 8, 9, NA, NA, 100))
dta %>% mutate_all(.funs = funs(na.locf(.)))

Error in mutate_impl(.data, dots) : Column A must be length 9 (the number of rows) or one, not 7

Desired results

Vectorize(require)(package = c("dplyr", "zoo"),
                   character.only = TRUE)

dta <- data.frame(A = c(0, NA, 1, 2, 4, 5, NA, NA, NA),
                  B = c(0, 5, 4, 5, 8, 9, NA, NA, 100))
dta %>% mutate_all(.funs = funs(na.locf(.)))

Workaround

A potential workaround would involve replacing the first set of NAs with zeros and carrying the zeros forward, to be replaced again later. However, I'm interested in leaving the NAs where they are and exploring whether there is a convenient way to make na.locf skip leading positions where it has not yet received a non-NA value to carry forward.


Answer 1:


Use the na.rm = FALSE argument. Note that na.locf can take an entire data frame, so you don't have to apply it to each column separately.

na.locf(dta, na.rm = FALSE)

This gives:

   A   B
1 NA  NA
2 NA   5
3  1   4
4  2   5
5  4   8
6  5   9
7  5   9
8  5   9
9  5 100

Also there is na.locf0:

dta %>% mutate_all(.funs = funs(na.locf0(.)))

See the help page ?na.locf, which documents both the na.rm argument and na.locf0. Note that na.locf0 currently has to be applied column by column, but it always produces output of the same length as its input.
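
To see why the original call failed, here is a minimal sketch (assuming zoo is installed, in a version that provides na.locf0) that compares result lengths on column A from the question. The variable name a is only for illustration.

```r
library(zoo)

# Column A from the question: two leading NAs, three trailing NAs
a <- c(NA, NA, 1, 2, 4, 5, NA, NA, NA)

# The default na.rm = TRUE drops the leading NAs, so the result is shorter
# than the input; this is the length mismatch that mutate_all reported
length(na.locf(a))

# Both of these keep the leading NAs and preserve the input length
length(na.locf(a, na.rm = FALSE))
length(na.locf0(a))
```

The first call should return 7 and the other two 9, which matches the "must be length 9 ... not 7" message in the question.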




Answer 2:


(Was in the process of writing this answer when @docendodiscimus's comment appeared)

From ?na.locf:

na.rm logical. Should leading NAs be removed?

So use na.rm=FALSE, optionally replacing the remaining NA values (i.e. those that were leading) with zeros thereafter:

dta <- data.frame(A = c(NA, NA, 1, 2, 4, 5, NA, NA, NA),
                  B = c(NA, 5, 4, 5, 8, 9, NA, NA, 100))
na_zero <- function(x) replace(x,is.na(x),0)
dta %>% mutate_all(.funs = funs(na.locf(.,na.rm=FALSE))) %>%
   mutate_all(.funs=funs(na_zero(.)))
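
As a side note, funs() has been deprecated since dplyr 0.8.0, so on a current installation the same idea could be sketched with across(). This assumes dplyr 1.0.0 or later; the names filled and filled_zero are only for illustration.

```r
library(dplyr)
library(zoo)

dta <- data.frame(A = c(NA, NA, 1, 2, 4, 5, NA, NA, NA),
                  B = c(NA, 5, 4, 5, 8, 9, NA, NA, 100))

# Carry the last value forward within each column, keeping leading NAs in place
filled <- dta %>%
  mutate(across(everything(), ~ na.locf(.x, na.rm = FALSE)))

# Optionally turn the NAs that remain (the leading ones) into zeros
filled_zero <- filled %>%
  mutate(across(everything(), ~ replace(.x, is.na(.x), 0)))
```

If the zeros are not wanted, stop at filled, which leaves the leading NAs untouched as asked in the question.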



Answer 3:


As an additional hint: if you are using the na.locf function from the imputeTS package, you can choose what to do with the NAs that remain after carrying forward (for LOCF, the leading ones) via the na.remaining parameter:

Selections for na.remaining:

  • "keep" - return the series with NAs
  • "rm" - remove remaining NAs
  • "mean" - replace remaining NAs by overall mean
  • "rev" - perform nocb / locf from the reverse direction

The desired output could thus be reached the following way:

dta <- data.frame(A = c(NA, NA, 1, 2, 4, 5, NA, NA, NA),
              B = c(NA, 5, 4, 5, 8, 9, NA, NA, 100))

library(imputeTS)
na.locf(dta, na.remaining = "keep")

mutate_all is not necessary here, since na.locf is automatically applied to all columns (this is also the case when using na.locf from zoo).



Source: https://stackoverflow.com/questions/47206319/using-na-locf-to-carry-last-value-forward-ignoring-first-rows-when-first-observa
