Question
I would like to use na.locf to carry non-missing values forward in data frames where the first observation may be missing (NA).
Problem
library(dplyr)
library(zoo)
dta <- data.frame(A = c(NA, NA, 1, 2, 4, 5, NA, NA, NA),
                  B = c(NA, 5, 4, 5, 8, 9, NA, NA, 100))
dta %>% mutate_all(.funs = funs(na.locf(.)))
Error in mutate_impl(.data, dots) :
  Column `A` must be length 9 (the number of rows) or one, not 7
Desired results
Vectorize(require)(package = c("dplyr", "zoo"),
                   character.only = TRUE)
dta <- data.frame(A = c(0, NA, 1, 2, 4, 5, NA, NA, NA),
                  B = c(0, 5, 4, 5, 8, 9, NA, NA, 100))
dta %>% mutate_all(.funs = funs(na.locf(.)))
Workaround
One potential workaround is to replace the leading NAs with zeros, carry the zeros forward, and replace them again afterwards. However, I would rather leave the NAs where they are and find a convenient way to make na.locf ignore leading positions where it has not yet seen a non-NA value to carry forward.
Answer 1:
Use the na.rm = FALSE argument, noting that na.locf can take an entire data frame, so you don't have to apply it to each column separately.
na.locf(dta, na.rm = FALSE)
This gives:
   A   B
1 NA  NA
2 NA   5
3  1   4
4  2   5
5  4   8
6  5   9
7  5   9
8  5   9
9  5 100
There is also na.locf0:
dta %>% mutate_all(.funs = funs(na.locf0(.)))
See the help page ?na.locf, which documents both the na.rm argument and na.locf0. Note that na.locf0 currently has to be applied to each column individually, but it always produces output of the same length as its input.
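The error in the question comes down to vector length: with its default na.rm = TRUE, na.locf drops leading NAs, so column A shrinks from 9 values to 7. The following is a minimal sketch of that difference, assuming a zoo version that provides na.locf0; the vector x is simply column A from the question, and the variable names are illustrative.

```r
library(zoo)

# Column A from the question: two leading NAs and three trailing NAs
x <- c(NA, NA, 1, 2, 4, 5, NA, NA, NA)

default_fill <- na.locf(x)                 # default na.rm = TRUE drops leading NAs
kept_fill    <- na.locf(x, na.rm = FALSE)  # leading NAs are kept
locf0_fill   <- na.locf0(x)                # same length as the input

length(default_fill)  # 7, which is what triggers the mutate_all error
length(kept_fill)     # 9
length(locf0_fill)    # 9
```

Either of the last two forms returns a vector that fits back into a 9-row data frame.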
Answer 2:
(Was in the process of writing this answer when @docendodiscimus's comment appeared)
From ?na.locf:
na.rm logical. Should leading NAs be removed?
So use na.rm = FALSE, optionally replacing the remaining NA values (i.e. those that were leading) with zeros afterwards:
dta <- data.frame(A = c(NA, NA, 1, 2, 4, 5, NA, NA, NA),
                  B = c(NA, 5, 4, 5, 8, 9, NA, NA, 100))
na_zero <- function(x) replace(x, is.na(x), 0)
dta %>% mutate_all(.funs = funs(na.locf(., na.rm = FALSE))) %>%
  mutate_all(.funs = funs(na_zero(.)))
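In more recent dplyr releases, funs() is deprecated and mutate_all() is superseded by across(). The same two-step idea might be written roughly as follows; this is a sketch assuming dplyr 1.0.0 or later, and the name filled is only illustrative.

```r
library(dplyr)
library(zoo)

dta <- data.frame(A = c(NA, NA, 1, 2, 4, 5, NA, NA, NA),
                  B = c(NA, 5, 4, 5, 8, 9, NA, NA, 100))

filled <- dta %>%
  # Step 1: carry the last observation forward, keeping leading NAs
  mutate(across(everything(), ~ na.locf(.x, na.rm = FALSE))) %>%
  # Step 2 (optional): turn the NAs that are still left into zeros
  mutate(across(everything(), ~ replace(.x, is.na(.x), 0)))

filled
```

Dropping the second mutate() leaves the leading NAs in place, which is what the question asks for.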
Answer 3:
As an additional hint: if you use the na.locf function from the imputeTS package, the na.remaining parameter lets you choose what to do with the NAs that remain after carrying values forward (here, the leading ones):
Selections for na.remaining:
- "keep" - return the series with NAs
- "rm" - remove remaining NAs
- "mean" - replace remaining NAs by overall mean
- "rev" - perform nocb / locf from the reverse direction
The desired output could thus be reached the following way:
dta <- data.frame(A = c(NA, NA, 1, 2, 4, 5, NA, NA, NA),
                  B = c(NA, 5, 4, 5, 8, 9, NA, NA, 100))
library(imputeTS)
na.locf(dta, na.remaining = "keep")
mutate_all is not necessary here, since na.locf is automatically applied to all columns (this is also the case when using na.locf from zoo).
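For comparison, the effect described for the "rev" option (filling whatever is left from the reverse direction) can be approximated with zoo alone. The sketch below is not imputeTS's implementation; it chains two na.locf0 calls, and the helper fill_both is a hypothetical name introduced only for illustration.

```r
library(zoo)

dta <- data.frame(A = c(NA, NA, 1, 2, 4, 5, NA, NA, NA),
                  B = c(NA, 5, 4, 5, 8, 9, NA, NA, 100))

# Hypothetical helper: carry forward first, then fill the
# remaining (leading) NAs by carrying the next observation backward
fill_both <- function(col) na.locf0(na.locf0(col), fromLast = TRUE)

both <- as.data.frame(lapply(dta, fill_both))
both
```

With this data the two leading NAs in A become 1 and the leading NA in B becomes 5, whereas "keep" corresponds to stopping after the first na.locf0 call.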
Source: https://stackoverflow.com/questions/47206319/using-na-locf-to-carry-last-value-forward-ignoring-first-rows-when-first-observa