Question
I have a large data set which consists of a column of IDs followed by a monthly time series for each ID. There are frequent missing values in this set. What I would like to do is replace all NAs after the first non-zero value with a zero, while leaving all the NAs before the first non-zero value as NAs.
For example,
[NA NA NA 1 2 3 NA 4 5 NA] would be changed to [NA NA NA 1 2 3 0 4 5 0]
Any help or advice you guys could offer would be much appreciated!
Answer 1:
Easy to do using match() and numeric indices:
- use match() to find the first occurrence of a non-NA value
- use which() to convert the logical vector from is.na() to a numeric index
- use that information to find the correct positions in x
Hence:
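x <- c(NA,NA,NA,1,2,3,NA,NA,4,5,NA)
isna <- is.na(x)               # logical: TRUE where x is missing
nonna <- match(FALSE,isna)     # position of the first non-NA value
id <- which(isna)              # numeric positions of all NAs
x[id[id>nonna]] <- 0           # zero only the NAs after that position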
gives:
> x
[1] NA NA NA 1 2 3 0 0 4 5 0
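Because the question describes one row per ID followed by monthly columns, the same steps can be wrapped in a function and applied row by row. This is a minimal sketch: the function name `na_to_zero_after_first`, the data frame `df` and its column names are hypothetical, and it assumes the first column holds the ID while all remaining columns are numeric.

```r
# Answer 1's steps as a reusable function (hypothetical name)
na_to_zero_after_first <- function(x) {
  isna <- is.na(x)
  nonna <- match(FALSE, isna)   # first non-NA position, NA if there is none
  if (is.na(nonna)) return(x)   # row is entirely missing: leave it as is
  id <- which(isna)
  x[id[id > nonna]] <- 0        # zero only the NAs after that position
  x
}

# Hypothetical wide data set: an ID column followed by monthly values
df <- data.frame(
  id  = c("a", "b"),
  jan = c(NA, 5),
  feb = c(1, NA),
  mar = c(NA, 4),
  apr = c(2, NA)
)

# apply() returns one column per row, so transpose before assigning back
df[, -1] <- t(apply(df[, -1], 1, na_to_zero_after_first))
df
# row a becomes NA 1 0 2; row b becomes 5 0 4 0
```

The explicit all-NA check is a design choice: it makes an ID with no observations at all come back unchanged instead of relying on how R treats NA subscripts.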
Answer 2:
Here's another method: convert all NAs to zeros first, then convert the zeros before the first positive value back to NA.
> x <- c(NA,NA,NA,1,2,3,NA,NA,4,5,NA)
> x[which(is.na(x))] <- 0
### positions 1 up to (but not including) the first element > 0 go back to NA
> x[seq_len(min(which(x>0))-1)] <- NA
> x
[1] NA NA NA 1 2 3 0 0 4 5 0
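The two answers start from different points: Answer 1 zeroes NAs after the first non-NA value, while Answer 2 zeroes NAs from the first value greater than zero, so they disagree when a zero appears before the first positive value. The sketch below compares them on a made-up vector with a leading zero; both function names are hypothetical wrappers around the code above, and the second one assumes at least one positive value is present.

```r
# Answer 1: NAs after the first non-NA value become 0
after_first_non_na <- function(x) {
  isna <- is.na(x)
  nonna <- match(FALSE, isna)
  id <- which(isna)
  x[id[id > nonna]] <- 0
  x
}

# Answer 2: all NAs become 0, then everything before the first value > 0
# becomes NA (assumes x contains at least one positive value)
after_first_positive <- function(x) {
  x[is.na(x)] <- 0
  x[seq_len(min(which(x > 0)) - 1)] <- NA
  x
}

y <- c(NA, 0, NA, 2, NA)
after_first_non_na(y)    # NA 0 0 2 0   (the leading 0 is kept)
after_first_positive(y)  # NA NA NA 2 0 (the leading 0 becomes NA)
```

On the vector used in both answers the two functions agree; which one matches "first non-zero" in the question depends on whether real zeros can precede the first positive value.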
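Alternatively, starting again from the original vector, replace NAs only from the first positive element to the end:
> x <- c(NA,NA,NA,1,2,3,NA,NA,4,5,NA)
### positions from the first element > 0 to the end of the vector
> endOfVec <- min(which(x>0)):length(x)
> x[endOfVec][is.na(x[endOfVec])] <- 0
> x
[1] NA NA NA 1 2 3 0 0 4 5 0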
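A further option, not taken from either answer, avoids index arithmetic altogether: `cumsum(!is.na(x)) > 0` is TRUE from the first non-NA value onward, so combining it with `is.na(x)` selects exactly the NAs that follow it. This is a minimal sketch with the same semantics as Answer 1; the variable name `seen_value` is illustrative only.

```r
x <- c(NA, NA, NA, 1, 2, 3, NA, NA, 4, 5, NA)

# TRUE from the first non-NA value to the end of the vector
seen_value <- cumsum(!is.na(x)) > 0

# Replace only the NAs that come after a non-NA value
x[is.na(x) & seen_value] <- 0
x
```

An all-NA vector passes through unchanged without a special case, because the cumulative sum never leaves zero. To key on the first value other than zero instead, `!is.na(x)` can be replaced with `!is.na(x) & x != 0`.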
Source: https://stackoverflow.com/questions/20684499/r-convert-nas-only-after-the-first-non-zero-value