Replacing NAs between two rows with identical values in a specific column

Submitted by 烈酒焚心 on 2021-02-09 02:47:36

Question


I have a dataframe with multiple columns and I want to replace NAs in one column if they are between two rows with an identical number. Here is my data:

    v1 v2 
    1  2  
    NA 3
    NA 2
    1  1
    NA 7
    NA 2
    3  1

I basically want to start from the beginning of the data frame and replace NAs in column v1 with the previous non-NA value if the next non-NA value matches the previous one. That being said, I want the result to look like this:

    v1 v2
    1  2
    1  3
    1  2
    1  1
    NA 7
    NA 2
    3  1

As you can see, rows 2 and 3 are replaced with the number 1 because rows 1 and 4 have an identical value, but rows 5 and 6 stay the same because the non-NA values in rows 4 and 7 are not identical. I have been tweaking this a lot but so far no luck. Thanks
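
For reproducibility, here is a minimal sketch of the data above as R data frames. The object names `df` and `expected` are assumptions for illustration; the question itself does not name them.

```r
# Input data from the question (`df` is an assumed name)
df <- data.frame(v1 = c(1, NA, NA, 1, NA, NA, 3),
                 v2 = c(2, 3, 2, 1, 7, 2, 1))

# Desired result (`expected` is an assumed name)
expected <- data.frame(v1 = c(1, 1, 1, 1, NA, NA, 3),
                       v2 = c(2, 3, 2, 1, 7, 2, 1))
```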


Answer 1:


Here is an idea using the zoo package. We basically fill the NAs in both directions and then set to NA the values that differ between the two directions.

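library(zoo)

# Fill NAs from the next non-NA (backward) and from the previous non-NA (forward).
# na.rm = FALSE keeps leading/trailing NAs so both vectors stay nrow(df) long.
ind1 <- na.locf(df$v1, fromLast = TRUE, na.rm = FALSE)
df$v1 <- na.locf(df$v1, na.rm = FALSE)
# Keep a filled value only where both directions agree
df$v1[which(is.na(ind1) | df$v1 != ind1)] <- NA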

which gives,

  v1 v2
1  1  2
2  1  3
3  1  2
4  1  1
5 NA  7
6 NA  2
7  3  1



Answer 2:


Here is a base R solution, the logic is almost the same as Sotos's one:

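replace_na <- function(x){
    # f: forward fill, each NA takes the first value of its cumsum group
    f <- function(x) ave(x, cumsum(!is.na(x)), FUN = function(x) x[1])
    y <- f(x)              # previous non-NA value
    yp <- rev(f(rev(x)))   # next non-NA value
    # use the filled value only where both directions agree
    ifelse(!is.na(y) & y == yp, y, x)
}
df$v1 <- replace_na(df$v1)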

Test:

> replace_na(c(1, NA, NA, 1, NA, NA, 3))
[1]  1  1  1  1 NA NA  3



Answer 3:


Here is a similar approach in the tidyverse using fill:

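library(tidyverse)
df1 %>%
  mutate(vNew = v1) %>%               # copy v1 for the backward fill
  fill(vNew, .direction = 'up') %>%   # next non-NA value
  fill(v1)  %>%                       # previous non-NA value (default 'down')
  # reset rows where the two fills disagree or no next value exists
  mutate(v1 = replace(v1, is.na(vNew) | v1 != vNew, NA)) %>%
  select(-vNew)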
#  v1 v2
#1  1  2
#2  1  3
#3  1  2
#4  1  1
#5 NA  7
#6 NA  2
#7  3  1



Answer 4:


I was able to do this with the na.locf function from the zoo package. First, I call na.locf with its default direction to replace each NA with the most recent previous non-NA value and store the result in a column. Then I call the same function with fromLast = TRUE, so that each NA is replaced with the next non-NA value, and store that in another column. Finally, I compare these two columns row by row, and wherever the values do not match I set the result to NA.
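
The answer gives no code, so the following is only a sketch of how those steps might look. The helper column names `prev_val` and `next_val`, the data frame name `df`, and the use of `na.rm = FALSE` (to keep the filled vectors the same length as the data frame) are assumptions, not part of the original answer.

```r
library(zoo)

# Example data from the question (`df` is an assumed name)
df <- data.frame(v1 = c(1, NA, NA, 1, NA, NA, 3),
                 v2 = c(2, 3, 2, 1, 7, 2, 1))

# Helper column 1: each NA takes the most recent previous non-NA value
df$prev_val <- na.locf(df$v1, na.rm = FALSE)
# Helper column 2: each NA takes the next non-NA value
df$next_val <- na.locf(df$v1, fromLast = TRUE, na.rm = FALSE)

# Keep the filled value only where both helper columns agree; otherwise NA
agree <- !is.na(df$prev_val) & !is.na(df$next_val) & df$prev_val == df$next_val
df$v1 <- ifelse(agree, df$prev_val, NA)

# Drop the helper columns again
df$prev_val <- NULL
df$next_val <- NULL
df
```

Checking for NA explicitly in `agree` means that leading or trailing NAs, which have a neighbour on only one side, are left as NA rather than filled.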



Source: https://stackoverflow.com/questions/45799028/replacing-nas-between-two-rows-with-identical-values-in-a-specific-column
