问题
Probably simple but tricky question especially for larger data sets. Given two dataframes (df1
,df2
) of equal dimensions as below:
head(df1)
a b c
1 0.8569720 0.45839112 NA
2 0.7789126 0.36591578 NA
3 0.6901663 0.88095485 NA
4 0.7705756 0.54775807 NA
5 0.1743111 0.89087819 NA
6 0.5812786 0.04361905 NA
and
head(df2)
a b c
1 0.21210312 0.7670091 NA
2 0.19767464 0.3050934 1
3 0.08982958 0.4453491 2
4 0.75196925 0.6745908 3
5 0.73216793 0.6418483 4
6 0.73640209 0.7448011 5
How can one find all columns where if(all(is.na(df1))
, in this case c
, go to df2
and set all values in matching column (c
) to NAs
.
Desired output
head(df3)
a b c
1 0.21210312 0.7670091 NA
2 0.19767464 0.3050934 NA
3 0.08982958 0.4453491 NA
4 0.75196925 0.6745908 NA
5 0.73216793 0.6418483 NA
6 0.73640209 0.7448011 NA
My actual dataframes have more than 140000
columns.
回答1:
We can use colSums
on the negated logical matrix (is.na(df1)
), negate (!) the
vector` so that 0 non-NA elements becomes TRUE and all others FALSE, use this to subset the columns of 'df2' and assign it to NA.
df2[!colSums(!is.na(df1))] <- NA
df2
# a b c
#1 0.21210312 0.7670091 NA
#2 0.19767464 0.3050934 NA
#3 0.08982958 0.4453491 NA
#4 0.75196925 0.6745908 NA
#5 0.73216793 0.6418483 NA
#6 0.73640209 0.7448011 NA
Or another option is to loop over the columns and check whether all
the elements are NA to create a logical vector for subsetting the columns of 'df2' and assigning it to NA
df2[sapply(df1, function(x) all(is.na(x)))] <- NA
If these are big datasets, another option would be set
from data.table
(should be more efficient as this does the assignment in place)
library(data.table)
setDT(df2)
j1 <- which(sapply(df1, function(x) all(is.na(x))))
for(j in j1){
set(df2, i = NULL, j = j, value = NA)
}
来源:https://stackoverflow.com/questions/39622020/conditional-replacement-of-nas-in-two-dataframes-r