I have to remove columns in my dataframe which has over 4000 columns and 180 rows.The conditions I want to set in to remove the column in the dataframe are: (i) Remove the c
I feel like this is all over-complicated. Condition 2 already includes all the rest of the conditions, as if there are at least two non-NA values in a column, obviously the whole column aren't NAs. And if there are at least two consecutive values in a column, then obviously this column contains more than one value. So instead of 3 conditions, this all sums up into a single condition (I prefer not to run many functions per column, rather after running diff per column- vecotrize the whole thing):
cond <- colSums(is.na(sapply(df, diff))) < nrow(df) - 1
This works because if there are no consecutive values in a column, the whole column will become NAs.
Then, just
df[, cond, drop = FALSE]
# A E
# 1 0.018 NA
# 2 0.017 NA
# 3 0.019 NA
# 4 0.018 NA
# 5 0.018 NA
# 6 0.015 0.037
# 7 0.016 0.031
# 8 0.019 0.025
# 9 0.016 0.035
# 10 0.018 0.035
# 11 0.017 0.043
# 12 0.023 0.040
# 13 0.022 0.042
Per your edit, it seems like you have a data.table object and you also have a Date column so the code would need some modifications.
cond <- df[, lapply(.SD, function(x) sum(is.na(diff(x)))) < .N - 1, .SDcols = -1]
df[, c(TRUE, cond), with = FALSE]
Some explanations:
.SDcols = -1 when operating on our .SD (which means Sub Data in data.tableis) .N is just the rows count (similar to nrow(df)c(TRUE,... data.table works with non standard evaluation by default, hence, if you want to select column as if you would in a data.frame you will need to specify with = FALSEA better way though, would be just to remove the column by reference using := NULL
cond <- c(FALSE, df[, lapply(.SD, function(x) sum(is.na(diff(x)))) == .N - 1, .SDcols = -1])
df[, which(cond) := NULL]