Selecting columns in R data frame based on those *not* in a vector

坚强是说给别人听的谎言 posted on 2019-11-27 00:41:50
harkmug

An alternative to grep is which:

df.2 <- df[, -which(names(df) %in% c("name1", "name2", "name3"))]
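
One caveat with the -which(...) form: when none of the names match, which() returns an empty integer vector, and negating an empty vector still selects nothing, so every column is dropped. Below is a minimal sketch of the pitfall and of a logical-index variant that avoids it. It uses the built-in mtcars data, and the drop_cols name is only for illustration.

```r
# Names to drop; none of them exist in mtcars
drop_cols <- c("name1", "name2", "name3")

# -which(...) is integer(0) here, so no columns are selected at all
bad <- mtcars[, -which(names(mtcars) %in% drop_cols)]
ncol(bad)   # 0

# Negating the logical vector keeps every column when nothing matches
good <- mtcars[, !(names(mtcars) %in% drop_cols), drop = FALSE]
ncol(good)  # 11
```

The drop = FALSE argument keeps the result a data frame even if only one column survives.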

You can make a shorter call that is also more generalizable with negative-grep:

df.2 <- df[, -grep("^name[1-3]$", names(df))]

Since grep returns numeric indices, you can use negative indexing to remove the matching columns. You could extend the numeric range or use more complex patterns.
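
grep() also accepts invert = TRUE, which returns the positions of the names that do not match, so no negation is needed and a pattern with zero matches leaves the data frame intact. A small sketch, assuming a toy data frame whose column names are made up for illustration:

```r
# Toy data frame; the column names are illustrative only
df <- data.frame(id = 1:2, name1 = 3:4, name2 = 5:6, name3 = 7:8, name10 = 9:10)

# [1-3] is a regex character range matching 1, 2, or 3;
# invert = TRUE returns the positions of the non-matching names
keep <- grep("^name[1-3]$", names(df), invert = TRUE)
df.2 <- df[, keep, drop = FALSE]
names(df.2)  # "id" "name10"
```

The $ anchor is what keeps name10 from being caught by the pattern.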

dplyr::select() has several options for dropping specific columns:

library(dplyr)

drop_columns <- c('cyl','disp','hp')
mtcars %>% 
  select(-one_of(drop_columns)) %>% 
  head(2)

              mpg drat    wt  qsec vs am gear carb
Mazda RX4      21  3.9 2.620 16.46  0  1    4    4
Mazda RX4 Wag  21  3.9 2.875 17.02  0  1    4    4

You can also negate specific column names. The following drops the column "hp" and the columns from "qsec" through "gear":

mtcars %>% 
  select(-hp, -(qsec:gear)) %>% 
  head(2)

              mpg cyl disp drat    wt carb
Mazda RX4      21   6  160  3.9 2.620    4
Mazda RX4 Wag  21   6  160  3.9 2.875    4

You could also negate contains(), starts_with(), ends_with(), or matches():

mtcars %>% 
  select(-contains('t')) %>%
  select(-starts_with('a')) %>% 
  select(-ends_with('b')) %>% 
  select(-matches('^m.+g$')) %>% 
  head(2)

              cyl disp  hp  qsec vs gear
Mazda RX4       6  160 110 16.46  0    4
Mazda RX4 Wag   6  160 110 17.02  0    4

If you do this often in your own data manipulation, you could write a small helper function. I might do something like this:

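rm.col <- function(df, ...) {
    # Capture the unquoted column names passed through ...
    x <- substitute(...())
    # Convert each captured name to a character string (trimws is base R)
    z <- trimws(unlist(lapply(x, as.character)))
    # Keep only the columns whose names are not in z
    df[, !names(df) %in% z, drop = FALSE]
}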

rm.col(mtcars, hp, mpg)

The first argument is the data frame. The remaining arguments (...) are the unquoted names of the columns you wish to remove.

Old thread, but here's another solution:

df.2 <- subset(df, select=-c(name1, name2, name3))

This was posted in another similar thread (though I can't find it right now). It should be maintainable code in the situation you describe, and is probably easier to read and edit than some of the other options.
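
As a quick check of the subset() form, here is the same call applied to the built-in mtcars data. The small name is only for illustration.

```r
# Drop three columns by unquoted name using subset()
small <- subset(mtcars, select = -c(cyl, disp, hp))
names(small)
# "mpg" "drat" "wt" "qsec" "vs" "am" "gear" "carb"
```

Note that the column names are given unquoted here, so this form is most convenient when the names are typed directly rather than held in a character vector.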

The easiest way that comes to my mind:

filtered_df <- df[, setdiff(names(df), c("name1", "name2"))]

Essentially, you are computing the set difference between the full list of column names and the subset you want to filter out (name1 and name2 above).
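
A short sketch of the set-difference approach on the built-in mtcars data, showing that the original column order is preserved and that names not present in the data frame are simply ignored. The drop_cols vector and the "not_a_column" entry are made up for illustration.

```r
# Column names to exclude; "not_a_column" is deliberately absent from mtcars
drop_cols <- c("cyl", "disp", "not_a_column")

# setdiff() keeps the order of names(mtcars) and ignores names that are not present
keep <- setdiff(names(mtcars), drop_cols)
filtered <- mtcars[, keep, drop = FALSE]
names(filtered)
# "mpg" "hp" "drat" "wt" "qsec" "vs" "am" "gear" "carb"
```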
