Question
Is there a way to remove rows from a dataframe, based on the column of another dataframe?
For example, Dataframe 1:
Gene  CHROM  POS     REF  ALT  N_INFORMATIVE  Test      Beta        SE
AAA   1      15211   T    G    1481           1:15211   -0.0599805  0.112445
LLL   1      762061  T    A    1481           1:762061  0.2144100   0.427085
CCC   1      762109  C    T    1481           1:762109  0.2847510   0.204255
DDD   1      762273  G    A    1481           1:762273  0.0443946   0.119924
Dataframe 2 (only 1 column):
Genes
AAA
BBB
CCC
DDD
EEE
FFF
In this situation, I want to scan column 1 of Dataframe 1 for any matches to Dataframe 2 and remove the matching rows.
They need to be an exact match, and the result would look like this:
Gene  CHROM  POS     REF  ALT  N_INFORMATIVE  Test      Beta        SE
LLL   1      762061  T    A    1481           1:762061  0.2144100   0.427085
I've tried variations of this, but it hasn't worked:
NewDataframe <-!(Dataframe1$Gene==Dataframe2$Genes)
Thanks for reading.
Answer 1:
Use %in% to test which values in the first data frame's column appear in the second data frame's column, negate the result with !, and use the resulting logical vector to subset the rows of the first data frame.
dat1 <- data.frame(id = LETTERS[1:10], stringsAsFactors = FALSE)
dat2 <- data.frame(id = c("B", "D"), stringsAsFactors = FALSE)
dat1[!dat1$id %in% dat2$id, , drop = FALSE]
#    id
# 1   A
# 3   C
# 5   E
# 6   F
# 7   G
# 8   H
# 9   I
# 10  J
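Applied to the data in the question, the same idea might look like the sketch below. The data frames are rebuilt by hand with only the first three columns for brevity, and the intermediate name matched is just an illustrative choice. The attempt with == does not work because == compares the two columns element by element (recycling the shorter one) rather than testing membership, and it returns a logical vector instead of a subset of the data frame.

```r
# Data from the question, rebuilt by hand (only the first three columns shown).
Dataframe1 <- data.frame(
  Gene  = c("AAA", "LLL", "CCC", "DDD"),
  CHROM = c(1, 1, 1, 1),
  POS   = c(15211, 762061, 762109, 762273),
  stringsAsFactors = FALSE
)
Dataframe2 <- data.frame(
  Genes = c("AAA", "BBB", "CCC", "DDD", "EEE", "FFF"),
  stringsAsFactors = FALSE
)

# TRUE for each row of Dataframe1 whose Gene appears anywhere in Dataframe2$Genes.
# %in% does exact matching, so "AAA" will not match "AAAB".
matched <- Dataframe1$Gene %in% Dataframe2$Genes

# Keep only the rows that did NOT match; drop = FALSE keeps the result a data frame.
NewDataframe <- Dataframe1[!matched, , drop = FALSE]
NewDataframe
```

With this input, NewDataframe should contain only the LLL row, since AAA, CCC and DDD all occur in Dataframe2$Genes.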
Source: https://stackoverflow.com/questions/38574511/removing-rows-based-on-column-in-another-dataframe