Suppose I have a larger data.frame and a smaller one. If every row of the smaller one is contained in the larger one, how can I subtract the rows of the smaller data.frame from the larger one, leaving a data.frame of only the remaining rows?
`setdiff()` works fine when both data frames have the same number of columns with matching types, but it becomes a problem when the small data frame has only a subset of the big data frame's columns.
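A minimal sketch of that limitation on made-up toy data (the `big_df`/`small_df` values below are hypothetical). It assumes the `setdiff()` meant here is dplyr's data-frame method, which compares whole rows; base R's `setdiff()` treats a data frame as a list of columns instead.

```r
library(dplyr)

# Hypothetical toy data: small_df holds two of big_df's rows, with the same columns
big_df   <- data.frame(ID = c("a", "b", "c", "d"), x = c(1, 2, 3, 4),
                       stringsAsFactors = FALSE)
small_df <- big_df[c(2, 4), ]

# Same columns and types: dplyr::setdiff() returns the rows of big_df not in small_df
remaining <- dplyr::setdiff(big_df, small_df)
print(remaining)

# small_df reduced to a subset of the columns: setdiff() now stops with an error
small_ids <- small_df["ID"]
res <- tryCatch(dplyr::setdiff(big_df, small_ids), error = function(e) e)
print(inherits(res, "error"))
```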
The alternative is `anti_join()` from dplyr, which returns all rows of the big data frame that have no match in the small data frame. It keeps only the big data frame's columns, which is what you need, instead of adding the small data frame's columns the way the other joins do. See http://rpubs.com/williamsurles/293454
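A sketch of `anti_join()` handling the case that `setdiff()` rejects, again on hypothetical toy data where `small_df` has only two of `big_df`'s three columns. Passing `by` explicitly is a choice made here to silence dplyr's message about which columns it joined on; leaving it out joins on all shared columns.

```r
library(dplyr)

# Hypothetical toy data: big_df has three columns
big_df <- data.frame(ID = c("a", "b", "c", "d"),
                     x  = c(1, 2, 3, 4),
                     y  = c("p", "q", "r", "s"),
                     stringsAsFactors = FALSE)
# small_df has only a subset of big_df's columns
small_df <- data.frame(ID = c("b", "d"), x = c(2, 4), stringsAsFactors = FALSE)

# Keep the rows of big_df with no match in small_df on the shared columns;
# the output has exactly big_df's columns, in big_df's row order
result <- anti_join(big_df, small_df, by = c("ID", "x"))
print(result)
```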
If `ID` is a column (rather than row names), convert it to character in both data frames first. Otherwise R coerces it to character anyway and gives you a warning, although the result is still correct. Using this, I got the same answer as with `setdiff()`:
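```r
library(dplyr)
small_df$ID <- as.character(small_df$ID)  # character key in both data frames
big_df$ID <- as.character(big_df$ID)
result <- anti_join(big_df, small_df)     # joins on all shared columns by default
```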
Result:

```
ID          CSF1P0 CSF1P0.1 D10S1248 D10S1248.1 D12S391 D12S391.1
203078_MG_M     -9       -9       15         15      18        20
203078_MG_F     -9       -9       14         15      17        19
203080_BA_F     10       11       14         16      -9        -9
203081_MG_M     10       12       14         16      -9        -9
203081_MG_F     11       12       15         16      -9        -9
203082_MG_M     11       11       13         15      -9        -9
203082_MG_F     11       11       13         14      -9        -9
```
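A sketch of the character-conversion step and of the check that `anti_join()` and `setdiff()` agree when both data frames have the same columns. The IDs and values are hypothetical, and `ID` is deliberately created as a factor with different levels in each data frame to mimic the situation described above.

```r
library(dplyr)

# Hypothetical data: ID is a factor with different levels in each data frame
big_df   <- data.frame(ID = factor(c("s1", "s2", "s3")), D10S1248 = c(15, 14, 14))
small_df <- data.frame(ID = factor("s2"), D10S1248 = 14)

# Convert the key to character in both data frames before comparing rows
big_df$ID   <- as.character(big_df$ID)
small_df$ID <- as.character(small_df$ID)

via_anti_join <- anti_join(big_df, small_df, by = c("ID", "D10S1248"))
via_setdiff   <- dplyr::setdiff(big_df, small_df)

# Same rows from both approaches, ignoring attributes such as row names
print(all.equal(via_anti_join, via_setdiff, check.attributes = FALSE))
```

`check.attributes = FALSE` is used because the two functions are not guaranteed to produce identical row names, even when the rows themselves match.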