问题
I am working on a ETL testing project, where my need is to compare data between two tables from two different databases. to do this, I first downloaded entire tables using query like below.
query_table_a <- paste0("SELECT * FROM MBR_MEAS (NOLOCK)")
table_a <- as.data.frame(sqlQuery(cn, query_table_a))
Then, I used anti_join() from the dplyr. If the column name is same in both data frames, then my result is good. for example(this returns good and expected results)
mismatch_records <- anti_join(table_a, table_b, by="client_id")
But in another scenario, column name is changed (table 'c' has column name as client_id and table 'd' has clientid, I couldn't figure out what to do. I tried using merge function but that doesn't seems to be very promising.
merge(x = table_c, y = table_d, by.x ="CLIENT_ID", by.y = "ClientId", all.x = "TRUE")
any suggestions please?
回答1:
Try this:
mismatch_records <- anti_join(table_c, table_d, by = c("CLIENT_ID" = "ClientId"))
来源:https://stackoverflow.com/questions/40312639/using-anti-join-from-the-dplyr-on-two-tables-from-two-different-databases