Question
I have an R question that I'm not even sure how to word in one sentence, and I couldn't find an answer for it yet.
I have two data frames that I would like to 'intersect' and find all rows where column values match in two columns. I've tried connecting two intersect() and which() statements with &&, but neither has given me what I want yet.
Here's what I mean. Let's say I have two data frames:
> testData
Email Manual Campaign Bounced Opened Clicked ClickThru Unsubscribed
1 stack@overflow.com EIFLS0LS 1 0 0 0 0 0
2 stack@exchange.com EIFLS0LS 1 0 0 0 0 0
3 data@frame.com EIFLS0LS 1 0 0 0 0 0
4 block@quote.com EIFLS0LS 1 0 0 0 0 0
5 ht@ml.com EIFLS0LS 1 0 0 0 0 0
6 tele@phone.com EIFLS0LS 1 0 0 0 0 0
> testBounced
Email Campaign
1 stack@overflow.com 1
2 stack@overflow.com 2
3 data@frame.com 2
4 block@quote.com 1
5 ht@ml.com 1
6 lap@top.com 1
As you can see, there are some values in the column Email that intersect, and some from the column Campaign that intersect. I want all of the rows from testData in which BOTH columns match.
That is:
Email Manual Campaign Bounced Opened Clicked ClickThru Unsubscribed
1 stack@overflow.com EIFLS0LS 1 0 0 0 0 0
2 block@quote.com EIFLS0LS 1 0 0 0 0 0
3 ht@ml.com EIFLS0LS 1 0 0 0 0 0
EDIT:
My goal in finding these rows is to be able to update a column in the original data frame. So the final output that I would like is:
> testData
Email Manual Campaign Bounced Opened Clicked ClickThru Unsubscribed
1 stack@overflow.com EIFLS0LS 1 1 0 0 0 0
2 stack@exchange.com EIFLS0LS 1 0 0 0 0 0
3 data@frame.com EIFLS0LS 1 0 0 0 0 0
4 block@quote.com EIFLS0LS 1 1 0 0 0 0
5 ht@ml.com EIFLS0LS 1 1 0 0 0 0
6 tele@phone.com EIFLS0LS 1 0 0 0 0 0
My apologies if this is a duplicate, and thanks in advance for your help!
EDIT 2:
I ended up just using a for loop, which works but doesn't feel efficient. The dataset was small enough to do it quickly, though. If anyone has a quick, R-style way to do it, I'd be happy to see it!
Answer 1:
If you use data.table and key both tables by the columns you want to match, then you can accomplish your goal in one line:
tData[tBounce, Bounced := 1L]
Here is the full process:
library(data.table)
keys <- c("Email", "Campaign")
tData <- data.table(testData, key=keys)
tBounce <- data.table(testBounced, key=keys)
tData[tBounce, Bounced := 1L]
Results:
tData
Email Manual Campaign Bounced Opened Clicked ClickThru Unsubscribed
1: block@quote.com EIFLS0LS 1 1 0 0 0 0
2: data@frame.com EIFLS0LS 1 0 0 0 0 0
3: ht@ml.com EIFLS0LS 1 1 0 0 0 0
4: stack@exchange.com EIFLS0LS 1 0 0 0 0 0
5: stack@overflow.com EIFLS0LS 1 1 0 0 0 0
6: tele@phone.com EIFLS0LS 1 0 0 0 0 0
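For anyone who wants to try this without the original data, here is a minimal, self-contained sketch. It rebuilds only the Email, Campaign and Bounced columns from the question (the other columns are left out for brevity, which is my simplification), and it uses the on= argument instead of setting keys, which assumes data.table 1.9.6 or later. Unlike keying, this leaves the rows of tData in their original order.

```r
library(data.table)

# Rebuild the question's two tables (only the columns needed for the join).
testData <- data.frame(
  Email = c("stack@overflow.com", "stack@exchange.com", "data@frame.com",
            "block@quote.com", "ht@ml.com", "tele@phone.com"),
  Campaign = 1L,
  Bounced = 0L,
  stringsAsFactors = FALSE
)
testBounced <- data.frame(
  Email = c("stack@overflow.com", "stack@overflow.com", "data@frame.com",
            "block@quote.com", "ht@ml.com", "lap@top.com"),
  Campaign = c(1L, 2L, 2L, 1L, 1L, 1L),
  stringsAsFactors = FALSE
)

tData <- as.data.table(testData)
tBounce <- as.data.table(testBounced)

# Update join: every row of tData that matches a row of tBounce on BOTH
# Email and Campaign gets Bounced set to 1; all other rows are untouched.
tData[tBounce, on = c("Email", "Campaign"), Bounced := 1L]

print(tData)
```

Rows of tBounce with no partner in tData (lap@top.com here) are simply ignored by the update.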
Answer 2:
You want the function merge. merge is commonly used to merge two tables by one common column, but the by argument can take multiple columns:
merge(testData, testBounced, by=c("Email", "Campaign"))
All pairs of Email and Campaign that don't match will be discarded by default. That's controllable by the arguments all.x and all.y, which default to FALSE.
The default argument for by is intersect(names(x), names(y)), so you technically don't need to specify the columns in this case, but it's good for clarity.
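merge returns the matching rows but does not change testData itself, so to get the updated Bounced column from the question's first EDIT you still need one more step. The sketch below is one possible base R way to do that, not part of the original answer: the key() helper and the tab separator are my own assumptions, and only the Email, Campaign and Bounced columns are rebuilt.

```r
# Rebuild the question's two tables (only the columns needed for the match).
testData <- data.frame(
  Email = c("stack@overflow.com", "stack@exchange.com", "data@frame.com",
            "block@quote.com", "ht@ml.com", "tele@phone.com"),
  Campaign = 1L,
  Bounced = 0L,
  stringsAsFactors = FALSE
)
testBounced <- data.frame(
  Email = c("stack@overflow.com", "stack@overflow.com", "data@frame.com",
            "block@quote.com", "ht@ml.com", "lap@top.com"),
  Campaign = c(1L, 2L, 2L, 1L, 1L, 1L),
  stringsAsFactors = FALSE
)

# Inner join on both columns: only Email/Campaign pairs found in both
# data frames survive.
matched <- merge(testData, testBounced, by = c("Email", "Campaign"))

# Hypothetical helper: collapse the two key columns into one string so that
# %in% can compare pairs. Assumes no Email contains a tab character.
key <- function(df) paste(df$Email, df$Campaign, sep = "\t")

# Flag the rows of the original data frame whose pair appears in the match.
testData$Bounced[key(testData) %in% key(matched)] <- 1L

print(matched)
print(testData)
```

Since the flagging step only needs the key pairs, key(testBounced) could be used directly in place of key(matched); the merge call is kept here to show which rows it returns.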
Source: https://stackoverflow.com/questions/17888764/r-finding-rows-of-a-data-frame-where-certain-columns-match-those-of-another