Question
This is a seemingly simple R question, but I don't see an exact answer here. I have a data frame (alldata) that looks like this:
Case zip market
1 44485 0
2 44481 0
3 43210 0
There are over 3.5 million records.
Then, I have a second data frame, 'zipcodes'.
market zip
1 44485
1 44486
1 44488
... ... (100 zips in market 1)
2 43210
2 43211
... ... (100 zips in market 2, etc.)
I want to return the correct value for alldata$market for each case based on alldata$zip matching the appropriate value in the zipcode data frame. I'm just looking for the right syntax, and assistance is much appreciated, as usual.
Answer 1:
Since you don't care about the market column in alldata, you can first drop it by selecting only the Case and zip columns, and then merge alldata and zipcodes on the zip column using merge:
merge(alldata[, c("Case", "zip")], zipcodes, by="zip")
The by parameter specifies the key columns, so if you have a compound key, you could do something like by=c("zip", "otherfield").
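One caveat worth sketching: by default merge performs an inner join and sorts the result by the key, so cases whose zip is missing from zipcodes are dropped and the row order changes. The toy data below is illustrative only (the values are assumptions modeled on the question); all.x = TRUE keeps unmatched cases with market set to NA.

```r
# Toy data mirroring the question (illustrative values, not the real data)
alldata <- data.frame(Case = 1:3,
                      zip = c(44485L, 44481L, 43210L),
                      market = 0)
zipcodes <- data.frame(market = c(1L, 1L, 1L, 2L, 2L),
                       zip = c(44485L, 44486L, 44488L, 43210L, 43211L))

# all.x = TRUE keeps every case, even when its zip has no match in zipcodes
merged <- merge(alldata[, c("Case", "zip")], zipcodes, by = "zip", all.x = TRUE)

# merge() sorts by the key column, so restore the original case order
merged <- merged[order(merged$Case), ]
print(merged)
```

Here zip 44481 is not in the toy zipcodes table, so Case 2 ends up with market NA rather than disappearing from the result.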
Answer 2:
Another option that worked for me and is very simple:
alldata$market<-with(zipcodes, market[match(alldata$zip, zip)])
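To make the mechanics of that one-liner visible, here is a minimal sketch on toy data (the values are assumptions modeled on the question). match returns, for each alldata$zip, the row position of its first occurrence in zipcodes$zip, or NA when there is none; indexing market with those positions pulls out the corresponding values.

```r
# Toy data mirroring the question (illustrative values, not the real data)
alldata <- data.frame(Case = 1:3,
                      zip = c(44485L, 44481L, 43210L),
                      market = 0)
zipcodes <- data.frame(market = c(1L, 1L, 1L, 2L, 2L),
                       zip = c(44485L, 44486L, 44488L, 43210L, 43211L))

# Position of each case's zip within zipcodes$zip (NA if absent)
idx <- match(alldata$zip, zipcodes$zip)

# Indexing with NA yields NA, so unmatched zips get market NA
alldata$market <- zipcodes$market[idx]
print(alldata)
```

This keeps alldata in its original order and length. Note that match only uses the first occurrence, so it assumes each zip appears in a single market.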
Answer 3:
With such a large data set you may want the speed of an environment lookup. You can use the lookup function from the qdapTools package as follows:
library(qdapTools)
alldata$market <- lookup(alldata$zip, zipcodes[, 2:1])
Or, with the %l% operator:
alldata$zip %l% zipcodes[, 2:1]
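For readers without qdapTools, the idea of an environment lookup can be sketched in base R. This is not how lookup is implemented internally (that is not shown here); it is only an illustration of the technique, using a hashed environment keyed by zip and toy data whose values are assumptions.

```r
# Toy data mirroring the question (illustrative values, not the real data)
alldata <- data.frame(Case = 1:3, zip = c(44485L, 44481L, 43210L))
zipcodes <- data.frame(market = c(1L, 1L, 1L, 2L, 2L),
                       zip = c(44485L, 44486L, 44488L, 43210L, 43211L))

# Build a hashed environment: one entry per zip, keyed by its character form
zip_env <- new.env(hash = TRUE)
for (i in seq_len(nrow(zipcodes))) {
  assign(as.character(zipcodes$zip[i]), zipcodes$market[i], envir = zip_env)
}

# Look up every case's zip at once; zips with no entry come back as NA
found <- mget(as.character(alldata$zip), envir = zip_env, ifnotfound = NA)
alldata$market <- unlist(found, use.names = FALSE)
print(alldata)
```

Environment keys must be strings, hence the as.character conversion on both the build and lookup sides.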
Answer 4:
Here's the dplyr way of doing it:
library(tidyverse)
alldata %>%
select(-market) %>%
left_join(zipcodes, by="zip")
which, on my machine, has roughly the same performance as lookup.
Source: https://stackoverflow.com/questions/17844143/simple-lookup-to-insert-values-in-an-r-data-frame