Simple lookup to insert values in an R data frame

Submitted by 老子叫甜甜 on 2019-12-21 03:46:14

Question


This is a seemingly simple R question, but I don't see an exact answer here. I have a data frame (alldata) that looks like this:

Case     zip     market
1        44485   0
2        44481   0
3        43210   0

There are over 3.5 million records.

Then, I have a second data frame, 'zipcodes'.

market    zip
1         44485
1         44486
1         44488
...       ... (100 zips in market 1)
2         43210
2         43211
...       ... (100 zips in market 2, etc.)

I want to return the correct value for alldata$market for each case based on alldata$zip matching the appropriate value in the zipcode data frame. I'm just looking for the right syntax, and assistance is much appreciated, as usual.


Answer 1:


Since you don't care about the existing market column in alldata, you can first drop it by selecting only the Case and zip columns, and then join alldata and zipcodes on the zip column using merge:

merge(alldata[, c("Case", "zip")], zipcodes, by="zip")

The by parameter specifies the key criteria, so if you have a compound key, you could do something like by=c("zip", "otherfield").
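One thing to watch with merge: by default it performs an inner join and sorts the result by the key, so cases whose zip has no entry in zipcodes are dropped and the original row order is lost. A minimal sketch on toy data (the three cases from the question plus a cut-down zipcodes table in which 44481 is deliberately absent; this is an assumption for illustration, not the real data) showing how all.x = TRUE keeps every case:

```r
# Toy data modelled on the question; 44481 has no entry in this
# cut-down zipcodes table, so it serves as the unmatched case.
alldata <- data.frame(Case = 1:3,
                      zip = c(44485L, 44481L, 43210L),
                      market = 0)
zipcodes <- data.frame(market = c(1, 1, 1, 2, 2),
                       zip = c(44485L, 44486L, 44488L, 43210L, 43211L))

# Default merge is an inner join: the unmatched case disappears.
inner <- merge(alldata[, c("Case", "zip")], zipcodes, by = "zip")

# all.x = TRUE keeps every row of alldata; unmatched zips get NA.
merged <- merge(alldata[, c("Case", "zip")], zipcodes,
                by = "zip", all.x = TRUE)

# merge sorts by the key, so restore the original case order.
merged <- merged[order(merged$Case), ]
print(merged)
```

If every zip in alldata is guaranteed to appear in zipcodes, the plain merge call above is enough.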




Answer 2:


Another option that worked for me and is very simple:

alldata$market<-with(zipcodes, market[match(alldata$zip, zip)])
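To see why this works, it helps to split the one-liner in two: match returns, for each alldata$zip, the row position of its first occurrence in zipcodes$zip (or NA if there is none), and that position vector is then used to index zipcodes$market. A small sketch on toy data (assumed for illustration; 44481 is deliberately left out of the toy zipcodes table):

```r
# Toy data modelled on the question.
alldata <- data.frame(Case = 1:3,
                      zip = c(44485L, 44481L, 43210L),
                      market = 0)
zipcodes <- data.frame(market = c(1, 1, 1, 2, 2),
                       zip = c(44485L, 44486L, 44488L, 43210L, 43211L))

# Step 1: position of each alldata zip within zipcodes$zip (NA if absent).
idx <- match(alldata$zip, zipcodes$zip)

# Step 2: pull the market at those positions; NA positions give NA.
alldata$market <- zipcodes$market[idx]
print(alldata)
```

Unlike merge, this keeps alldata in its original order and never changes the number of rows.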



Answer 3:


With such a large data set you may want the speed of an environment lookup. You can use the lookup function from the qdapTools package as follows:

library(qdapTools)
alldata$market <- lookup(alldata$zip, zipcodes[, 2:1])

Or

alldata$zip %l% zipcodes[, 2:1]
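For readers who want to see what an environment lookup means without installing a package, here is a rough base R sketch of the general idea. It is not a claim about how qdapTools implements lookup internally; the toy data and the name zip_env are assumptions for illustration:

```r
# Toy data modelled on the question; 44481 is deliberately unmatched.
alldata <- data.frame(Case = 1:3,
                      zip = c(44485L, 44481L, 43210L),
                      market = 0)
zipcodes <- data.frame(market = c(1, 1, 1, 2, 2),
                       zip = c(44485L, 44486L, 44488L, 43210L, 43211L))

# Build a hashed environment keyed by zip (as character strings),
# with the market as the stored value.
zip_env <- list2env(setNames(as.list(zipcodes$market), zipcodes$zip))

# Look up every zip at once; zips with no entry come back as NA.
found <- mget(as.character(alldata$zip), envir = zip_env,
              ifnotfound = list(NA_real_))
alldata$market <- unlist(found, use.names = FALSE)
print(alldata)
```

Whether this beats match on 3.5 million rows is something to benchmark on your own data rather than assume.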



Answer 4:


Here's the dplyr way of doing it:

library(tidyverse)
alldata %>%
  select(-market) %>%
  left_join(zipcodes, by="zip")

On my machine, this performs roughly the same as lookup.
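A small runnable version of the join, on toy data assumed for illustration (44481 is deliberately missing from the toy zipcodes table). It loads only dplyr rather than the whole tidyverse, which is sufficient for select and left_join:

```r
library(dplyr)

# Toy data modelled on the question.
alldata <- data.frame(Case = 1:3,
                      zip = c(44485L, 44481L, 43210L),
                      market = 0)
zipcodes <- data.frame(market = c(1, 1, 1, 2, 2),
                       zip = c(44485L, 44486L, 44488L, 43210L, 43211L))

# Drop the placeholder market column, then pull in the real one.
# left_join keeps every row of alldata, in its original order,
# and fills market with NA where the zip has no match.
joined <- alldata %>%
  select(-market) %>%
  left_join(zipcodes, by = "zip")
print(joined)
```

Note that if a zip appeared more than once in zipcodes, left_join would return one row per match, so it is worth checking that zipcodes$zip is unique before joining.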



Source: https://stackoverflow.com/questions/17844143/simple-lookup-to-insert-values-in-an-r-data-frame
