Subset based on first three numbers

Posted by 强颜欢笑 on 2019-12-13 14:20:57

Question


I have a very large data set of variables and I need to subset based on the first three numbers of the zip code. I'm not sure how to do this and would appreciate any help you can provide.

How would I subset this example dput to remove all zip codes that start with 721? Note that I can't simply use a greater-than (>) comparison, since there are zip codes larger than those starting with 721. Thanks!

dput :

data <- structure(list(state = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
  1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("AR", 
  "IL", "MO"), class = "factor"), zip = c(72003L, 72042L, 72073L, 
  72166L, 72038L, 72055L, 72160L, 72026L, 72048L, 72140L, 72003L, 
  72042L, 72073L, 72166L, 72038L, 72055L, 72160L, 72026L, 72048L, 
  72140L)), .Names = c("state", "zip"), row.names = c(NA, 20L), class = "data.frame")

Data :

   state   zip
1     AR 72003
2     AR 72042
3     AR 72073
4     AR 72166
5     AR 72038
6     AR 72055
7     AR 72160
8     AR 72026
9     AR 72048
10    AR 72140
11    AR 72003
12    AR 72042
13    AR 72073
14    AR 72166
15    AR 72038
16    AR 72055
17    AR 72160
18    AR 72026
19    AR 72048
20    AR 72140

Answer 1:


You can try substr

data[substr(data$zip, 1,3)!=721,]

Or using data.table

library(data.table)
setDT(data)[substr(zip, 1, 3) != 721]  # setDT() converts data to a data.table by reference

Or dplyr

library(dplyr)
data %>% 
      filter(substr(zip, 1,3)!=721)

Or using extract from tidyr

library(tidyr)
library(dplyr)  # filter() and select() come from dplyr
extract(data, zip, 'zip1', '(...).*', remove = FALSE) %>% 
  filter(zip1 != 721) %>% 
  select(-zip1)
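
To check the result, here is a minimal self-contained sketch that applies the substr approach to a few rows of the example data. The two alternatives (startsWith and integer division) are not part of the original answer and are shown only as assumptions for comparison; the names df, keep_substr, keep_starts, keep_intdiv and result are illustrative.

```r
# Illustrative data: a few rows taken from the question's example
df <- data.frame(
  state = "AR",
  zip = c(72003L, 72042L, 72166L, 72160L, 72140L, 72055L)
)

# Approach from the answer: compare the first three characters of the zip
keep_substr <- substr(df$zip, 1, 3) != "721"

# Alternative (assumption, not in the answer): startsWith() on the character form
keep_starts <- !startsWith(as.character(df$zip), "721")

# Alternative that only holds for five-digit integer zips:
# integer division by 100 drops the last two digits
keep_intdiv <- df$zip %/% 100L != 721L

# Keep only the rows whose zip does not start with 721
result <- df[keep_substr, ]
print(result)
```

All three filters select the same rows here. Note that zip codes stored as integers lose any leading zero, so for data containing such codes it is probably safer to store the zip column as character before taking the first three characters.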


Source: https://stackoverflow.com/questions/28060621/subset-based-on-first-three-numbers
