Street Address search in a string - Python or Ruby

Submitted by 自作多情 on 2020-01-02 14:31:44

Question


Hey, I was wondering how I can find a street address in a string in Python/Ruby.

Perhaps with a regex?

Also, it's going to be in the following (US) format:

420 Fanboy Lane, Cupertino CA

Thanks!


Answer 1:


Using your example, this is what I came up with in Ruby (I edited it to include the ZIP code and an optional ZIP+4 extension):

# house number, street, city, 2-letter state, ZIP with optional +4; \A and \z anchor the whole string
regex = /\A[0-9]+ (.*), (.*) [a-zA-Z]{2} [0-9]{5}(-[0-9]{4})?\z/
addresses = ["420 Fanboy Lane, Cupertino CA 12345"]
addresses << "1829 William Tell Oveture, by Gioachino Rossini 88421"
addresses << "114801 Western East Avenue Apt. B32, Funky Township CA 12345"
addresses << "1 Infinite Loop, Cupertino CA 12345-1234"
addresses << "420 time!"

addresses.each do |address|
  print address
  if address.match(regex)
    puts " is an address"
  else
    puts " is not an address"
  end
end

# Output:
# 420 Fanboy Lane, Cupertino CA 12345 is an address
# 1829 William Tell Oveture, by Gioachino Rossini 88421 is not an address
# 114801 Western East Avenue Apt. B32, Funky Township CA 12345 is an address
# 1 Infinite Loop, Cupertino CA 12345-1234 is an address
# 420 time! is not an address
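Since the question also asks about Python, here is a rough Python port of the same whole-string check. It is a sketch, not part of the original answer: it requires at least one house-number digit ([0-9]+) and uses re.fullmatch in place of anchors; the helper name is_address is hypothetical.

```python
import re

# Same shape as the Ruby pattern: house number, street, city, two-letter state,
# five-digit ZIP with an optional -NNNN extension. fullmatch() anchors both ends.
ADDRESS_RE = re.compile(r"[0-9]+ (.*), (.*) [a-zA-Z]{2} [0-9]{5}(-[0-9]{4})?")


def is_address(text: str) -> bool:
    """Hypothetical helper: True if the whole string has the address shape."""
    return ADDRESS_RE.fullmatch(text) is not None


addresses = [
    "420 Fanboy Lane, Cupertino CA 12345",
    "1829 William Tell Oveture, by Gioachino Rossini 88421",
    "114801 Western East Avenue Apt. B32, Funky Township CA 12345",
    "1 Infinite Loop, Cupertino CA 12345-1234",
    "420 time!",
]

for address in addresses:
    verdict = "is an address" if is_address(address) else "is not an address"
    print(f"{address} {verdict}")
```

Like the Ruby version, this only checks the shape of the string; it does not verify that the address exists.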



Answer 2:


You might want to have a look at pypostal, the official Python bindings to libpostal.

Using Mike Bethany's examples, I made this little example:

# Python 3 syntax; the output below was captured under Python 2, hence the u'' prefixes
from postal.parser import parse_address

addresses = [
    "420 Fanboy Lane, Cupertino CA 12345",
    "1829 William Tell Oveture, by Gioachino Rossini 88421",
    "114801 Western East Avenue Apt. B32, Funky Township CA 12345",
    "1 Infinite Loop, Cupertino CA 12345-1234",
    "420 time!",
]

for address in addresses:
    print(parse_address(address))  # list of (value, label) tuples
    print("*" * 60)

[(u'420', u'house_number'), (u'fanboy lane', u'road'), (u'cupertino', u'city'), (u'ca', u'state'), (u'12345', u'postcode')]
************************************************************
[(u'1829', u'house_number'), (u'william tell', u'road'), (u'oveture by gioachino', u'house'), (u'rossini', u'road'), (u'88421', u'postcode')]
************************************************************
[(u'114801', u'house_number'), (u'western east avenue apt.', u'road'), (u'b32', u'postcode'), (u'funky', u'road'), (u'township', u'city'), (u'ca', u'state'), (u'12345', u'postcode')]
************************************************************
[(u'1', u'house_number'), (u'infinite loop', u'road'), (u'cupertino', u'city'), (u'ca', u'state'), (u'12345-1234', u'postcode')]
************************************************************
[(u'420', u'house_number'), (u'time !', u'house')]
************************************************************
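The output shows that parse_address labels every input, even "420 time!", so finding addresses in free text still needs a filter on top of it. A minimal sketch of one, assuming that a street address must contain house_number, road, city and state components (that rule and the helper names are hypothetical; the data is copied from the output above). It works on the (value, label) tuples directly, so it can be tried without libpostal installed.

```python
# Parsed results copied from the output above: lists of (value, label) tuples.
PARSED = [
    [("420", "house_number"), ("fanboy lane", "road"), ("cupertino", "city"),
     ("ca", "state"), ("12345", "postcode")],
    [("1829", "house_number"), ("william tell", "road"),
     ("oveture by gioachino", "house"), ("rossini", "road"), ("88421", "postcode")],
    [("114801", "house_number"), ("western east avenue apt.", "road"),
     ("b32", "postcode"), ("funky", "road"), ("township", "city"),
     ("ca", "state"), ("12345", "postcode")],
    [("420", "house_number"), ("time !", "house")],
]

# Assumed rule: all of these labels must be present for a "street address".
REQUIRED_LABELS = {"house_number", "road", "city", "state"}


def group_by_label(parsed):
    """Collect values per label; a label can occur more than once."""
    grouped = {}
    for value, label in parsed:
        grouped.setdefault(label, []).append(value)
    return grouped


def looks_like_street_address(parsed):
    """True if every required label appears at least once."""
    return REQUIRED_LABELS.issubset(group_by_label(parsed))


for parsed in PARSED:
    print(looks_like_street_address(parsed), group_by_label(parsed))
```

With pypostal installed you would call it as looks_like_street_address(parse_address(text)). Note that the "Apt. B32" example passes this label check even though "b32" was labeled as a postcode, so a label check alone does not validate the parse.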



Answer 3:


\d{1,4}( \w+){1,3},( \w+){1,3} [A-Z]{2}

Not fully tested, but it should work. Just use it with your favorite function from re (e.g. re.findall). Assumptions:

  1. A house number can be between 1 and 4 digits long
  2. 1-3 words follow a house number, and they're all separated by spaces
  3. City name is 1-3 words (needs to match Cupertino, Los Angeles, and San Luis Obispo)
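One catch when pairing this pattern with re.findall: when a pattern contains capturing groups, findall returns the groups instead of the whole match, so the ( \w+) groups yield tuples of the last repeated words rather than the address. A sketch that keeps the pattern but makes those groups non-capturing with (?: ...); the sample sentence is made up.

```python
import re

# Answer 3's pattern with the repeated groups made non-capturing, so that
# re.findall returns whole matches rather than tuples of group captures.
ADDRESS_RE = re.compile(r"\d{1,4}(?: \w+){1,3},(?: \w+){1,3} [A-Z]{2}")
# The original capturing form, kept for comparison.
CAPTURING_RE = re.compile(r"\d{1,4}( \w+){1,3},( \w+){1,3} [A-Z]{2}")

text = "Send it to 420 Fanboy Lane, Cupertino CA by Friday."  # made-up sample

print(ADDRESS_RE.findall(text))    # whole address string(s)
print(CAPTURING_RE.findall(text))  # one tuple of two group captures per match
```

Alternatively, keep the capturing groups and use re.finditer with match.group(0).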



Answer 4:


Okay, based on the very helpful responses from Mike Bethany and Rafe Kettler (thanks!), I found that this regex works for Python and Ruby: /[0-9]{1,4} (.*), (.*) [a-zA-Z]{2} [0-9]{5}/

Ruby code (prints 12 Argonaut Lane, Lexington MA 02478):

myregex = /[0-9]{1,4} (.*), (.*) [a-zA-Z]{2} [0-9]{5}(-[0-9]{4})?/

print "We're Having a pizza party at 12 Argonaut Lane, Lexington MA 02478 Come join the party!".match(myregex)

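Python code: it doesn't work quite the same as-is. Python patterns take no /.../ delimiters (the slashes would be matched literally), and findall() returns only the capture groups when the pattern has any, so drop the slashes and use search() with group(0):

import re
# Python patterns have no /.../ delimiters; the raw string is the whole pattern
myregex = re.compile(r'[0-9]{1,4} (.*), (.*) [a-zA-Z]{2} [0-9]{5}(-[0-9]{4})?')
match = myregex.search("We're Having a pizza party at 12 Argonaut Lane, Lexington MA 02478 Come join the party!")
print(match.group(0))  # whole match: 12 Argonaut Lane, Lexington MA 02478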
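Because (.*) is greedy and can cross commas, this answer's pattern swallows everything between two addresses in the same sentence. A sketch of one way around that, assuming no field contains a comma: restrict the street and city to [^,] (the city lazily), add \b word boundaries, and iterate with re.finditer. The group names and the two-address sample are made up for illustration.

```python
import re

# The answer's pattern: (.*) may run past a comma into the next address.
GREEDY_RE = re.compile(r"[0-9]{1,4} (.*), (.*) [a-zA-Z]{2} [0-9]{5}(-[0-9]{4})?")

# Assumption: no field contains a comma, so street and city are limited to [^,].
ADDRESS_RE = re.compile(
    r"\b(?P<number>[0-9]{1,5}) (?P<street>[^,]+), "
    r"(?P<city>[^,]+?) (?P<state>[a-zA-Z]{2}) (?P<zip>[0-9]{5}(?:-[0-9]{4})?)\b"
)

text = "Party at 12 Argonaut Lane, Lexington MA 02478 or 7 Elm St, Salem MA 01970"

print([m.group(0) for m in GREEDY_RE.finditer(text)])  # one match spanning both
for m in ADDRESS_RE.finditer(text):
    print(m.group(0), "->", m.group("city"), m.group("state"), m.group("zip"))
```

The lazy [^,]+? keeps the city as short as possible, so the first "XX 12345" after the comma ends each match.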



Answer 5:


Here's what I used:

(\d{1,10}( \w+){1,10}( ( \w+){1,10})?( \w+){1,10}[,.](( \w+){1,10}(,)? [A-Z]{2}( [0-9]{5})?)?) 

It's not perfect and doesn't match edge cases, but it works for most regularly typed addresses and partial addresses.

It finds addresses in text such as:

Hi! I'm at 12567 Some St. Fairfax, VA. Come get me!

some text 12567 Some St. is my home

something else 123 My Street Drive, Fairfax VA 22033

Hope this helps someone
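A short sketch for trying this pattern on the three examples above. Because the pattern is full of capturing groups, it uses re.search and group(0) rather than findall; find_address is a hypothetical helper name.

```python
import re

# Answer 5's pattern, split over two raw-string literals but otherwise unchanged.
ADDRESS_RE = re.compile(
    r"(\d{1,10}( \w+){1,10}( ( \w+){1,10})?( \w+){1,10}[,.]"
    r"(( \w+){1,10}(,)? [A-Z]{2}( [0-9]{5})?)?)"
)


def find_address(text):
    """Return the first address-like substring, or None if there is none."""
    match = ADDRESS_RE.search(text)
    return match.group(0) if match else None


samples = [
    "Hi! I'm at 12567 Some St. Fairfax, VA. Come get me!",
    "some text 12567 Some St. is my home",
    "something else 123 My Street Drive, Fairfax VA 22033",
]

for sample in samples:
    print(find_address(sample))
```

The second sample shows the partial-address case: with no state abbreviation after it, the match stops at "12567 Some St.".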




Answer 6:


As stated, addresses are very free-form. Rather than the regex approach, how about a service that provides accurate, standardized address data? I work for SmartyStreets, where we provide an API that does exactly this. One simple GET request and you've got your address parsed. Try this Python sample (you'll need to start a trial):

https://github.com/smartystreets/smartystreets-python-sdk/blob/master/examples/us_street_single_address_example.py



Source: https://stackoverflow.com/questions/4542941/street-address-search-in-a-string-python-or-ruby
