Is there a library for parsing US addresses?

Posted by 主宰稳场 on 2019-12-02 15:38:43
Karl Barker

Pyparsing has a bunch of functionality for parsing street addresses. Check out an example here: http://pyparsing.wikispaces.com/file/view/streetAddressParser.py

Tyler Hayes

Quite a few of these answers are a few years old now.

The most bulletproof library I've seen recently is usaddress: https://github.com/datamade/usaddress

Pro tip: when testing addresses in all these libraries, use 1) no commas in your address, 2) multi-word city names preferably with "St." in the name to see if the library can differentiate between "street" and "Saint" (e.g., St. Louis), and 3) improper casing. This combo will typically make even the better parsers fall down.
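As a rough illustration of that tip, here is a minimal sketch that derives harder test inputs from one address. The helper name `stress_variants` and the sample address are made up for this example; feed the variants to whichever parser you are evaluating.

```python
import re


def stress_variants(address):
    """Return harder-to-parse variants of one address string."""
    # 1) Drop the commas, which many parsers lean on as field separators.
    no_commas = re.sub(r"\s*,\s*", " ", address).strip()
    # 3) Improper casing: all lowercase and all uppercase.
    return [no_commas, no_commas.lower(), no_commas.upper()]


# 2) Hypothetical sample with "St." used for both "Street" and "Saint".
sample = "123 Main St., St. Louis, MO 63101"
for variant in stress_variants(sample):
    print(variant)
```

Each printed line is a candidate test case; a parser that handles all three is doing better than most.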

Check out this Python package: https://github.com/SwoopSearch/pyaddress

It also allows flexibility if you know enough details about the addresses to be parsed.

That pyparsing library looks very interesting and seems to do a nice job with a variety of examples. And I think that's a more readable alternative to raw regular expressions (which aren't really a good solution for this problem).

Be aware that this kind of solution implies that you will, at some point, be standardizing addresses that aren't valid; they'll just appear valid. If knowing whether an address is, in fact, real (and perhaps deliverable) is important to your application, then you should be using a USPS-certified service that uses Delivery Point Validation (DPV). I am a developer for SmartyStreets, which provides just such a service, along with SDKs that make integration easy (here's a succinct sample).

The responses come back standardized according to USPS Publication 28. The API is free for low-usage users.

I know this is an old post but someone might find it useful: https://usaddress.readthedocs.io/en/latest/

>>> import usaddress
>>> usaddress.parse('Robie House, 5757 South Woodlawn Avenue, Chicago, IL 60637')
[('Robie', 'BuildingName'),
('House,', 'BuildingName'),
('5757', 'AddressNumber'),
('South', 'StreetNamePreDirectional'),
('Woodlawn', 'StreetName'),
('Avenue,', 'StreetNamePostType'),
('Chicago,', 'PlaceName'),
('IL', 'StateName'),
('60637', 'ZipCode')]

Or:

>>> import usaddress
>>> usaddress.tag('Robie House, 5757 South Woodlawn Avenue, Chicago, IL 60637')
(OrderedDict([
   ('BuildingName', 'Robie House'),
   ('AddressNumber', '5757'),
   ('StreetNamePreDirectional', 'South'),
   ('StreetName', 'Woodlawn'),
   ('StreetNamePostType', 'Avenue'),
   ('PlaceName', 'Chicago'),
   ('StateName', 'IL'),
   ('ZipCode', '60637')]),
'Street Address')

>>> usaddress.tag('State & Lake, Chicago')
(OrderedDict([
   ('StreetName', 'State'),
   ('IntersectionSeparator', '&'),
   ('SecondStreetName', 'Lake'),
   ('PlaceName', 'Chicago')]),
'Intersection')

>>> usaddress.tag('P.O. Box 123, Chicago, IL')
(OrderedDict([
   ('USPSBoxType', 'P.O. Box'),
   ('USPSBoxID', '123'),
   ('PlaceName', 'Chicago'),
   ('StateName', 'IL')]),
'PO Box')
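To see how the two output shapes above relate, here is a minimal sketch that groups parse-style (token, label) pairs into a tag-style mapping. The function `group_labels` is a hypothetical helper written for this example, not usaddress's actual implementation, and it makes the simplifying assumption that a label repeating non-consecutively is an error.

```python
def group_labels(pairs):
    """Merge consecutive tokens that share a label into one string."""
    grouped = {}
    last_label = None
    for token, label in pairs:
        # Assumption: a label that reappears after a different one is ambiguous.
        if label in grouped and label != last_label:
            raise ValueError(f"label {label!r} repeats non-consecutively")
        grouped.setdefault(label, []).append(token)
        last_label = label
    # Join each group and trim the trailing commas left on the tokens.
    return {label: " ".join(tokens).strip(" ,") for label, tokens in grouped.items()}


# The (token, label) pairs from the parse() example above.
pairs = [
    ("Robie", "BuildingName"), ("House,", "BuildingName"),
    ("5757", "AddressNumber"), ("South", "StreetNamePreDirectional"),
    ("Woodlawn", "StreetName"), ("Avenue,", "StreetNamePostType"),
    ("Chicago,", "PlaceName"), ("IL", "StateName"), ("60637", "ZipCode"),
]
print(group_labels(pairs))
```

In practice you would just call usaddress.tag; the sketch only shows why tag is usually the more convenient of the two.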

Carefully check your dataset to ensure that this problem hasn't already been handled for you.

I spent a fair amount of time building a taxonomy of probable street name endings and using regexp conditionals to pluck the street number out of the full address strings, only to find that the attributes table for my shapefiles had already segmented out these components.

Before you go forward with parsing address strings, which is always a bit of a chore due to the inevitably strange variations (some parcel addresses are for landlocked parcels and have weird addresses, etc.), make sure your dataset hasn't already done this for you!
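A quick way to run that check is to scan the column names first. The sketch below assumes the attribute table has been exported to CSV; the hint fragments, the helper `address_like_columns`, and the sample header are all hypothetical, so adjust them to your data.

```python
import csv
import io

# Hypothetical name fragments that suggest pre-split address fields.
ADDRESS_HINTS = ("number", "street", "city", "state", "zip", "postcode")


def address_like_columns(csv_file):
    """Return header names that look like already-parsed address components."""
    header = next(csv.reader(csv_file), [])
    return [name for name in header
            if any(hint in name.lower() for hint in ADDRESS_HINTS)]


# Made-up export standing in for a real attribute table.
sample = io.StringIO(
    "parcel_id,full_address,st_number,street_name,zip\n"
    "1,12 Oak Ave,12,Oak Ave,60601\n"
)
print(address_like_columns(sample))
```

If this returns anything, look at those columns before writing a single regular expression.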

libpostal is a powerful open-source library that fits this use case very nicely, and there are bindings for several programming languages. It is a C library for parsing/normalizing street addresses around the world using statistical NLP and open data. The goal of the project is to understand location-based strings in every language, everywhere.

I have created a simple Docker image with the Python binding, pypostal, that you can spin up and try very easily: pypostal-docker
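For orientation, here is a minimal sketch of calling the parser through pypostal. It assumes libpostal and pypostal are installed; `parse_address` from `postal.parser` returns (value, label) pairs. The wrapper functions and the sample pairs are hypothetical, and the pairs are hand-written to show the shape, not real parser output.

```python
def parse_with_libpostal(address):
    """Parse one address string with libpostal via pypostal."""
    # Imported lazily because pypostal needs the libpostal C library installed.
    from postal.parser import parse_address
    return parse_address(address)


def components_to_dict(components):
    """Collect (value, label) pairs into {label: [values]}."""
    result = {}
    for value, label in components:
        # A label can occur more than once, so keep a list per label.
        result.setdefault(label, []).append(value)
    return result


# Hand-written pairs in the (value, label) shape, for illustration only.
illustrative = [
    ("781", "house_number"), ("franklin ave", "road"),
    ("brooklyn", "city"), ("ny", "state"), ("11216", "postcode"),
]
print(components_to_dict(illustrative))
```

With the library installed, `components_to_dict(parse_with_libpostal("..."))` gives the same structure for a real address.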
