Extracting City, State and Country from Raw address string [closed]

£可爱£侵袭症+ submitted on 2021-02-20 04:12:17

Question


Given a raw string input:

1600 Divisadero St
San Francisco, CA 94115
b/t Post St & Sutter St 
Lower Pacific Heights

I want to extract:

City: San Francisco
State: California or CA
Country: USA

I'll be parsing millions of addresses, so using a paid API is not feasible.

I'm planning to use a named entity recognizer, but I can't find enough training data to cover arbitrary locations.

Is there an open-source project I could use?


Answer 1:


OpenStreetMap's geocoder, Nominatim, can be downloaded and set up on your own machine. This is an extremely tedious and time-consuming process: you will need 500 GB of free disk space and on the order of tens of days for indexing. At the end of it, though, you will have a full-fledged geocoder on your own machine that should handle your current needs and many future ones.
If you go down this route, I recommend first trying out their example web APIs to see whether the quality is acceptable.
It is also well worth considering spending the money on the Google or Bing geocoder instead.
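
To make the "try the web API first" step concrete, here is a minimal Python sketch that builds a Nominatim `/search` request with `addressdetails=1` and pulls city, state and country out of one result. The helper names (`build_search_url`, `pick_location`, `geocode`) and the `User-Agent` string are made up for illustration, and the fallback from `city` to `town` or `village` is an assumption about how you may want to treat small places. The public endpoint is rate limited, so it is only suitable for checking quality, not for millions of addresses.

```python
import json
import urllib.parse
import urllib.request

# Public demo endpoint; for a self-hosted install, pass your own base URL.
NOMINATIM_SEARCH = "https://nominatim.openstreetmap.org/search"


def build_search_url(query, base=NOMINATIM_SEARCH):
    """Return a search URL that asks for a structured address breakdown."""
    params = {"q": query, "format": "jsonv2", "addressdetails": 1, "limit": 1}
    return base + "?" + urllib.parse.urlencode(params)


def pick_location(result):
    """Pull city, state and country out of one Nominatim result dict."""
    address = result.get("address", {})
    # Smaller places may be filed under "town" or "village" instead of "city".
    city = next((address[k] for k in ("city", "town", "village") if k in address), None)
    return {"city": city, "state": address.get("state"), "country": address.get("country")}


def geocode(query, user_agent="address-extractor-demo"):
    """Query the server and return the best match, or None if nothing matched."""
    request = urllib.request.Request(
        build_search_url(query), headers={"User-Agent": user_agent}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        results = json.load(response)
    return pick_location(results[0]) if results else None
```

Calling `geocode("San Francisco, CA 94115")` against the public server, or against your own instance via a different `base`, lets you spot-check a handful of your addresses before committing to the full install.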




Answer 2:


@adi92's answer is the best choice here, but indexing the entire database requires a very beefy machine with many cores and a lot of RAM. For those who need less computation, www.geonames.org is comprehensive enough if you only need city, state, and country.
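
As a rough sketch of the GeoNames route, the Python below indexes rows from a GeoNames tab-separated dump and looks up the city found on a US-style "City, ST ZIP" line. The function names and the regular expression are hypothetical, and the approach assumes the address contains such a line and that the two-letter state matches the dump's admin1 code (which holds for US rows). Addresses from other countries would need their own patterns.

```python
import re

# Column positions in the tab-separated GeoNames dump format.
NAME, ASCII_NAME, COUNTRY_CODE, ADMIN1_CODE = 1, 2, 8, 10

# Matches a US-style "City, ST 12345" line; an assumption, not a general parser.
CITY_STATE_ZIP = re.compile(
    r"^\s*(?P<city>[^,]+),\s*(?P<state>[A-Z]{2})\s+\d{5}(?:-\d{4})?\s*$"
)


def load_gazetteer(lines):
    """Index rows as (lowercased city name, admin1 code) -> country code."""
    index = {}
    for line in lines:
        row = line.rstrip("\n").split("\t")
        if len(row) <= ADMIN1_CODE:
            continue  # skip short or malformed rows
        for name in {row[NAME], row[ASCII_NAME]}:
            index.setdefault((name.lower(), row[ADMIN1_CODE]), row[COUNTRY_CODE])
    return index


def extract_location(raw_address, gazetteer):
    """Return city/state/country for the first line that looks like 'City, ST ZIP'."""
    for line in raw_address.splitlines():
        match = CITY_STATE_ZIP.match(line)
        if not match:
            continue
        city, state = match["city"].strip(), match["state"]
        country = gazetteer.get((city.lower(), state))
        if country:
            return {"city": city, "state": state, "country": country}
    return None
```

With one of the city dump files from the GeoNames download area (for example `cities15000.txt`), you could build the index once with `load_gazetteer(open(path, encoding="utf-8"))` and then call `extract_location` per address. The index stays in memory, so no server or API call is involved.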



Source: https://stackoverflow.com/questions/31452180/extracting-city-state-and-country-from-raw-address-string
