Timeout error in Python geopy geocoder

Submitted by 微笑、不失礼 on 2021-02-07 22:01:36

Question


I am a relatively new Python user and am trying to write a function that returns the latitude and longitude for a city and country using the "geopy" module. I was getting errors because some city names were misspelled, which I have managed to catch. The trouble now is that I am hitting a timeout error. I have read the question "Geopy: catch timeout error" and adjusted my timeout parameter accordingly. However, the script now runs for varying lengths of time before it times out. Running it over faster networks helps to some degree. The problem is that I need to do this for 100k rows, and the most rows it has processed before timing out is 20k. Any help or advice on how to solve this is greatly appreciated.

import os
from geopy.geocoders import Nominatim
os.getcwd() #check current working directory
os.chdir(r"C:\Users\Philip\Documents\HDSDA1\Project\Global Terrorism Database")  # raw string so the backslashes are taken literally

#import file as a csv
import csv
gtd=open("gtd_original.csv","r")
csv_f=csv.reader(gtd)
outf=open("r_ready.csv","wb")
writer=csv.writer(outf,dialect='excel')
for row in csv_f:
    if row[13] in ("","NA") or row[14] in ("","NA"):   
        lookup = row[12] + "," + row[8]  # creates a city,country
        geolocator = Nominatim()
        location = geolocator.geocode(lookup, timeout = None) #looks up the city/country on maps
        try:
            location.latitude
        except:
            lookup = row[8]
            location = geolocator.geocode(lookup) 
        row[13] = location.latitude
        row[14] = location.longitude
    writer.writerow(row)      
gtd.close()
outf.close()

Answer 1:


I expect that you have exceeded the usage policy for the Nominatim service (http://wiki.openstreetmap.org/wiki/Nominatim_usage_policy). Try sleeping for 1 second between requests and caching the results; there are probably a lot of duplicates.

Sleeping part:

from time import sleep
### your code
row[14] = location.longitude
sleep(1) # after last line in if

Caching:

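coords = {}  # maps (country, city) -> (latitude, longitude)
key = (row[8], row[12])  # a tuple, because a list cannot be used as a dict key
if key in coords:
    row[13], row[14] = coords[key]
else:
    pass  # geolocate here, then store the result in coords[key]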
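
The sleeping and caching ideas above can be combined in one small wrapper. This is a minimal sketch and not part of the original answer: `make_cached_geocoder`, `min_delay` and the `sleep` parameter are hypothetical names, and `geocode` stands for any callable that takes a query string, such as the `geocode` method of a `Nominatim` instance. It pauses only after a real request, so rows served from the cache cost no waiting time.

```python
import time


def make_cached_geocoder(geocode, min_delay=1.0, sleep=time.sleep):
    """Wrap a geocode callable with a cache and a pause after each real request.

    geocode   -- any callable taking a query string (assumed, e.g. geolocator.geocode)
    min_delay -- seconds to wait after each real request (hypothetical parameter)
    sleep     -- injectable so the pause can be skipped in tests
    """
    cache = {}  # maps (country, city) -> whatever geocode returned

    def lookup(city, country):
        key = (country, city)
        if key not in cache:
            cache[key] = geocode(city + "," + country)
            sleep(min_delay)  # throttle only when a request was actually sent
        return cache[key]

    return lookup
```

In the loop from the question, the wrapper would be built once before iterating, for example `cached_geocode = make_cached_geocoder(geolocator.geocode)`, and then called as `cached_geocode(row[12], row[8])` for each row that lacks coordinates. Newer geopy releases also provide `geopy.extra.rate_limiter.RateLimiter`, which may cover the throttling part if your installed version includes it.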

Update

performance: 1 request/sec --> 3600 reqs/hour --> 36K requests/10h
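
To set that rate against the 100k rows mentioned in the question, here is a rough back-of-the-envelope estimate. It assumes exactly one request per second and ignores network latency; `estimate_hours` is a hypothetical helper. Only rows with missing coordinates need a request at all, and cached duplicates reduce the count further, so the second figure is an upper bound.

```python
def estimate_hours(requests, seconds_per_request=1.0):
    """Return the hours needed to send `requests` requests at a fixed pace."""
    return requests * seconds_per_request / 3600.0


# 36,000 requests take 10 hours, matching the figure above.
print(estimate_hours(36000))
# 100,000 requests with no cache hits at all would take roughly 27.8 hours.
print(round(estimate_hours(100000), 1))
```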

import os
from time import sleep
from geopy.geocoders import Nominatim
os.getcwd() #check current working directory
os.chdir(r"C:\Users\Philip\Documents\HDSDA1\Project\Global Terrorism Database")  # raw string so the backslashes are taken literally

#import file as a csv
import csv
gtd=open("gtd_original.csv","r")
csv_f=csv.reader(gtd)
outf=open("r_ready.csv","wb")
writer=csv.writer(outf,dialect='excel')
coords = {}
for row in csv_f:
    if row[13] in ("","NA") or row[14] in ("","NA"):   
        lookup = row[12] + "," + row[8]  # creates a city,country

        if (row[8], row[12]) in coords:   ## test if result is already cached
            row[13] , row[14] = coords[ (row[8], row[12]) ]
        else:    
            geolocator = Nominatim()
            location = geolocator.geocode(lookup, timeout = None) #looks up the city/country on maps
            try:
                location.latitude
            except:
                lookup = row[8]
                location = geolocator.geocode(lookup) 
            row[13] = location.latitude
            row[14] = location.longitude
            coords[ (row[8], row[12]) ] = (row[13] , row[14])  # cache the new coords
            sleep(1) # sleep for 1 sec (required by Nominatim usage policy)

    writer.writerow(row)      
gtd.close()
outf.close()



Answer 2:


You can catch GeocoderTimedOut.

Here is an example function that may help:

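from geopy.geocoders import Nominatim
from geopy.exc import GeocoderTimedOut

def do_geocode(address):
    # Named "geolocator" rather than "geopy" so the package name is not shadowed
    geolocator = Nominatim()
    try:
        return geolocator.geocode(address)
    except GeocoderTimedOut:
        # Retry the same address; note that the number of retries is unlimited
        return do_geocode(address)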

It's pretty simple: if a timeout occurs, it retries. Hope it helps.
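
Because the function above retries without any limit, a service that keeps timing out would make it recurse until Python's recursion limit is reached. A bounded variant with a growing pause between attempts can be sketched as follows. This is an illustration under assumptions, not the answer author's code: `geocode_with_retry`, `retry_on`, `max_attempts`, `base_delay` and `sleep` are hypothetical names, and `geocode` is any callable that takes an address.

```python
import time


def geocode_with_retry(geocode, address, retry_on=(Exception,),
                       max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call geocode(address), retrying a limited number of times.

    retry_on     -- tuple of exception classes that trigger a retry (assumed)
    max_attempts -- total number of calls before giving up
    base_delay   -- first pause in seconds; doubled after every failure
    sleep        -- injectable so tests do not have to wait
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return geocode(address)
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller see the error
            sleep(base_delay * 2 ** (attempt - 1))  # 1s, 2s, 4s, ...
```

With geopy this might be called as `geocode_with_retry(geolocator.geocode, address, retry_on=(GeocoderTimedOut,))`. Re-raising after the last attempt lets the calling loop decide whether to skip the row or stop.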



Source: https://stackoverflow.com/questions/30218394/timeout-error-in-python-geopy-geocoder
