Is there a slower or more controlled alternative to .apply()?

Submitted by 冷眼眸甩不掉的悲伤 on 2019-12-11 04:31:44

Question


So this may seem like an odd question, but I have a pandas DataFrame with addresses in it, that I want to geocode so I can get the latitude and longitude.

I have code that works using .apply() thanks to this very helpful thread (new column with coordinates using geopy pandas), but my problem is that all of the open APIs have strict limits to how many requests per second they allow, and also requests per day.

I haven't been able to find any way to throttle my code to match the limits of the APIs. My DF has 25K rows, but I've only been able to geocode successfully when I create a subset of it with up to 5 rows.

I don't have a lot of experience with python and pandas, but in SAS the DATA steps iterate one row at a time, so I could have a sleep command that would throttle the requests. What would be the best way to implement something like that with python/pandas?

EDIT: Based on the answers so far, I want to confirm that my code would change from:

df_small['city_coord'] = df_small['Address'].apply(geolocator.geocode).apply(lambda x: (x.latitude, x.longitude))

to:

from time import sleep

df_small = df_clean[:5]

def f(x, delay=1):
    # pause before each request so the API's rate limit is respected
    sleep(delay)
    return geolocator.geocode(x)

df_small['city_coord'] = df_small['Address'].apply(f).apply(lambda x: (x.latitude, x.longitude))

Answer 1:


To iterate with a delay, you can use df.iterrows() and time.sleep():

from time import sleep

for index, row in df.iterrows():  # iterrows() yields (index, row) pairs
    # run your code
    sleep(1)  # how many seconds to wait

Or you can just put time.sleep() within the apply function itself (as @RafaelC suggests in the comments):

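def f(x, delay=1):
    # run your code on x, then pause before the next call
    sleep(delay)

# apply to the column: Series.apply calls f once per value,
# whereas DataFrame.apply would call it once per column
df['Address'].apply(f)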
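
A fixed sleep(delay) always waits the full delay, even when the request itself already took a while. As a rough sketch of a slightly more controlled variant, the wrapper below only sleeps for whatever is left of the minimum interval. The names throttled and to_coords are hypothetical helpers, not part of pandas or geopy, and to_coords assumes the geocoder returns None when an address is not found (as geopy's geocode does), which would otherwise break the lambda from the question.

```python
import time


def throttled(func, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
    """Wrap func so consecutive calls start at least min_interval seconds apart.

    clock and sleep are injectable only to make the wrapper easy to test.
    """
    last_call = None  # time of the previous call, None before the first one

    def wrapper(*args, **kwargs):
        nonlocal last_call
        if last_call is not None:
            # Sleep only for the part of the interval that has not elapsed yet.
            wait = min_interval - (clock() - last_call)
            if wait > 0:
                sleep(wait)
        last_call = clock()
        return func(*args, **kwargs)

    return wrapper


def to_coords(location):
    """Return (latitude, longitude), or None if nothing was found."""
    if location is None:
        return None
    return (location.latitude, location.longitude)
```

Assuming geolocator is the geopy geocoder from the question, this could be used as df['Address'].apply(throttled(geolocator.geocode, min_interval=1)).apply(to_coords). Recent geopy versions also ship a similar helper, geopy.extra.rate_limiter.RateLimiter, which may be preferable if it is available in your installation.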


Source: https://stackoverflow.com/questions/49737760/is-there-a-slower-or-more-controlled-alternative-to-apply
