Python Pandas 'apply' returns series; can't convert to dataframe

Anonymous (unverified), submitted 2019-12-03 00:48:01

Question:

OK, I'm at half-wit's end. I'm geocoding a dataframe with geopy. I've written a simple function to take an input - country name - and return the latitude and longitude. I use apply to run the function and it returns a Pandas series object. I can't seem to convert it to a dataframe. I'm sure I'm missing something obvious, but I'm new to python and still RTFMing. BTW, the geocoder function works great.

# Import libraries
import os
import pandas as pd
import numpy as np
from geopy.geocoders import Nominatim

def locate(x):
    geolocator = Nominatim()
    # print(x) # debug
    try:
        #Get geocode
        location = geolocator.geocode(x, timeout=8, exactly_one=True)
        lat = location.latitude
        lon = location.longitude
    except:
        #didn't work for some reason that I really don't care about
        lat = np.nan
        lon = np.nan
    # print(lat,lon) #debug
    return lat, lon # Note: also tried return { 'LAT': lat, 'LON': lon }

df_geo_in = df_addr.drop_duplicates(['COUNTRY']).reset_index()    #works perfectly
df_geo_in['LAT'], df_geo_in['LON'] = df_geo_in.applymap(locate)  # error: returns more than 2 values - default index + column with results

I also tried

df_geo_in['LAT','LON'] = df_geo_in.applymap(locate) 

I get a single dataframe with no index and a single column with the series in it.

I've tried a number of other methods, including 'applymap' :

source_cols = ['LAT','LON']
new_cols = [str(x) for x in source_cols]
df_geo_in = df_addr.drop_duplicates(['COUNTRY']).set_index(['COUNTRY'])
df_geo_in[new_cols] = df_geo_in.applymap(locate)

which returned an error after a long time:

ValueError: Columns must be same length as key

I've also tried manually converting the series to a dataframe using the df.from_dict(df_geo_in) method without success.

The goal is to geocode 166 unique countries, then join it back to the 188K addresses in df_addr. I'm trying to be pandas-y in my code and not write loops if possible. But I haven't found the magic to convert series into dataframes and this is the first time I've tried to use apply.

Thanks in advance - ancient C programmer

Answer 1:

I'm assuming that df_geo is a df with a single column so I believe the following should work:

change:

return lat,  lon 

to

return pd.Series([lat,  lon]) 

then you should be able to assign like so:

df_geo_in[['LAT', 'LON']] = df_geo_in['COUNTRY'].apply(locate)

What you tried assigns the result of applymap to two new columns, which is incorrect here: applymap calls the function on every element of the df and returns a result with the df's own shape, so unless the left-hand side has that same shape the assignment won't give the desired result.
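To make the difference concrete, here is a minimal sketch that swaps the geocoder for a hypothetical offline lookup so it runs without network access. `fake_locate` and the `_COORDS` table are stand-ins invented for illustration (the coordinates are rough placeholder values), not part of geopy. It shows that applying a function that returns a `pd.Series` to a single column produces a DataFrame, one column per returned label.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the geocoder: a tiny hard-coded lookup table
# with rough, illustrative coordinates.
_COORDS = {"France": (46.0, 2.0), "Japan": (36.0, 138.0)}

def fake_locate(country):
    """Return latitude and longitude as a labelled Series (NaN if unknown)."""
    lat, lon = _COORDS.get(country, (np.nan, np.nan))
    return pd.Series([lat, lon], index=["LAT", "LON"])

df_geo_in = pd.DataFrame({"COUNTRY": ["France", "Japan", "Atlantis"]})

# Applying to the COUNTRY column (a Series) calls fake_locate once per value
# and stacks the returned Series into a DataFrame with LAT and LON columns.
coords = df_geo_in["COUNTRY"].apply(fake_locate)

# Join on the shared index to attach the new columns.
df_geo_in = df_geo_in.join(coords)
print(df_geo_in)
```

Labelling the returned Series with an index means the new columns arrive already named, so no list of target column names is needed on the left-hand side.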

Your latter method is also incorrect: you drop the duplicate countries and then expect the result to assign a geolocation back to every country, but the shapes differ.

For large dfs it is probably quicker to build a lookup df of the non-duplicated countries with their geolocations and then merge it back into your larger df, like so:

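# One row per unique country
geo_lookup = df_addr[['COUNTRY']].drop_duplicates().copy()
# locate must return pd.Series([lat, lon]) for this to yield two columns
geo_lookup[['LAT', 'LON']] = geo_lookup['COUNTRY'].apply(locate)
# Left merge the coordinates back onto the full address table
df_addr = df_addr.merge(geo_lookup, on='COUNTRY', how='left')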

This creates a df of non-duplicated countries with their coordinates, which is then left-merged back into the master df.
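The lookup-then-merge pattern can be sketched end to end on toy data. Everything here is assumed for illustration: `fake_locate` and `_COORDS` replace the real geocoder so the example runs offline, and the four-row `df_addr` stands in for the 188K-row table from the question.

```python
import pandas as pd

# Hypothetical stand-in for the geocoder so the example runs offline;
# the coordinates are rough placeholder values.
_COORDS = {"France": (46.0, 2.0), "Japan": (36.0, 138.0)}

def fake_locate(country):
    """Return a labelled Series of coordinates (NaN if the country is unknown)."""
    lat, lon = _COORDS.get(country, (float("nan"), float("nan")))
    return pd.Series({"LAT": lat, "LON": lon})

# Small stand-in for the full address table.
df_addr = pd.DataFrame({
    "ADDRESS": ["1 Rue A", "2 Rue B", "3 Chome C", "4 Nowhere"],
    "COUNTRY": ["France", "France", "Japan", "Atlantis"],
})

# Geocode each distinct country exactly once.
geo_lookup = df_addr[["COUNTRY"]].drop_duplicates().reset_index(drop=True)
geo_lookup = geo_lookup.join(geo_lookup["COUNTRY"].apply(fake_locate))

# Left merge keeps every address row and repeats the coordinates per country.
df_out = df_addr.merge(geo_lookup, on="COUNTRY", how="left")
print(df_out)
```

Keeping only the `COUNTRY` column in the lookup avoids duplicated `_x`/`_y` columns after the merge, and the geocoder is called once per country rather than once per address.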



Answer 2:

Always easier to test with some sample data, but please try the following zip function to see if it works.

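# locate returns a (lat, lon) tuple here; apply it to the COUNTRY column only
df_geo_in['LAT_LON'] = df_geo_in['COUNTRY'].apply(locate)
# zip(*...) splits the column of pairs into separate LAT and LON sequences
df_geo_in['LAT'], df_geo_in['LON'] = zip(*df_geo_in.LAT_LON)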
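A small offline sketch of the zip approach, assuming the function returns a plain tuple as in the question. `fake_locate` and `_COORDS` are hypothetical stand-ins for the geocoder, with rough placeholder coordinates.

```python
import pandas as pd

# Hypothetical stand-in for the geocoder; returns a plain (lat, lon) tuple.
_COORDS = {"France": (46.0, 2.0), "Japan": (36.0, 138.0)}

def fake_locate(country):
    return _COORDS.get(country, (float("nan"), float("nan")))

df_geo_in = pd.DataFrame({"COUNTRY": ["France", "Japan"]})

# Series.apply leaves each returned tuple packed in one object-dtype column.
lat_lon = df_geo_in["COUNTRY"].apply(fake_locate)

# zip(*...) transposes the sequence of (lat, lon) pairs into two tuples,
# one holding all latitudes and one holding all longitudes.
df_geo_in["LAT"], df_geo_in["LON"] = zip(*lat_lon)
print(df_geo_in)
```

This variant needs no change to the function's return type, at the cost of an intermediate column of tuples.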

