Using read_excel with converters for reading Excel file into Pandas DataFrame results in a numeric column of object type

囚心锁ツ 2020-12-21 16:16

I am reading this Excel file (United Nations Energy Indicators) using the code snippet below:

def convert_energy(energy):
    if isinstance(energy, float):
             


        
3 Answers
  •  清歌不尽
    2020-12-21 17:06

    One of the values for energy in your Excel file is the string "...", and your converter function returns energy unchanged when it is a string.

    As a result, a string is returned alongside your numbers, which changes the dtype of your column to 'object'.
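
The effect can be reproduced without the Excel file. The following is a minimal sketch, assuming a hypothetical converter `passthrough_strings` that mirrors the behaviour described above; the sample values are made up for illustration and `pd.to_numeric` is shown only as one way to confirm the cause.

```python
import pandas as pd

def passthrough_strings(energy):
    # Hypothetical converter: scales floats, returns anything else unchanged.
    if isinstance(energy, float):
        return energy * 1_000_000
    return energy

# Made-up sample cells; "..." stands for the missing-value marker.
raw = [321.0, "...", 102.0]
mixed = pd.Series([passthrough_strings(v) for v in raw])
print(mixed.dtype)  # object, because one element is still a str

# Coercing non-numeric entries to NaN yields a float column.
numeric = pd.to_numeric(mixed, errors="coerce")
print(numeric.dtype)  # float64
```

A single leftover string is enough to force the whole column to object dtype, which is why the converter has to handle "..." explicitly.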

    You could try something like this:

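    import numpy as np
    import pandas as pd

    def convert_energy(energy):
        # "..." marks a missing value in the sheet; map it to NaN.
        if energy == "...":
            return np.nan
        # Scale float values by 1,000,000.
        elif isinstance(energy, float):
            return float(energy * 1000000)
        # Anything else is cast to float without scaling.
        else:
            return float(energy)

    # skipfooter replaces skip_footer, which newer pandas versions no longer accept.
    df = pd.read_excel('http://unstats.un.org/unsd/environment/excel_file_tables/2013/Energy%20Indicators.xls',
                       skiprows=17, skipfooter=38,
                       usecols=[2,3,4,5], na_values=['...'],
                       names=['Country', 'Energy Supply', 'Energy Supply per Capita', '% Renewable'],
                       converters={1: convert_energy}).set_index('Country')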
    
    df.info()
    

    Output:

    <class 'pandas.core.frame.DataFrame'>
    Index: 227 entries, Afghanistan to Zimbabwe
    Data columns (total 3 columns):
    Energy Supply               222 non-null float64
    Energy Supply per Capita    222 non-null float64
    % Renewable                 227 non-null float64
    dtypes: float64(3)
    memory usage: 6.2+ KB
    
