Python Loop through Excel sheets, place into one df

予麋鹿 2020-12-09 05:16

I have an Excel file foo.xlsx with about 40 sheets (sh1, sh2, etc.). Each sheet has the format:

area      cnt   name\\npa         


        
1 answer
  •  抹茶落季
    2020-12-09 05:37

    UPDATE as of 2019-09-09:

    Use sheet_name instead of sheetname for pandas v0.25.1.


    The read_excel method of pandas lets you read all sheets at once if you set the keyword parameter sheet_name=None (sheetname=None in older versions). This returns a dictionary: the keys are the sheet names, and the values are the sheets as dataframes.

    Using this, we can simply loop through the dictionary and:

    1. Add an extra column to the dataframes containing the relevant sheetname
    2. Use the rename method to rename the columns. The lambda splits each column name on newlines and keeps only the last piece; a column name with no newline is unchanged.
    3. Append to the "full table"
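
The renaming rule in step 2 can be tried on its own, without pandas. This is a minimal sketch: the header strings and the helper name last_line are made up for illustration, assuming a multi-line header cell arrives as a single string containing '\n'.

```python
# Hypothetical header strings; a multi-line Excel header cell is assumed
# to arrive as one string containing '\n'.
columns = ['area', 'cnt', 'name\nparty1', 'name\nparty2']

def last_line(col):
    # Keep only the text after the final newline; names without a
    # newline are returned unchanged.
    return col.split('\n')[-1]

cleaned = [last_line(c) for c in columns]
print(cleaned)
```

Passing the same function as columns= to DataFrame.rename applies it to every column label.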

    Once this is done, we reset the index and all should be well. Note: if some party columns are present on one sheet but not on others, this still works, and the columns missing from a sheet are filled with NaN for that sheet's rows.

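    import pandas as pd
    
    # sheet_name=None reads every sheet into a dict of {sheet name: DataFrame}
    sheets_dict = pd.read_excel('Book1.xlsx', sheet_name=None)
    
    frames = []
    for name, sheet in sheets_dict.items():
        sheet['sheet'] = name  # record which sheet each row came from
        # keep only the text after the last newline in each column name
        sheet = sheet.rename(columns=lambda x: x.split('\n')[-1])
        frames.append(sheet)
    
    # DataFrame.append was removed in pandas 2.0, so concatenate once instead
    full_table = pd.concat(frames)
    full_table.reset_index(inplace=True, drop=True)
    
    print(full_table)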
    

    Prints:

        area  cnt  party1  party2   sheet
    0  bacon    9       5       5  Sheet1
    1   spam    3       7       5  Sheet1
    2   eggs    2      18       4  Sheet2
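
The note about missing columns can be checked without an Excel file. The sketch below uses two small in-memory frames with made-up data as stand-ins for two sheets, one with only party1 and one with only party2, and combines them with pd.concat.

```python
import pandas as pd

# Hypothetical data standing in for two sheets with different party columns.
sheet1 = pd.DataFrame({'area': ['bacon', 'spam'], 'cnt': [9, 3], 'party1': [5, 7]})
sheet2 = pd.DataFrame({'area': ['eggs'], 'cnt': [2], 'party2': [4]})

# Columns are aligned by name; values a sheet does not have become NaN.
combined = pd.concat([sheet1, sheet2], ignore_index=True)
print(combined)
```

Because NaN is a float, the party columns end up with a float dtype in the combined table even though each sheet held integers.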
    
