Missing values in pandas column MultiIndex

Submitted by 穿精又带淫゛_ on 2020-07-10 07:41:08

Question


I am reading Excel sheets with a two-row header into pandas using

import pandas as pd
df = pd.read_excel('./question.xlsx', sheet_name=None, header=[0, 1])

which results in a DataFrame with MultiIndex columns.

The problem is that the empty header cells are filled by default with 'Title', whereas I would prefer a distinct label. I cannot skip the first row because I am dealing with larger data frames in which the first and second rows contain repeating labels (hence the MultiIndex).

Your help will be much appreciated.


Answer 1:


Assuming that you want empty strings instead of the repeated first label, you can read the two header rows and build the MultiIndex directly:

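# Read only the two header rows; blank cells become '' instead of NaN
df1 = pd.read_excel('./question.xlsx', header=None, nrows=2).fillna('')
# Each of the two rows becomes one level of the column MultiIndex
index = pd.MultiIndex.from_arrays(df1.values)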

This gives:

MultiIndex([('Title',        '#'),
            (     '',    'Price'),
            (     '', 'Quantity')],
           )

By the way, if you want a different label for the empty fields, pass it as the argument to fillna.
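As a minimal sketch of that variation, the snippet below builds the two header rows in memory rather than reading them from the workbook, so it runs without the file. The row contents mirror the output shown above, and the label 'Untitled' is an arbitrary placeholder chosen for illustration, not something from the original sheet.

```python
import numpy as np
import pandas as pd

# Stand-in for pd.read_excel(..., header=None, nrows=2): two header rows
# where the first row is blank after its first cell (assumed layout).
header_rows = pd.DataFrame([
    ["Title", np.nan, np.nan],
    ["#", "Price", "Quantity"],
])

# 'Untitled' is a hypothetical placeholder label for the blank cells.
labelled = header_rows.fillna("Untitled")
columns = pd.MultiIndex.from_arrays(labelled.values)

print(columns.tolist())
# [('Title', '#'), ('Untitled', 'Price'), ('Untitled', 'Quantity')]
```

Any string works as the label, as long as it does not collide with a real first-level header.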

Then read the remaining data and set the column index by hand:

# Skip the two header rows and read only the data
df1 = pd.read_excel('./question.xlsx', header=None, skiprows=2)
df1.columns = index
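The two steps can also be folded into one helper. The sketch below is a variant of the approach above, not the answer's exact code: it assumes the whole sheet has already been read once with header=None and splits that raw frame into header and body, instead of calling read_excel twice. The function name apply_two_row_header, the fill_label parameter, and the sample values are all made up for illustration.

```python
import numpy as np
import pandas as pd

def apply_two_row_header(raw: pd.DataFrame, fill_label: str = "") -> pd.DataFrame:
    """Use the first two rows of `raw` as MultiIndex columns (hypothetical helper)."""
    # Blank header cells get `fill_label` instead of NaN.
    header = raw.iloc[:2].fillna(fill_label)
    # The remaining rows are the data; re-infer dtypes because the raw
    # frame mixes header strings and numbers in the same columns.
    body = raw.iloc[2:].reset_index(drop=True).infer_objects()
    body.columns = pd.MultiIndex.from_arrays(header.values)
    return body

# Stand-in for pd.read_excel('./question.xlsx', header=None); values are invented.
raw = pd.DataFrame([
    ["Title", np.nan, np.nan],
    ["#", "Price", "Quantity"],
    [1, 9.99, 3],
    [2, 4.50, 10],
])

df = apply_two_row_header(raw, fill_label="Untitled")
print(df.columns.tolist())
print(df[("Untitled", "Price")].tolist())
```

Reading the sheet once avoids opening the file twice, at the cost of having to re-infer column dtypes afterwards.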


Source: https://stackoverflow.com/questions/62658750/missing-values-in-pandas-column-multiindex
