Pandas read_excel sometimes creates index even when index_col=None

Submitted by 我怕爱的太早我们不能终老 on 2020-04-13 16:54:08

Question


I'm trying to read an Excel file into a DataFrame and I want to set the index later, so I don't want pandas to use column 0 for the index values.

By default (index_col=None), it shouldn't use column 0 for the index, but I find that it does if there is no value in cell A1 of the worksheet.

Is there any way to override this behaviour? (I am loading many sheets that have no value in cell A1.)

This works as expected when test1.xlsx has the value "DATE" in cell A1:

In [19]: pd.read_excel('test1.xlsx')                                             
Out[19]: 
                 DATE         A         B         C
0 2018-01-01 00:00:00  0.766895  1.142639  0.810603
1 2018-01-01 01:00:00  0.605812  0.890286  0.810603
2 2018-01-01 02:00:00  0.623123  1.053022  0.810603
3 2018-01-01 03:00:00  0.740577  1.505082  0.810603
4 2018-01-01 04:00:00  0.335573 -0.024649  0.810603

But when the worksheet has no value in cell A1, it automatically assigns column 0 values to the index:

In [20]: pd.read_excel('test2.xlsx', index_col=None)                             
Out[20]: 
                            A         B         C
2018-01-01 00:00:00  0.766895  1.142639  0.810603
2018-01-01 01:00:00  0.605812  0.890286  0.810603
2018-01-01 02:00:00  0.623123  1.053022  0.810603
2018-01-01 03:00:00  0.740577  1.505082  0.810603
2018-01-01 04:00:00  0.335573 -0.024649  0.810603

This is not what I want.

Desired result: Same as first example (but with 'Unnamed' as the column label perhaps).

The documentation says:

index_col : int, list of int, default None.

Column (0-indexed) to use as the row labels of the DataFrame. Pass None if there is no such column.


Answer 1:


The issue that you're describing matches a known pandas bug. This bug was fixed in the recent pandas 0.24.0 release:

Bug Fixes

  • Bug in read_excel() in which index_col=None was not being respected and parsing index columns anyway (GH18792, GH20480)
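
If upgrading is not possible right away, one option is to check the installed version and normalize the frame after reading. The following is only a minimal sketch: the helper names has_index_col_fix and ensure_default_index, and the default column label "Unnamed: 0", are assumptions made for illustration and are not part of pandas. The sample frame stands in for the result of the affected read_excel call, so no Excel file is needed to try it.

```python
import pandas as pd


def has_index_col_fix(version=pd.__version__):
    """Return True if the version string is at least 0.24 (hypothetical helper)."""
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= (0, 24)


def ensure_default_index(df, name="Unnamed: 0"):
    """Move a non-default index back into an ordinary first column.

    Hypothetical helper: frames that already have a RangeIndex are returned as-is.
    """
    if isinstance(df.index, pd.RangeIndex):
        return df
    out = df.copy()
    # Give the index a label so reset_index() creates a column with that name.
    out.index = df.index.rename(name)
    return out.reset_index()


# Stand-in for the frame produced by the affected read (values from the question).
bad = pd.DataFrame(
    {"A": [0.766895, 0.605812], "B": [1.142639, 0.890286]},
    index=pd.to_datetime(["2018-01-01 00:00:00", "2018-01-01 01:00:00"]),
)
fixed = ensure_default_index(bad)
print(fixed.columns.tolist())
```

Because the helper leaves a frame with a default RangeIndex untouched, it should be safe to apply to every sheet, whether or not cell A1 was empty.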



Answer 2:


I have been facing essentially the same issue for the last couple of days.

I have an Excel file whose first column header is also blank, so when it is read, that column becomes the index.

I tried many options, but the code below works by using skiprows instead of the header option. Interestingly, with skiprows pandas applies the "Unnamed: 0" naming pattern to columns that do not have a header, whereas with the header option it did not. We are using pandas version 0.20.1:

# sheetname was renamed to sheet_name in pandas 0.21; mycols is assumed to be defined earlier.
df = pd.read_excel("ABC.xlsx", dtype=str, sheetname='Supply', skiprows=6, usecols=mycols)

df.columns
Index([       'Unnamed: 0', 2015-01-01 00:00:00, 2015-02-01 00:00:00,
       2015-03-01 00:00:00, 2015-04-01 00:00:00, 2015-05-01 00:00:00,
       2015-06-01 00:00:00, 2015-07-01 00:00:00, 2015-08-01 00:00:00,
       2015-09-01 00:00:00,
       ...
       ],
      dtype='object', length=120)

The documentation does not provide any more information on this, but the workaround above can save your day.
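
Once the blank header shows up as "Unnamed: 0", it can be given a meaningful label. This is a rough sketch, not part of the original workaround: the helper name label_blank_header and the label "DATE" (borrowed from the question's first example) are assumptions, and the sample frame only imitates the column layout shown above.

```python
import pandas as pd


def label_blank_header(df, new_name="DATE", placeholder="Unnamed: 0"):
    """Rename the placeholder column that pandas created for a blank header.

    Hypothetical helper; other columns (including datetime labels) are untouched.
    """
    return df.rename(columns={placeholder: new_name})


# Imitation of the layout above: one placeholder label plus datetime labels.
df = pd.DataFrame(
    [["x", "1", "2"]],
    columns=["Unnamed: 0", pd.Timestamp("2015-01-01"), pd.Timestamp("2015-02-01")],
)
df = label_blank_header(df)
print(df.columns.tolist())
```

If the placeholder column is absent, rename simply leaves the frame unchanged, so the call does no harm on sheets that do have a header in the first column.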




Answer 3:


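You can also pass

index_col=0

instead of

index_col=None

This makes the first column the index explicitly; call df.reset_index() afterwards if you want it back as an ordinary column.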


Source: https://stackoverflow.com/questions/54487818/pandas-read-excel-sometimes-creates-index-even-when-index-col-none
