Question
I'm trying to read an Excel file into a data frame and I want to set the index later, so I don't want pandas to use column 0 for the index values.
By default (index_col=None), it shouldn't use column 0 for the index, but I find that it does whenever cell A1 of the worksheet is empty.
Is there any way to override this behaviour (I am loading many sheets that have no value in cell A1)?
This works as expected when test1.xlsx has the value "DATE" in cell A1:
In [19]: pd.read_excel('test1.xlsx')
Out[19]:
DATE A B C
0 2018-01-01 00:00:00 0.766895 1.142639 0.810603
1 2018-01-01 01:00:00 0.605812 0.890286 0.810603
2 2018-01-01 02:00:00 0.623123 1.053022 0.810603
3 2018-01-01 03:00:00 0.740577 1.505082 0.810603
4 2018-01-01 04:00:00 0.335573 -0.024649 0.810603
But when the worksheet has no value in cell A1, it automatically assigns column 0 values to the index:
In [20]: pd.read_excel('test2.xlsx', index_col=None)
Out[20]:
A B C
2018-01-01 00:00:00 0.766895 1.142639 0.810603
2018-01-01 01:00:00 0.605812 0.890286 0.810603
2018-01-01 02:00:00 0.623123 1.053022 0.810603
2018-01-01 03:00:00 0.740577 1.505082 0.810603
2018-01-01 04:00:00 0.335573 -0.024649 0.810603
This is not what I want.
Desired result: Same as first example (but with 'Unnamed' as the column label perhaps).
The documentation says:
index_col : int, list of int, default None.
Column (0-indexed) to use as the row labels of the DataFrame. Pass None if there is no such column.
Answer 1:
The issue that you're describing matches a known pandas bug. This bug was fixed in the recent pandas 0.24.0 release:
Bug Fixes
- Bug in read_excel() in which index_col=None was not being respected and parsing index columns anyway (GH18792, GH20480)
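For anyone stuck on a pandas version older than 0.24.0, one possible fallback is to move the unwanted index back into an ordinary column after loading. The sketch below assumes the frame has already been read and works on any DataFrame; restore_first_column and the default label 'Unnamed: 0' are hypothetical names chosen for illustration, not part of pandas.

```python
import pandas as pd


def restore_first_column(df, label="Unnamed: 0"):
    """Hypothetical helper: turn a non-default index back into column 0.

    If the frame already has a default RangeIndex, it is returned unchanged.
    """
    if isinstance(df.index, pd.RangeIndex):
        return df
    out = df.copy()
    # Put the index values in front of the existing columns ...
    out.insert(0, label, out.index)
    # ... and replace the old index with a fresh 0..n-1 RangeIndex.
    return out.reset_index(drop=True)


# Stand-in for what read_excel returns when cell A1 is empty (assumed data).
raw = pd.DataFrame(
    {"A": [0.766895, 0.605812], "B": [1.142639, 0.890286]},
    index=pd.to_datetime(["2018-01-01 00:00", "2018-01-01 01:00"]),
)
fixed = restore_first_column(raw)
print(fixed)
```

Calling it on the result of pd.read_excel('test2.xlsx') should give the layout of the first example in the question, with 'Unnamed: 0' as the label of the first column. The index can then be set later with set_index as originally intended.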
Answer 2:
I have been facing essentially the same issue for the last couple of days.
I have an Excel file whose first column header is also blank, so when the file is read, that column is read as the index.
I tried many options, but the code below works by using skiprows instead of the header option. Interestingly, skiprows applies the "Unnamed: 0" naming pattern to columns that do not have a header, whereas the header option did not. We are using pandas version 0.20.1:
df = pd.read_excel("ABC.xlsx", dtype=str, sheetname='Supply', skiprows=6, usecols=mycols)
df.columns
Index([ 'Unnamed: 0', 2015-01-01 00:00:00, 2015-02-01 00:00:00,
2015-03-01 00:00:00, 2015-04-01 00:00:00, 2015-05-01 00:00:00,
2015-06-01 00:00:00, 2015-07-01 00:00:00, 2015-08-01 00:00:00,
2015-09-01 00:00:00,
...
],
dtype='object', length=120)
The documentation does not provide any more information on this, but the workaround above can save your day.
Answer 3:
You can also use
index_col=0
instead of
index_col = None
Source: https://stackoverflow.com/questions/54487818/pandas-read-excel-sometimes-creates-index-even-when-index-col-none