AssertionError with pandas when reading excel

问题

I'm trying to read an xlsx file into python using pandas.
I've done this thousands of times before but some reason it is not working with a particular file.

The file is downloaded from another source and I get an AssertionError (see end) when reading with pandas:

df = pandas.read_excel(pathtomyfile, sheetname = "Sheet1")

The variable is defined for the path. The path exists (os.path.exists(path) returns True).

When I copy the contents of the file and paste the values in a new excel doc, this new one will open with the read_excel() method.

When I copy the contents of the file and paste the formatting in a new excel, this new one will open with the read_excel() method.

It doesn't seem to be the values or the formatting.

I am guessing this could be an encoding issue?
Thank you for any help.

    df1 = pandas.read_excel(snap1)
File "C:\Python\python-3.4.4.amd64\lib\site-packages\pandas\io\excel.py", line 163, in read_excel
    io = ExcelFile(io, engine=engine)
File "C:\Python\python-3.4.4.amd64\lib\site-packages\pandas\io\excel.py", line 206, in __init__
    self.book = xlrd.open_workbook(io)
File "C:\Python\python-3.4.4.amd64\lib\site-packages\xlrd\__init__.py", line 422, in open_workbook
    ragged_rows=ragged_rows,
File "C:\Python\python-3.4.4.amd64\lib\site-packages\xlrd\xlsx.py", line 794, in open_workbook_2007_xml
    x12sheet.process_stream(zflo, heading)
File "C:\Python\python-3.4.4.amd64\lib\site-packages\xlrd\xlsx.py", line 531, in own_process_stream
    self_do_row(elem)
File "C:\Python\python-3.4.4.amd64\lib\site-packages\xlrd\xlsx.py", line 597, in do_row
    assert 0 <= self.rowx < X12_MAX_ROWS
AssertionError

回答1:

Just for completeness, I had a similar problem where the row number of the first row was incorrect, I fixed my problem by changing xlsx-file with code adapted from this answer

def repair_broken_excelfile(zipfname, *filenames, new_name=None):
    # https://stackoverflow.com/a/4653863/1562285
    import tempfile
    import zipfile
    import shutil
    import os
    tempdir = tempfile.mkdtemp()
    try:
        tempname = os.path.join(tempdir, 'new.zip')
        with zipfile.ZipFile(zipfname, 'r') as zipread:
            with zipfile.ZipFile(tempname, 'w') as zipwrite:
                for item in zipread.infolist():
                    print('fn: ' + item.filename)
                    if item.filename not in filenames:
                        data = zipread.read(item.filename)

                        zipwrite.writestr(item, data)
                    else:

                        data = zipread.read(item.filename)
                        data = data.replace(b'<row r="0" spans="">', b'<row r="1" spans="">')
                        zipwrite.writestr(item, data)
                        pass
        if not new_name:
            new_name = zipfname
        shutil.move(tempname, new_name)
    finally:
        shutil.rmtree(tempdir)

Apparently there is a fix underway in xlrd

回答2:

In my case, I was using xlrd package to read excel and I got the same error of Assertion. Open your xlrd package from site-packages, and from that open sheet.py (https://github.com/python-excel/xlrd/blob/master/xlrd/sheet.py)

Find this code in sheet.py

    if self.biff_version >= 80:
        self.utter_max_rows = 65536
    else:
        self.utter_max_rows = 16384

Convert the above one into...

 #if self.biff_version >= 80:
 self.utter_max_rows = 65536
 #else:
 #      self.utter_max_rows = 16384

Now try to run your program... Problem will be solved..:)

回答3:

Have encountered the same problem, I save file under xml format: "Save as type: XML Spreadsheet 2003" on window. Then I open the file and save as xlsx format. The new file no longer gives error message.

来源：https://stackoverflow.com/questions/37486558/assertionerror-with-pandas-when-reading-excel

标签

python

excel

pandas

XLSX

xlrd