AttributeError: 'ElementTree' object has no attribute 'getiterator' when trying to import excel file

Posted by 天涯浪子 on 2021-02-07 11:29:35

Question


This is my code. I've just installed JupyterLab and added the Excel file there. I get the same error if I change the path to where the file is on my system. I can't seem to find anyone who had the same problem when simply importing an Excel file as a DataFrame.

The Excel file is a 3x26 table with studentnr, course, and result columns that have values like 101-105, A-D, and 1.0-9.9 respectively. Maybe the problem lies with the Excel file?

Either way, I have no idea how to fix this.

import pandas as pd
import numpy as np
df = pd.read_excel('student-results.xlsx')

This is the error I'm getting:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-6-9d38e4d56bbe> in <module>
      1 import pandas as pd
      2 import numpy as np
----> 3 df = pd.read_excel('student-results.xlsx')

c:\python\lib\site-packages\pandas\util\_decorators.py in wrapper(*args, **kwargs)
    294                 )
    295                 warnings.warn(msg, FutureWarning, stacklevel=stacklevel)
--> 296             return func(*args, **kwargs)
    297 
    298         return wrapper

c:\python\lib\site-packages\pandas\io\excel\_base.py in read_excel(io, sheet_name, header, names, index_col, usecols, squeeze, dtype, engine, converters, true_values, false_values, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, parse_dates, date_parser, thousands, comment, skipfooter, convert_float, mangle_dupe_cols)
    302 
    303     if not isinstance(io, ExcelFile):
--> 304         io = ExcelFile(io, engine=engine)
    305     elif engine and engine != io.engine:
    306         raise ValueError(

c:\python\lib\site-packages\pandas\io\excel\_base.py in __init__(self, path_or_buffer, engine)
    865         self._io = stringify_path(path_or_buffer)
    866 
--> 867         self._reader = self._engines[engine](self._io)
    868 
    869     def __fspath__(self):

c:\python\lib\site-packages\pandas\io\excel\_xlrd.py in __init__(self, filepath_or_buffer)
     20         err_msg = "Install xlrd >= 1.0.0 for Excel support"
     21         import_optional_dependency("xlrd", extra=err_msg)
---> 22         super().__init__(filepath_or_buffer)
     23 
     24     @property

c:\python\lib\site-packages\pandas\io\excel\_base.py in __init__(self, filepath_or_buffer)
    351             self.book = self.load_workbook(filepath_or_buffer)
    352         elif isinstance(filepath_or_buffer, str):
--> 353             self.book = self.load_workbook(filepath_or_buffer)
    354         elif isinstance(filepath_or_buffer, bytes):
    355             self.book = self.load_workbook(BytesIO(filepath_or_buffer))

c:\python\lib\site-packages\pandas\io\excel\_xlrd.py in load_workbook(self, filepath_or_buffer)
     35             return open_workbook(file_contents=data)
     36         else:
---> 37             return open_workbook(filepath_or_buffer)
     38 
     39     @property

c:\python\lib\site-packages\xlrd\__init__.py in open_workbook(filename, logfile, verbosity, use_mmap, file_contents, encoding_override, formatting_info, on_demand, ragged_rows)
    128         if 'xl/workbook.xml' in component_names:
    129             from . import xlsx
--> 130             bk = xlsx.open_workbook_2007_xml(
    131                 zf,
    132                 component_names,

c:\python\lib\site-packages\xlrd\xlsx.py in open_workbook_2007_xml(zf, component_names, logfile, verbosity, use_mmap, formatting_info, on_demand, ragged_rows)
    810     del zflo
    811     zflo = zf.open(component_names['xl/workbook.xml'])
--> 812     x12book.process_stream(zflo, 'Workbook')
    813     del zflo
    814     props_name = 'docprops/core.xml'

c:\python\lib\site-packages\xlrd\xlsx.py in process_stream(self, stream, heading)
    264         self.tree = ET.parse(stream)
    265         getmethod = self.tag2meth.get
--> 266         for elem in self.tree.iter() if Element_has_iter else self.tree.getiterator():
    267             if self.verbosity >= 3:
    268                 self.dump_elem(elem)

AttributeError: 'ElementTree' object has no attribute 'getiterator'

Answer 1:


You could try passing the argument engine="openpyxl" to pd.read_excel. It resolved the same problem for me.




Answer 2:


The error occurs when pandas reads the file through xlrd on Python 3.9+, because the getiterator() method of xml.etree.ElementTree.Element and ElementTree, which had previously been deprecated with a warning, has now been removed.

A workaround is to install another engine, openpyxl, to read the Excel file, and to change the call that reads the file.
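The removal can be checked without pandas or xlrd. The following is a minimal sketch using only the standard library; the sample XML string is made up for illustration and is not taken from a real workbook.

```python
import sys
import xml.etree.ElementTree as ET

# A tiny, made-up XML document (illustrative only, not a real workbook part).
root = ET.fromstring("<workbook><sheets><sheet name='Sheet1'/></sheets></workbook>")
tree = ET.ElementTree(root)

# On Python 3.9+ this is expected to be False, since getiterator() was removed.
has_getiterator = hasattr(tree, "getiterator")

# iter() is the replacement; it walks the tree in document order.
tags = [elem.tag for elem in tree.iter()]

print(sys.version_info[:2], has_getiterator, tags)
```

If has_getiterator prints False, any library code that still calls getiterator() on this interpreter will raise the AttributeError from the question.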

First,

pip3 install openpyxl

Then, instead of pd.read_excel('student-results.xlsx'), write pd.read_excel('student-results.xlsx', engine='openpyxl')

Reference: Python bug tracker




Answer 3:


I got the same error with xlrd (1.2.0) or xlrd3 (1.0.0) without pandas, but with Python 3.9. The following may interest those looking for an explanation:

It only happened when defusedxml was available (in that case, xlrd will use it). But it could be worked around, without changing any of the involved libraries:

import xlrd
xlrd.xlsx.ensure_elementtree_imported(False, None)
xlrd.xlsx.Element_has_iter = True

The second line ensures that Element_has_iter will not be reset when opening a workbook, so that it remains True, as set in the third line. Once this is done, xlrd uses iter instead of crashing on the missing getiterator.
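The effect of the flag can be imitated with the standard library alone. This is only a sketch of the branch shown in the traceback (iter() if the flag is set, getiterator() otherwise); the walk function and its element_has_iter parameter are hypothetical names, not part of xlrd.

```python
import xml.etree.ElementTree as ET

def walk(tree, element_has_iter):
    """Hypothetical helper mirroring the branch from the traceback."""
    if element_has_iter:
        return [elem.tag for elem in tree.iter()]
    # On Python 3.9+ this attribute no longer exists and raises AttributeError.
    return [elem.tag for elem in tree.getiterator()]

tree = ET.ElementTree(ET.fromstring("<a><b/><c/></a>"))

# With the flag forced to True, the removed method is never touched.
tags_with_flag = walk(tree, element_has_iter=True)

# With the flag False, the outcome depends on the interpreter version.
try:
    tags_without_flag = walk(tree, element_has_iter=False)
    error = None
except AttributeError as exc:
    tags_without_flag = None
    error = str(exc)

print(tags_with_flag, tags_without_flag, error)
```

On Python 3.9+ the second call should fail with the same message as in the question, while the first succeeds regardless of version.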

That said, I agree that moving to openpyxl in place of xlrd is a cleaner solution, at least until xlrd or xlrd3 gets fixed. Openpyxl appears to be more actively developed. In my case I have to adapt direct calls to those libraries, which is probably more work than just typing openpyxl instead of xlrd to tell pandas which engine to use, but I'll consider it.

So I agree with @corridda: use openpyxl. The others are right about the cause, but maybe this explains it in a little more detail.




Answer 4:


This showed up for me when I upgraded to Python 3.9. The difference seems to be related to a combination of the compression format of xlsx files and the removal of a deprecated iterator function.

For xlsx documents I need to specify the engine='openpyxl' keyword argument when opening.

This is not the case for csv or xls documents.

Install openpyxl

$ pip3 install openpyxl

Open xlsx and xls files with different engines.

from pathlib import Path
import pandas as pd

file_name = 'student-results.xlsx'  # path of the file to read
file_path = Path(file_name)

if file_path.suffix == '.xlsx':
    df = pd.read_excel(file_name, engine='openpyxl')

elif file_path.suffix == '.xls':
    df = pd.read_excel(file_name)

else:
    # handle other file types
    pass
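The suffix check above trusts the file name. Since an xlsx file is a zip archive (the traceback shows xlrd looking for 'xl/workbook.xml' among the archive members), a content-based check could be sketched as follows. The function name looks_like_xlsx is hypothetical, and the exact member name and its casing are assumptions taken from that traceback line.

```python
import io
import zipfile

def looks_like_xlsx(data: bytes) -> bool:
    """Hypothetical check: is this a zip archive containing xl/workbook.xml?"""
    buffer = io.BytesIO(data)
    if not zipfile.is_zipfile(buffer):
        return False
    with zipfile.ZipFile(buffer) as archive:
        return "xl/workbook.xml" in archive.namelist()

# Build a stand-in archive in memory (not a valid workbook, just the right shape).
raw = io.BytesIO()
with zipfile.ZipFile(raw, "w") as archive:
    archive.writestr("xl/workbook.xml", "<workbook/>")
fake_xlsx = raw.getvalue()

print(looks_like_xlsx(fake_xlsx))     # zip archive with the expected member
print(looks_like_xlsx(b"not a zip"))  # plain bytes, not an archive
```

A check like this could be used to pick engine='openpyxl' even when a file carries a misleading extension.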



Answer 5:


You can check the issue description here. You are running Python 3.9 with the xlrd library, which calls the removed getiterator method. Find the relevant code in the files listed in your traceback and replace getiterator with iter.

You may need to run your Python file and repeat the replacement a couple of times.




Answer 6:


Follow the steps below:

  1. Go to /Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/xlrd
  2. Open xlsx.py in any text editor and replace the two getiterator() calls with iter().
  3. Reload your Jupyter notebook. It should work.



Answer 7:


To avoid messing with xlrd, you can also save your Excel file in the .xls format instead of .xlsx.



Source: https://stackoverflow.com/questions/64264563/attributeerror-elementtree-object-has-no-attribute-getiterator-when-trying
