How to convert OpenDocument spreadsheets to a pandas DataFrame?

前端未结

关注

 11  1621

日久生厌 2020-12-23 19:40

The Python library pandas can read Excel spreadsheets and convert them to a pandas.DataFrame with pandas.read_excel(file) command. Under the hood,

11条回答

庸人自扰 (楼主)

2020-12-23 19:55

You can read ODF (Open Document Format .ods) documents in Python using the following modules:

odfpy / read-ods-with-odfpy
ezodf
pyexcel / pyexcel-ods
py-odftools
simpleodspy

Using ezodf, a simple ODS-to-DataFrame converter could look like this:

import pandas as pd
import ezodf

doc = ezodf.opendoc('some_odf_spreadsheet.ods')

print("Spreadsheet contains %d sheet(s)." % len(doc.sheets))
for sheet in doc.sheets:
    print("-"*40)
    print("   Sheet name : '%s'" % sheet.name)
    print("Size of Sheet : (rows=%d, cols=%d)" % (sheet.nrows(), sheet.ncols()) )

# convert the first sheet to a pandas.DataFrame
sheet = doc.sheets[0]
df_dict = {}
for i, row in enumerate(sheet.rows()):
    # row is a list of cells
    # assume the header is on the first row
    if i == 0:
        # columns as lists in a dictionary
        df_dict = {cell.value:[] for cell in row}
        # create index for the column headers
        col_index = {j:cell.value for j, cell in enumerate(row)}
        continue
    for j, cell in enumerate(row):
        # use header instead of column index
        df_dict[col_index[j]].append(cell.value)
# and convert to a DataFrame
df = pd.DataFrame(df_dict)

P.S.

ODF spreadsheet (*.ods files) support has been requested on the pandas issue tracker: https://github.com/pydata/pandas/issues/2311, but it is still not implemented.
ezodf was used in the unfinished PR9070 to implement ODF support in pandas. That PR is now closed (read the PR for a technical discussion), but it is still available as an experimental feature in this pandas fork.
there are also some brute force methods to read directly from the XML code (here)

0 讨论(0)

查看其它11个回答