How to convert OpenDocument spreadsheets to a pandas DataFrame?

日久生厌 2020-12-23 19:40

The Python library pandas can read Excel spreadsheets and convert them to a pandas.DataFrame with the pandas.read_excel(file) command. Under the hood,

11 answers
  •  庸人自扰
    2020-12-23 19:55

    You can read ODF spreadsheets (OpenDocument Format, .ods files) in Python with the following modules:

    • odfpy / read-ods-with-odfpy
    • ezodf
    • pyexcel / pyexcel-ods
    • py-odftools
    • simpleodspy

    Using ezodf, a simple ODS-to-DataFrame converter could look like this:

    import pandas as pd
    import ezodf
    
    doc = ezodf.opendoc('some_odf_spreadsheet.ods')
    
    print("Spreadsheet contains %d sheet(s)." % len(doc.sheets))
    for sheet in doc.sheets:
        print("-"*40)
        print("   Sheet name : '%s'" % sheet.name)
        print("Size of Sheet : (rows=%d, cols=%d)" % (sheet.nrows(), sheet.ncols()) )
    
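    # convert the first sheet to a pandas.DataFrame
    sheet = doc.sheets[0]
    df_dict = {}    # header -> list of column values
    col_index = {}  # column position -> header
    for i, row in enumerate(sheet.rows()):
        # row is a list of cells
        # assume the header is on the first row
        if i == 0:
            # map column positions to headers, skipping empty header cells
            # (several None headers would otherwise share one list and break the DataFrame)
            col_index = {j: cell.value for j, cell in enumerate(row) if cell.value is not None}
            # columns as lists in a dictionary
            df_dict = {name: [] for name in col_index.values()}
            continue
        for j, cell in enumerate(row):
            # use header instead of column index; skip cells under an empty header
            if j in col_index:
                df_dict[col_index[j]].append(cell.value)
    # and convert to a DataFrame
    df = pd.DataFrame(df_dict)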
    

    P.S.

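    • ODF spreadsheet (*.ods files) support was requested on the pandas issue tracker (https://github.com/pydata/pandas/issues/2311) and was not implemented when this answer was written. Since pandas 0.25.0, pandas.read_excel can read .ods files with engine="odf", which requires odfpy to be installed.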

    • ezodf was used in the unfinished PR9070 to implement ODF support in pandas. That PR is now closed (read the PR for a technical discussion), but it is still available as an experimental feature in this pandas fork.

    • There are also brute-force methods that parse the spreadsheet's XML directly (an .ods file is a ZIP archive whose cell data lives in content.xml) (here).
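
    The brute-force XML route can be sketched with the standard library alone (zipfile and xml.etree.ElementTree), with no ezodf or odfpy needed. This is a minimal sketch, assuming the usual ODS layout: content.xml holds table:table elements whose table:table-row children contain table:table-cell elements, with runs of identical cells or rows compressed through table:number-columns-repeated and table:number-rows-repeated. The names read_ods_sheet, sheet_index and the helper functions are hypothetical, not part of any library. Only float/percentage/currency and boolean cells are converted; strings, dates and times come back as the cell's displayed text.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace URIs used in an ODS file's content.xml
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "table": "urn:oasis:names:tc:opendocument:xmlns:table:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}
CELL_TAGS = {
    f"{{{NS['table']}}}table-cell",
    f"{{{NS['table']}}}covered-table-cell",  # cells hidden by a merge
}
ROW_TAG = f"{{{NS['table']}}}table-row"


def _attr(elem, prefix, name):
    """Read a namespaced attribute such as table:number-columns-repeated."""
    return elem.get(f"{{{NS[prefix]}}}{name}")


def _cell_value(cell):
    """Return a float/bool for numeric/boolean cells, the cell text otherwise, None if empty."""
    vtype = _attr(cell, "office", "value-type")
    if vtype in ("float", "percentage", "currency"):
        return float(_attr(cell, "office", "value"))
    if vtype == "boolean":
        return _attr(cell, "office", "boolean-value") == "true"
    # strings, dates, times: fall back to the text of the cell's <text:p> paragraphs
    paragraphs = ["".join(p.itertext()) for p in cell.findall("text:p", NS)]
    return "\n".join(paragraphs) if paragraphs else None


def read_ods_sheet(path_or_file, sheet_index=0):
    """Return one sheet of an .ods file as a list of rows (lists of values)."""
    with zipfile.ZipFile(path_or_file) as zf:
        root = ET.fromstring(zf.read("content.xml"))
    table = root.findall(".//table:table", NS)[sheet_index]

    rows, pending_rows = [], 0
    for row in table.iter(ROW_TAG):  # also finds rows inside header/row groups
        values, pending_cells = [], 0
        for cell in row:
            if cell.tag not in CELL_TAGS:
                continue
            repeat = int(_attr(cell, "table", "number-columns-repeated") or 1)
            value = _cell_value(cell)
            if value is None:
                # defer blanks; they are only written out if a non-empty cell follows
                pending_cells += repeat
            else:
                values.extend([None] * pending_cells)
                values.extend([value] * repeat)
                pending_cells = 0
        repeat = int(_attr(row, "table", "number-rows-repeated") or 1)
        if not values:
            pending_rows += repeat  # same deferral for blank rows
        else:
            rows.extend([] for _ in range(pending_rows))
            rows.extend(list(values) for _ in range(repeat))
            pending_rows = 0
    return rows
```

    Assuming the first row holds the headers, something like pd.DataFrame(rows[1:], columns=rows[0]) then gives a DataFrame (you may need to pad the rows to the header's width first). Blank cells and rows are deferred instead of expanded because spreadsheet applications such as LibreOffice can pad a sheet with repeated empty rows and columns up to the sheet limits; expanding those eagerly would allocate millions of empty cells.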
