How to obtain sheet names from XLS files without loading the whole file?

独厮守ぢ 2020-11-29 22:36

I'm currently using pandas to read an Excel file and present its sheet names to the user, so they can select which sheet they would like to use. The problem is that the files a

  •  北海茫月
    2020-11-29 22:55

    From my research with the standard/popular libraries, reading only the sheet names hasn't been implemented for xlsx/xls as of 2020, but you can do it for xlsb. Either way, the solutions below should give you large performance improvements over pd.read_excel.

    The benchmarks below were run on ~10 MB xlsx and xlsb files.

    xlsx (note: openpyxl cannot open legacy .xls files)

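    from openpyxl import load_workbook
    
    def get_sheetnames_xlsx(filepath):
        # read_only=True loads worksheets lazily instead of parsing every cell up front
        wb = load_workbook(filepath, read_only=True, keep_links=False)
        try:
            return wb.sheetnames
        finally:
            # read-only workbooks keep the file open until explicitly closed
            wb.close()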
    

    Benchmarks: ~14x speed improvement

    # get_sheetnames_xlsx vs pd.read_excel
    225 ms ± 6.21 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    3.25 s ± 140 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    

    xlsb

    from pyxlsb import open_workbook
    
    def get_sheetnames_xlsb(filepath):
        # the context manager closes the underlying file when done
        with open_workbook(filepath) as wb:
            return wb.sheets  # list of sheet names
    

    Benchmarks: ~56x speed improvement

    # get_sheetnames_xlsb vs pd.read_excel
    96.4 ms ± 1.61 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
    5.36 s ± 162 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
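The timings above appear to be IPython `%timeit` output ("mean ± std. dev. of 7 runs"). To reproduce a comparison like this outside IPython, a small standard-library harness is enough. This is a sketch: `benchmark` and `speedup` are hypothetical helpers, and it only mirrors the shape of the `%timeit` summary, not its automatic choice of loop count.

```python
import statistics
import timeit


def benchmark(func, *args, runs=7, number=1):
    """Time func(*args) and return (mean, std. dev.) of the per-loop time in seconds.

    Each of `runs` runs calls func(*args) `number` times, similar to the
    "mean ± std. dev. of 7 runs, 1 loop each" line printed by %timeit.
    """
    totals = timeit.repeat(lambda: func(*args), repeat=runs, number=number)
    per_loop = [t / number for t in totals]
    spread = statistics.stdev(per_loop) if len(per_loop) > 1 else 0.0
    return statistics.mean(per_loop), spread


def speedup(fast, slow, *args, runs=7, number=1):
    """Ratio of slow's mean per-loop time to fast's, on the same arguments."""
    fast_mean, _ = benchmark(fast, *args, runs=runs, number=number)
    slow_mean, _ = benchmark(slow, *args, runs=runs, number=number)
    return slow_mean / fast_mean
```

For example, `speedup(get_sheetnames_xlsx, lambda p: pd.read_excel(p, sheet_name=None), "big.xlsx")` would estimate the xlsx ratio; the path is hypothetical, and `sheet_name=None` (read every sheet) is an assumption, since the answer does not say which `read_excel` arguments were benchmarked.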
    

    Notes:

    • A good overview of Python libraries for Excel files: http://www.python-excel.org/
    • xlrd is no longer actively maintained as of 2020, and from version 2.0 it only reads legacy .xls files
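The readers above can be combined into one entry point that picks a library by file extension. This is a sketch, not part of the original answer: `pick_backend` and `get_sheetnames` are hypothetical names, and the `.xls` branch swaps in xlrd's `open_workbook(..., on_demand=True)`, which defers loading individual worksheets until they are requested; that path was not benchmarked here. Third-party imports happen inside each branch, so only the library for the format you actually open needs to be installed.

```python
from pathlib import Path


def pick_backend(filepath):
    """Map a file extension to the library used to list its sheet names."""
    suffix = Path(filepath).suffix.lower()
    if suffix in {".xlsx", ".xlsm", ".xltx", ".xltm"}:
        return "openpyxl"
    if suffix == ".xlsb":
        return "pyxlsb"
    if suffix == ".xls":
        return "xlrd"
    raise ValueError(f"unsupported Excel extension: {suffix!r}")


def get_sheetnames(filepath):
    """Return sheet names while avoiding a full load of cell data."""
    backend = pick_backend(filepath)
    if backend == "openpyxl":
        from openpyxl import load_workbook
        wb = load_workbook(filepath, read_only=True, keep_links=False)
        try:
            return wb.sheetnames
        finally:
            wb.close()  # read-only workbooks hold the file open until closed
    if backend == "pyxlsb":
        from pyxlsb import open_workbook
        with open_workbook(filepath) as wb:
            return list(wb.sheets)
    # Legacy .xls: on_demand=True defers loading worksheets until requested
    import xlrd
    book = xlrd.open_workbook(filepath, on_demand=True)
    try:
        return book.sheet_names()
    finally:
        book.release_resources()
```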
