Is openpyxl the fastest package to use to read big xlsx files?

Submitted by 隐身守侯 on 2019-12-04 18:11:56

I benchmarked the optimised reader on my 2009 MacBook at about 20 s for 1 million cells containing numbers. I would expect a slight hit with your code because of the indirection for cells and the pattern matching (compile the pattern outside the loop), but the speed should still be acceptable. CSV will, of course, be faster if you can get it easily.

I'd be interested to know your numbers.
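A minimal sketch of the read-only (optimised) reader with the regex compiled once outside the loop, as suggested above. The file name, data, and pattern are all illustrative, and the block writes a tiny workbook first so it is self-contained:

```python
import re
from openpyxl import Workbook, load_workbook

# Build a small demo file so the sketch is self-contained.
wb = Workbook()
ws = wb.active
for i in range(5):
    ws.append([i, f'row{i}'])
wb.save('demo.xlsx')

pattern = re.compile(r'\d+')  # compile once, not per cell
matches = 0

wb = load_workbook('demo.xlsx', read_only=True)  # streams rows lazily
for row in wb.active.iter_rows(values_only=True):
    for value in row:
        if value is not None and pattern.search(str(value)):
            matches += 1
wb.close()  # read-only mode keeps the file handle open until closed
print(matches)
```

Note that read-only workbooks stream rows on demand rather than loading the whole sheet, which is what keeps memory flat on million-cell files.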

The fastest way to read an xlsx sheet.

A 56 MB file with over 500k rows across 4 sheets took 6 s to process.

import zipfile
from bs4 import BeautifulSoup

mySheet = 'Sheet Name'
filename = 'xlfile.xlsx'
paths = []

with zipfile.ZipFile(filename, 'r') as file:
    # Map each sheet name to its XML path inside the archive.
    data = BeautifulSoup(file.read('xl/workbook.xml'), 'html.parser')
    for sheet in data.find_all('sheet'):
        paths.append([sheet.get('name'), 'xl/worksheets/sheet' + str(sheet.get('sheetid')) + '.xml'])

    for path in paths:
        if path[0] == mySheet:
            with file.open(path[1]) as reader:
                for row in reader:
                    print(row)  # do whatever you want with your data
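One caveat with reading the raw sheet XML this way: cells with t="s" store an index into xl/sharedStrings.xml rather than the text itself, so string cells will print as numbers unless you resolve them. A stdlib-only sketch of that lookup (zipfile plus xml.etree.ElementTree), where the tiny in-memory archive stands in for a real .xlsx file:

```python
import io
import zipfile
import xml.etree.ElementTree as ET

NS = '{http://schemas.openxmlformats.org/spreadsheetml/2006/main}'

# Minimal stand-in for a workbook: one shared string, one cell that uses it.
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as z:
    z.writestr('xl/sharedStrings.xml',
               '<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">'
               '<si><t>hello</t></si></sst>')
    z.writestr('xl/worksheets/sheet1.xml',
               '<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">'
               '<sheetData><row><c r="A1" t="s"><v>0</v></c></row></sheetData></worksheet>')

with zipfile.ZipFile(buf) as z:
    # Load the shared-string table once, then resolve indexed cells against it.
    strings = [t.text for t in ET.fromstring(z.read('xl/sharedStrings.xml')).iter(NS + 't')]
    sheet = ET.fromstring(z.read('xl/worksheets/sheet1.xml'))
    for cell in sheet.iter(NS + 'c'):
        value = cell.find(NS + 'v').text
        if cell.get('t') == 's':  # 's' means the value is a shared-string index
            value = strings[int(value)]
        print(value)
```

For large files, loading the shared-string table once up front and indexing into it per cell keeps the cost to a single extra pass over one XML file.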

Enjoy and happy coding.
