xlrd reading xls XLRDError: Unsupported format, or corrupt file: Expected BOF record; found '\r\n<html>'

只愿长相守 submitted on 2020-02-24 17:54:13

Question


This is the code:

from xlrd import open_workbook

xls = open_workbook('data.xls')

It fails with the following traceback:

  File "/home/woles/P2/fin/fin/apps/data_container/importer.py", line 16, in import_data
    xls = open_workbook('data.xlsx')
  File "/home/woles/P2/fin/local/lib/python2.7/site-packages/xlrd/__init__.py", line 435, in open_workbook
    ragged_rows=ragged_rows,
  File "/home/woles/P2/fin/local/lib/python2.7/site-packages/xlrd/book.py", line 91, in open_workbook_xls
    biff_version = bk.getbof(XL_WORKBOOK_GLOBALS)
  File "/home/woles/P2/fin/local/lib/python2.7/site-packages/xlrd/book.py", line 1230, in getbof
    bof_error('Expected BOF record; found %r' % self.mem[savpos:savpos+8])
  File "/home/woles/P2/fin/local/lib/python2.7/site-packages/xlrd/book.py", line 1224, in bof_error
    raise XLRDError('Unsupported format, or corrupt file: ' + msg)
XLRDError: Unsupported format, or corrupt file: Expected BOF record; found '\r\n<html>'

The file is not damaged; I can open it with both Excel and LibreOffice.


Answer 1:


Try opening it with pandas:

import pandas as pd
# read_html returns a list of DataFrames, one per <table> found in the file
tables = pd.read_html('filename.xls')
data = tables[0]

Or try any other Python HTML parser.

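That is not a real Excel file but an HTML document that Excel is able to open. The error message itself shows this: xlrd expected a BOF record at the start of the file and found '\r\n<html>' instead.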
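
A quick way to confirm this before picking a parser is to look at the first bytes of the file. The sketch below is only an illustration: `sniff_spreadsheet` is a hypothetical helper name, and the HTML check is a rough heuristic that assumes the file starts with an `<html>`, `<!doctype html>` or `<table>` tag after optional whitespace.

```python
# Hypothetical helper (not part of xlrd or pandas): classify a file by its leading bytes.
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # container used by legacy .xls files
ZIP_MAGIC = b"PK\x03\x04"                          # .xlsx files are ZIP archives


def sniff_spreadsheet(path):
    """Return 'xls', 'xlsx', 'html' or 'unknown' based on the start of the file."""
    with open(path, "rb") as f:
        head = f.read(512)
    if head.startswith(OLE2_MAGIC):
        return "xls"
    if head.startswith(ZIP_MAGIC):
        return "xlsx"
    # HTML exports often begin with a byte-order mark or whitespace before the first tag
    text = head.lstrip(b"\xef\xbb\xbf").lstrip().lower()
    if text.startswith((b"<html", b"<!doctype html", b"<table")):
        return "html"
    return "unknown"
```

A file like the one in the question, which begins with '\r\n<html>', would be reported as 'html', which suggests handing it to an HTML parser rather than xlrd.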




Answer 2:


I just resolved the same error. As a first step, I opened the file as text and noticed that it contained HTML. I then modified the text so that, once it is saved as HTML and opened in a browser, the table is visible. After that, Beautiful Soup and pandas got me to an Excel output.

The lines below may help:

import shutil
from io import StringIO

import pandas as pd
from bs4 import BeautifulSoup  # features="lxml" below requires the lxml package

# Work on a copy so the downloaded file stays untouched
shutil.copy('donloaded.xls', 'changed.html')

# Read the copy as plain text
with open('changed.html', 'r') as f:
    txt = f.read()

# Modify the text to ensure the data displays in an HTML page
# (raw string, so the backslash in \@ is kept literally)
txt = txt.replace(r'<style> .text { mso-number-format:\@; } </script>', '')

# Add head and body if they are not in the HTML text
txt_with_head = '<html><head></head><body>' + txt + '</body></html>'

# Save the file as HTML; the with-block closes it before it is read again
with open('output.html', 'w') as html_file:
    html_file.write(txt_with_head)

# Use Beautiful Soup to read the saved file and locate the table
with open('output.html') as page:
    soup = BeautifulSoup(page.read(), features="lxml")
my_table = soup.find("table", attrs={'border': '1'})

# Parse the table with pandas and write a real Excel file
frame = pd.read_html(StringIO(str(my_table)))[0]
print(frame.head())
frame.to_excel('testoutput.xlsx', sheet_name='sheet1', index=False)
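
If pandas and Beautiful Soup are not available, the table cells can also be pulled out with the standard library's `html.parser`. This is a minimal sketch, not a replacement for the approach above: `TableParser` and `html_table_rows` are hypothetical names, and it assumes a single, non-nested table whose cells contain plain text.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows
        self._row = None   # row being built, or None outside a <tr>
        self._cell = None  # text fragments of the open cell, or None

    def _close_cell(self):
        if self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
        self._cell = None

    def _close_row(self):
        self._close_cell()
        if self._row is not None:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._close_row()   # tolerate a missing </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._close_cell()  # tolerate a missing </td>
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()


def html_table_rows(html_text):
    """Return the table content as a list of rows, each a list of cell strings."""
    parser = TableParser()
    parser.feed(html_text)
    parser.close()
    return parser.rows
```

The resulting list of rows can then be written out with the `csv` module or passed to whatever spreadsheet writer is at hand.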


Source: https://stackoverflow.com/questions/23994362/xlrd-reading-xls-xlrderror-unsupported-format-or-corrupt-file-expected-bof-re
