Question
This is the code:
xls = open_workbook('data.xls')
It fails with this traceback:
File "/home/woles/P2/fin/fin/apps/data_container/importer.py", line 16, in import_data
xls = open_workbook('data.xlsx')
File "/home/woles/P2/fin/local/lib/python2.7/site-packages/xlrd/__init__.py", line 435, in open_workbook
ragged_rows=ragged_rows,
File "/home/woles/P2/fin/local/lib/python2.7/site-packages/xlrd/book.py", line 91, in open_workbook_xls
biff_version = bk.getbof(XL_WORKBOOK_GLOBALS)
File "/home/woles/P2/fin/local/lib/python2.7/site-packages/xlrd/book.py", line 1230, in getbof
bof_error('Expected BOF record; found %r' % self.mem[savpos:savpos+8])
File "/home/woles/P2/fin/local/lib/python2.7/site-packages/xlrd/book.py", line 1224, in bof_error
raise XLRDError('Unsupported format, or corrupt file: ' + msg)
XLRDError: Unsupported format, or corrupt file: Expected BOF record; found '\r\n<html>'
The file is not damaged; I can open it with both Excel and LibreOffice.
Answer 1:
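Try opening it with pandas:
import pandas as pd
# read_html takes the path as a string and returns a list of DataFrames, one per <table>
data = pd.read_html('filename.xls')
Or try any other Python HTML parser.
That's not a real Excel file, but an HTML file that Excel is able to read.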
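To check what the file really is before picking a reader, you can look at its first bytes, the same ones xlrd printed in the error (`'\r\n<html>'`). This is only a minimal sketch: the function names `sniff_spreadsheet_format` and `sniff_file` are made up for illustration, and the list of HTML prefixes is an assumption that may not cover every export. It relies on two well-known signatures: legacy .xls files are OLE2 compound documents, and .xlsx files are ZIP archives.

```python
# Hypothetical helpers (not part of xlrd or pandas): guess the real format
# of a "spreadsheet" file from its leading bytes.
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # legacy binary .xls container
ZIP_MAGIC = b"PK\x03\x04"                          # .xlsx is a ZIP archive
HTML_PREFIXES = (b"<html", b"<!doctype html", b"<table")  # assumed, not exhaustive


def sniff_spreadsheet_format(head):
    """Return 'xls', 'xlsx', 'html' or 'unknown' for the given leading bytes."""
    if head.startswith(OLE2_MAGIC):
        return "xls"
    if head.startswith(ZIP_MAGIC):
        return "xlsx"
    # HTML exports often start with whitespace such as '\r\n' before the first tag
    if head.lstrip().lower().startswith(HTML_PREFIXES):
        return "html"
    return "unknown"


def sniff_file(path, n=64):
    """Read the first n bytes of the file at path and classify them."""
    with open(path, "rb") as fh:
        return sniff_spreadsheet_format(fh.read(n))
```

If `sniff_file('data.xls')` returns `'html'`, `pd.read_html` or another HTML parser is the one to try; if it returns `'xls'`, the file should be readable by `xlrd.open_workbook`.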
Answer 2:
I just resolved the same error. First, I opened the file as text and noticed that it contained HTML. Then I modified the content so that, once it was saved as HTML and opened in a browser, the table was visible. After that, Beautiful Soup and pandas helped me produce a proper Excel output.
The lines below may help:
import shutil
from io import StringIO
import pandas as pd
from bs4 import BeautifulSoup  # the "lxml" parser used below needs the lxml package
# Work on a copy with an .html extension so the download stays untouched
shutil.copy('donloaded.xls', 'changed.html')
with open('changed.html', 'r') as f:
    txt = f.read()
# Modify the text to ensure the data displays in an HTML page
# (raw string, because '\@' is not a valid escape sequence)
txt = txt.replace(r'<style> .text { mso-number-format:\@; } </script>', '')
# Wrap the text in html, head and body tags
txt_with_head = '<html><head></head><body>' + txt + '</body></html>'
# Save the file as HTML; the with-block closes it so everything is flushed to disk
with open('output.html', 'w') as html_file:
    html_file.write(txt_with_head)
# Use Beautiful Soup to read the file that was just written
# (the original used an absolute path to the same output.html)
with open('output.html') as page:
    soup = BeautifulSoup(page.read(), features="lxml")
my_table = soup.find("table", attrs={'border': '1'})
# read_html returns a list of DataFrames; recent pandas expects a file-like object, not a literal HTML string
frame = pd.read_html(StringIO(str(my_table)))[0]
print(frame.head())
frame.to_excel('testoutput.xlsx', sheet_name='sheet1', index=False)
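The cleanup step in this answer can be pulled out into a small function. This is a sketch, not the answer author's code: `wrap_html_fragment` and its `junk` parameter are hypothetical names. Unlike the script above, it only adds the html/head/body wrapper when the text has no `<html` tag yet, which is an assumption about what the "if it is not there" comment intended.

```python
def wrap_html_fragment(text, junk=()):
    """Strip known-bad snippets, then wrap the text in html/head/body if needed.

    text: the file content read as a string
    junk: literal substrings to delete (for example a broken <style> block)
    """
    for piece in junk:
        text = text.replace(piece, "")
    # Only add the wrapper when the text is a bare fragment
    if "<html" not in text.lower():
        text = "<html><head></head><body>" + text + "</body></html>"
    return text
```

For the file in this answer, the call would look like `wrap_html_fragment(txt, junk=[r'<style> .text { mso-number-format:\@; } </script>'])`; the result can then be written to disk or passed to Beautiful Soup directly.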
Source: https://stackoverflow.com/questions/23994362/xlrd-reading-xls-xlrderror-unsupported-format-or-corrupt-file-expected-bof-re