Question
I have downloaded a few sales datasets from an SAP application. SAP automatically exported the data as .XLS files. Whenever I open one of them with the pandas library, I get the following error:
XLRDError: Unsupported format, or corrupt file: Expected BOF record; found '\xff\xfe\r\x00\n\x00\r\x00'
When I open the .XLS file in MS Excel, a popup says that the file is corrupt or has an unsupported extension and asks whether I want to continue. When I click 'Yes', it shows the correct data. After I save the file again as .xls from MS Excel, I can read it with pandas.
So I tried renaming the file using os.rename(), but it didn't work. I also tried opening the file and removing \xff\xfe\r\x00\n\x00\r\x00, but that didn't work either.
The only solution so far is to open MS Excel and save the file again as .xls manually. Is there any way to automate this? Kindly help.
Answer 1:
Finally I converted the corrupt .xls to a correct .xls file. The following is the code:
# Treat all string literals in this module as unicode (only needed on Python 2)
from __future__ import unicode_literals
# Used to save the file as excel workbook
# Need to install this library
from xlwt import Workbook
# Used to open the corrupt Excel file
import io
filename = r'SALEJAN17.xls'
# Opening the file using 'utf-16' encoding; the with block closes it afterwards
with io.open(filename, "r", encoding="utf-16") as file1:
    data = file1.readlines()
# Creating a workbook object
xldoc = Workbook()
# Adding a sheet to the workbook object
sheet = xldoc.add_sheet("Sheet1", cell_overwrite_ok=True)
# Iterating and saving the data to sheet
for i, row in enumerate(data):
    # Two things are done here:
    # 1. Removing the '\n' left at the end of each line by readlines()
    # 2. Splitting the line into values on '\t'
    for j, val in enumerate(row.replace('\n', '').split('\t')):
        sheet.write(i, j, val)
# Saving the file as an excel file
xldoc.save('myexcel.xls')
import pandas as pd
df = pd.ExcelFile('myexcel.xls').parse('Sheet1')
No errors.
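The bytes \xff\xfe at the start of the error message are the UTF-16 little-endian byte order mark, which suggests the export is tab-separated text rather than a real Excel workbook. As a rough sketch under that assumption, the check and the parsing step can be done with the standard library alone. The helper names looks_like_utf16_text and read_tab_separated_rows are hypothetical, not part of any library:

```python
import codecs
import csv
import io


def looks_like_utf16_text(path):
    """Return True if the file starts with a UTF-16 byte order mark."""
    with open(path, "rb") as fh:
        head = fh.read(2)
    return head in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)


def read_tab_separated_rows(path):
    """Read a UTF-16, tab-separated export into a list of rows.

    Assumes fields are separated by tabs and are not quoted, which
    mirrors the split('\t') approach in the answer above.
    Blank lines are skipped.
    """
    with io.open(path, "r", encoding="utf-16", newline="") as fh:
        reader = csv.reader(fh, delimiter="\t", quoting=csv.QUOTE_NONE)
        return [row for row in reader if row]


# Example usage (the file name is taken from the question):
# if looks_like_utf16_text("SALEJAN17.xls"):
#     rows = read_tab_separated_rows("SALEJAN17.xls")
```

If the file really is plain tab-separated text, it may also be possible to skip the intermediate workbook and read it directly with pandas.read_csv, passing sep='\t' and encoding='utf-16'. Whether that works depends on how the particular SAP export is laid out.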
Answer 2:
Another way to solve this problem is to use the win32com.client library (part of the pywin32 package; it requires Windows with Excel installed):
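import os
import win32com.client

o = win32com.client.Dispatch("Excel.Application")
o.Visible = False
# Build absolute paths for the input and output files
filename = os.path.join(os.getcwd(), 'SALEJAN17.xls')
output = os.path.join(os.getcwd(), 'myexcel.xlsx')
wb = o.Workbooks.Open(filename)
# 51 is the xlOpenXMLWorkbook file format (.xlsx)
wb.ActiveSheet.SaveAs(output, 51)
# Close the workbook and quit Excel so no hidden process is left running
wb.Close(False)
o.Quit()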
This example saves to .xlsx format, but you can save as .xls as well (for example with file format 56, xlExcel8, and a .xls output name).
Source: https://stackoverflow.com/questions/43985768/python-converting-corrupt-xls-file