Error (little-endian) reading a XLS file with python

时光总嘲笑我的痴心妄想 提交于 2020-01-05 09:06:20

问题


I download a XLS file from the web using selenium.

I tried many options I found in stack-overflow and other websites to read the XLS file :

import pandas as pd
df = pd.read_excel('test.xls') # Read XLS file
Expected "little-endian" marker, found b'\xff\xfe'

And

df = pd.ExcelFile('test.xls').parse('Sheet1') # Read XLSX file
Expected "little-endian" marker, found b'\xff\xfe'

And again

from xlrd import open_workbook
book = open_workbook('test.xls') 
CompDocError: Expected "little-endian" marker, found b'\xff\xfe'

I have tried different encoding: utf-8, ANSII, utf_16_be, utf16 I have even tried to get the encoding of the file from notepad or other applications.

Type of file : Microsoft Excel 97-2003 Worksheet (.xls) I can open the file with Excel without any issue. What's frustrating is that if I open the file with excel and just press save I then can read the file with of the previous python command.

I would be really grateful if someone could provide me other ideas I could try. I need to open this file with a python script only.

Thanks, Max

Solution(Somewhat messy but simple) that could potentially work for any type of Excel file :

Called VBA from python to Open and save the file in Excel. Excel "clean-up" the file and then Python is able to read it with any read Excel type function

Solution inspired by @Serge Ballesta and @John Y comments.

## Open a file in Excel and save it to correct the encoding error 
import win32com.client
import pandas

downloadpath="c:\\firefox_downloads\\"
filename="myfile.xls"

xl=win32com.client.Dispatch("Excel.Application")
xl.Application.DisplayAlerts = False # disables Excel pop up message (for saving the file)
wb = xl.Workbooks.Open(Filename=downloadpath+filename)
wb.SaveAs(downloadpath+filename)
wb.Close
xl.Application.DisplayAlerts = True  # enables Excel pop up message for saving the file

df = pandas.ExcelFile(downloadpath+filename).parse('Sheet1') # Read XLSX file

Thank you all!


回答1:


What does pd mean?? What

pandas is made for data science. In my opinion, you have to use openpyxl (read and write only xlsx) or xlwt/xlrd (read xls... and write only xls).

from xlrd import open_workbook
book = open_workbook(<math file>)
sheet =.... 

It has several examples with this on Internet...



来源:https://stackoverflow.com/questions/49197793/error-little-endian-reading-a-xls-file-with-python

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!