问题
I download a XLS file from the web using selenium.
I tried many options I found in stack-overflow and other websites to read the XLS file :
import pandas as pd
df = pd.read_excel('test.xls') # Read XLS file
Expected "little-endian" marker, found b'\xff\xfe'
And
df = pd.ExcelFile('test.xls').parse('Sheet1') # Read XLSX file
Expected "little-endian" marker, found b'\xff\xfe'
And again
from xlrd import open_workbook
book = open_workbook('test.xls')
CompDocError: Expected "little-endian" marker, found b'\xff\xfe'
I have tried different encoding: utf-8, ANSII, utf_16_be, utf16 I have even tried to get the encoding of the file from notepad or other applications.
Type of file : Microsoft Excel 97-2003 Worksheet (.xls) I can open the file with Excel without any issue. What's frustrating is that if I open the file with excel and just press save I then can read the file with of the previous python command.
I would be really grateful if someone could provide me other ideas I could try. I need to open this file with a python script only.
Thanks, Max
Solution(Somewhat messy but simple) that could potentially work for any type of Excel file :
Called VBA from python to Open and save the file in Excel. Excel "clean-up" the file and then Python is able to read it with any read Excel type function
Solution inspired by @Serge Ballesta and @John Y comments.
## Open a file in Excel and save it to correct the encoding error
import win32com.client
import pandas
downloadpath="c:\\firefox_downloads\\"
filename="myfile.xls"
xl=win32com.client.Dispatch("Excel.Application")
xl.Application.DisplayAlerts = False # disables Excel pop up message (for saving the file)
wb = xl.Workbooks.Open(Filename=downloadpath+filename)
wb.SaveAs(downloadpath+filename)
wb.Close
xl.Application.DisplayAlerts = True # enables Excel pop up message for saving the file
df = pandas.ExcelFile(downloadpath+filename).parse('Sheet1') # Read XLSX file
Thank you all!
回答1:
What does pd mean?? What
pandas is made for data science. In my opinion, you have to use openpyxl (read and write only xlsx) or xlwt/xlrd (read xls... and write only xls).
from xlrd import open_workbook
book = open_workbook(<math file>)
sheet =....
It has several examples with this on Internet...
来源:https://stackoverflow.com/questions/49197793/error-little-endian-reading-a-xls-file-with-python