I have some *.xls(excel 2003) files, and I want to convert those files into xlsx(excel 2007).
I use the uno python package, when I save the documents, I can set the
I required a simple solution to convert couple of xlx
to xlsx
format. There are plenty of answers here, but they are doing some "magic" that I do not completely understand.
A simple solution was given by chfw, but not quite complete.
Use pip to install
pip install pyexcel-cli pyexcel-xls pyexcel-xlsx
All the styling and macros will be gone, but the information is intact.
pyexcel transcode your-file-in.xls your-new-file-out.xlsx
for file in *.xls; do; echo "Transcoding $file"; pyexcel transcode "$file" "${file}x"; done;
Here is my solution, without considering fonts, charts and images:
$ pip install pyexcel pyexcel-xls pyexcel-xlsx
Then do this::
import pyexcel as p
p.save_book_as(file_name='your-file-in.xls',
dest_file_name='your-new-file-out.xlsx')
If you do not need a program, you could install one additinal package pyexcel-cli::
$ pip install pyexcel-cli
$ pyexcel transcode your-file-in.xls your-new-file-out.xlsx
The transcoding procedure above uses xlrd and openpyxl.
The answer by Ray helped me a lot, but for those who search a simple way to convert all the sheets from a xls to a xlsx, I made this Gist:
import xlrd
from openpyxl.workbook import Workbook as openpyxlWorkbook
# content is a string containing the file. For example the result of an http.request(url).
# You can also use a filepath by calling "xlrd.open_workbook(filepath)".
xlsBook = xlrd.open_workbook(file_contents=content)
workbook = openpyxlWorkbook()
for i in xrange(0, xlsBook.nsheets):
xlsSheet = xlsBook.sheet_by_index(i)
sheet = workbook.active if i == 0 else workbook.create_sheet()
sheet.title = xlsSheet.name
for row in xrange(0, xlsSheet.nrows):
for col in xrange(0, xlsSheet.ncols):
sheet.cell(row=row, column=col).value = xlsSheet.cell_value(row, col)
# The new xlsx file is in "workbook", without iterators (iter_rows).
# For iteration, use "for row in worksheet.rows:".
# For range iteration, use "for row in worksheet.range("{}:{}".format(startCell, endCell)):".
You can find the xlrd lib here and the openpyxl here (you must download xlrd in your project for Google App Engine for example).
The Answer from Ray was clipping the first row and last column of the data. Here is my modified solution (for python3):
def open_xls_as_xlsx(filename):
# first open using xlrd
book = xlrd.open_workbook(filename)
index = 0
nrows, ncols = 0, 0
while nrows * ncols == 0:
sheet = book.sheet_by_index(index)
nrows = sheet.nrows+1 #bm added +1
ncols = sheet.ncols+1 #bm added +1
index += 1
# prepare a xlsx sheet
book1 = Workbook()
sheet1 = book1.get_active_sheet()
for row in range(1, nrows):
for col in range(1, ncols):
sheet1.cell(row=row, column=col).value = sheet.cell_value(row-1, col-1) #bm added -1's
return book1
I'm improve performance for @Jackypengyu method.
ragged_rows=True
(http://xlrd.readthedocs.io/en/latest/api.html#xlrd.sheet.Sheet.row_slice)Merged cells will be converted too.
Convert same 12 files in same order:
Original:
0:00:01.958159
0:00:02.115891
0:00:02.018643
0:00:02.057803
0:00:01.267079
0:00:01.308073
0:00:01.245989
0:00:01.289295
0:00:01.273805
0:00:01.276003
0:00:01.293834
0:00:01.261401
Improved:
0:00:00.774101
0:00:00.734749
0:00:00.741434
0:00:00.744491
0:00:00.320796
0:00:00.279045
0:00:00.315829
0:00:00.280769
0:00:00.316380
0:00:00.289196
0:00:00.347819
0:00:00.284242
def cvt_xls_to_xlsx(*args, **kw):
"""Open and convert XLS file to openpyxl.workbook.Workbook object
@param args: args for xlrd.open_workbook
@param kw: kwargs for xlrd.open_workbook
@return: openpyxl.workbook.Workbook
You need -> from openpyxl.utils.cell import get_column_letter
"""
book_xls = xlrd.open_workbook(*args, formatting_info=True, ragged_rows=True, **kw)
book_xlsx = Workbook()
sheet_names = book_xls.sheet_names()
for sheet_index in range(len(sheet_names)):
sheet_xls = book_xls.sheet_by_name(sheet_names[sheet_index])
if sheet_index == 0:
sheet_xlsx = book_xlsx.active
sheet_xlsx.title = sheet_names[sheet_index]
else:
sheet_xlsx = book_xlsx.create_sheet(title=sheet_names[sheet_index])
for crange in sheet_xls.merged_cells:
rlo, rhi, clo, chi = crange
sheet_xlsx.merge_cells(
start_row=rlo + 1, end_row=rhi,
start_column=clo + 1, end_column=chi,
)
def _get_xlrd_cell_value(cell):
value = cell.value
if cell.ctype == xlrd.XL_CELL_DATE:
value = datetime.datetime(*xlrd.xldate_as_tuple(value, 0))
return value
for row in range(sheet_xls.nrows):
sheet_xlsx.append((
_get_xlrd_cell_value(cell)
for cell in sheet_xls.row_slice(row, end_colx=sheet_xls.row_len(row))
))
for rowx in range(sheet_xls.nrows):
if sheet_xls.rowinfo_map[rowx].hidden != 0:
print sheet_names[sheet_index], rowx
sheet_xlsx.row_dimensions[rowx+1].hidden = True
for coly in range(sheet_xls.ncols):
if sheet_xls.colinfo_map[coly].hidden != 0:
print sheet_names[sheet_index], coly
coly_letter = get_column_letter(coly+1)
sheet_xlsx.column_dimensions[coly_letter].hidden = True
return book_xlsx
import sys, os
import win32com.client
directory = 'C:\\Users\\folder\\'
for file in os.listdir(directory):
dot = file.find('.')
end = file[dot:]
OutFile =file[0:dot] + ".xlsx"
App = win32com.client.Dispatch("Excel.Application")
App.Visible = True
workbook= App.Workbooks.Open(file)
workbook.ActiveSheet.SaveAs(OutFile, 51) #51 is for xlsx
workbook.Close(SaveChanges=True)
App.Quit()
Thank you.