How to convert xls to xlsx

生来不讨喜 2020-11-27 03:58

I have some *.xls (Excel 2003) files, and I want to convert them to xlsx (Excel 2007).

I use the uno python package, when I save the documents, I can set the

14 answers
  • 2020-11-27 04:41

    Simple solution

    I needed a simple way to convert a couple of xls files to xlsx format. There are plenty of answers here, but they do some "magic" that I do not completely understand.

    A simple solution was given by chfw, but it was not quite complete.

    Install dependencies

    Use pip to install

    pip install pyexcel-cli pyexcel-xls pyexcel-xlsx
    

    Execute

    All the styling and macros will be gone, but the information is intact.

    For single file

    pyexcel transcode your-file-in.xls your-new-file-out.xlsx
    

    For all files in the folder, one liner

    for file in *.xls; do echo "Transcoding $file"; pyexcel transcode "$file" "${file}x"; done
    
  • 2020-11-27 04:43

    Here is my solution, without considering fonts, charts and images:

    $ pip install pyexcel pyexcel-xls pyexcel-xlsx
    

    Then do this::

    import pyexcel as p
    
    p.save_book_as(file_name='your-file-in.xls',
                   dest_file_name='your-new-file-out.xlsx')
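
    The same call can be applied to a whole folder. The following is a minimal sketch, assuming pyexcel, pyexcel-xls and pyexcel-xlsx are installed as above; the helper names target_path and convert_folder are made up for illustration and are not part of pyexcel.

```python
from pathlib import Path


def target_path(source):
    """Return the .xlsx path that sits next to a given .xls file."""
    return Path(source).with_suffix(".xlsx")


def convert_folder(folder):
    """Convert every .xls file in `folder`; returns the paths written."""
    import pyexcel  # imported lazily so the path helper works without it

    written = []
    for source in sorted(Path(folder).glob("*.xls")):
        destination = target_path(source)
        pyexcel.save_book_as(file_name=str(source),
                             dest_file_name=str(destination))
        written.append(destination)
    return written
```

    Calling convert_folder(".") would write one .xlsx file next to each .xls file in the current directory.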
    

    If you do not need a program, you could install one additional package, pyexcel-cli::

    $ pip install pyexcel-cli
    $ pyexcel transcode your-file-in.xls your-new-file-out.xlsx
    

    The transcoding procedure above uses xlrd and openpyxl.
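
    Since the conversion relies on xlrd and openpyxl under the hood, it can help to check that they are importable before converting anything. A small sketch using only the standard library; the function name missing_modules is hypothetical.

```python
import importlib.util


def missing_modules(names):
    """Return the names from `names` that cannot be imported here."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(["pyexcel", "xlrd", "openpyxl"])
    print("missing:", ", ".join(missing) if missing else "none")
```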

  • 2020-11-27 04:48

    The answer by Ray helped me a lot, but for those looking for a simple way to convert all the sheets from an xls to an xlsx, I made this Gist:

    import xlrd
    from openpyxl.workbook import Workbook as openpyxlWorkbook
    
    # content is a bytes string containing the file. For example the result of an http.request(url).
    # You can also use a filepath by calling "xlrd.open_workbook(filepath)".
    
    xlsBook = xlrd.open_workbook(file_contents=content)
    workbook = openpyxlWorkbook()
    
    for i in range(xlsBook.nsheets):
        xlsSheet = xlsBook.sheet_by_index(i)
        sheet = workbook.active if i == 0 else workbook.create_sheet()
        sheet.title = xlsSheet.name
    
        for row in range(xlsSheet.nrows):
            for col in range(xlsSheet.ncols):
                # xlrd indexes cells from 0, openpyxl from 1
                sheet.cell(row=row + 1, column=col + 1).value = xlsSheet.cell_value(row, col)
    
    # The new xlsx file is in "workbook", without iterators (iter_rows).
    # For iteration, use "for row in worksheet.rows:".
    # For range iteration, use 'for row in worksheet["{}:{}".format(startCell, endCell)]:'.
    

    You can find the xlrd lib here and the openpyxl here (you must download xlrd in your project for Google App Engine for example).
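
    One detail to watch when copying cells between the two libraries: xlrd addresses cells with zero-based (row, col) indexes, while openpyxl rows and columns start at 1 and it rejects index 0. As an illustration of the offset, here is a small standard-library sketch that turns an xlrd-style position into a spreadsheet reference; a1_reference is a hypothetical helper, not part of either library.

```python
def a1_reference(row, col):
    """Convert xlrd-style zero-based (row, col) to an A1-style reference."""
    if row < 0 or col < 0:
        raise ValueError("row and col must be zero-based and non-negative")
    letters = ""
    n = col + 1  # switch to one-based before building the column letters
    while n:
        n, remainder = divmod(n - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return "{}{}".format(letters, row + 1)
```

    For example, the xlrd position (0, 0) corresponds to the cell A1, which openpyxl addresses as row=1, column=1.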

  • 2020-11-27 04:52

    The answer from Ray clipped the first row and last column of the data. Here is my modified solution (for Python 3):

    import xlrd
    from openpyxl import Workbook
    
    def open_xls_as_xlsx(filename):
        # first open using xlrd
        book = xlrd.open_workbook(filename)
        index = 0
        nrows, ncols = 0, 0
        # use the first sheet that is not empty
        while nrows * ncols == 0:
            sheet = book.sheet_by_index(index)
            nrows = sheet.nrows
            ncols = sheet.ncols
            index += 1
    
        # prepare a xlsx sheet
        book1 = Workbook()
        sheet1 = book1.active
    
        # openpyxl cells are 1-based, xlrd cells are 0-based
        for row in range(1, nrows + 1):
            for col in range(1, ncols + 1):
                sheet1.cell(row=row, column=col).value = sheet.cell_value(row-1, col-1)
    
        return book1
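
    Picking the first non-empty sheet by index needs a defined outcome for workbooks in which every sheet is empty. A hedged sketch of the same search with an explicit error; first_non_empty is a hypothetical helper, and the demo objects merely stand in for xlrd sheets (real code would pass book.sheets()).

```python
from types import SimpleNamespace


def first_non_empty(sheets):
    """Return the first sheet with at least one row and one column."""
    for sheet in sheets:
        if sheet.nrows > 0 and sheet.ncols > 0:
            return sheet
    raise ValueError("the workbook contains no non-empty sheet")


# Stand-in objects for illustration; real code would pass book.sheets().
demo = [SimpleNamespace(name="empty", nrows=0, ncols=0),
        SimpleNamespace(name="data", nrows=3, ncols=2)]
print(first_non_empty(demo).name)
```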
    
  • 2020-11-27 04:53

    I improved the performance of @Jackypengyu's method.

    • XLSX: working per row, not per cell (http://openpyxl.readthedocs.io/en/default/api/openpyxl.worksheet.worksheet.html#openpyxl.worksheet.worksheet.Worksheet.append)
    • XLS: read whole row excluding empty tail, see ragged_rows=True (http://xlrd.readthedocs.io/en/latest/api.html#xlrd.sheet.Sheet.row_slice)

    Merged cells will be converted too.

    Results

    Converting the same 12 files in the same order:

    Original:

    0:00:01.958159
    0:00:02.115891
    0:00:02.018643
    0:00:02.057803
    0:00:01.267079
    0:00:01.308073
    0:00:01.245989
    0:00:01.289295
    0:00:01.273805
    0:00:01.276003
    0:00:01.293834
    0:00:01.261401
    

    Improved:

    0:00:00.774101
    0:00:00.734749
    0:00:00.741434
    0:00:00.744491
    0:00:00.320796
    0:00:00.279045
    0:00:00.315829
    0:00:00.280769
    0:00:00.316380
    0:00:00.289196
    0:00:00.347819
    0:00:00.284242
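
    To put the two lists side by side, the timings can be summed and compared. This sketch only does arithmetic on the figures listed above; to_seconds is a helper written for this example.

```python
def to_seconds(stamp):
    """Parse an 'H:MM:SS.ffffff' duration string into seconds."""
    hours, minutes, seconds = stamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


original = """
0:00:01.958159 0:00:02.115891 0:00:02.018643 0:00:02.057803
0:00:01.267079 0:00:01.308073 0:00:01.245989 0:00:01.289295
0:00:01.273805 0:00:01.276003 0:00:01.293834 0:00:01.261401
""".split()

improved = """
0:00:00.774101 0:00:00.734749 0:00:00.741434 0:00:00.744491
0:00:00.320796 0:00:00.279045 0:00:00.315829 0:00:00.280769
0:00:00.316380 0:00:00.289196 0:00:00.347819 0:00:00.284242
""".split()

total_original = sum(to_seconds(stamp) for stamp in original)
total_improved = sum(to_seconds(stamp) for stamp in improved)
speedup = total_original / total_improved

print("original: {:.2f} s".format(total_original))
print("improved: {:.2f} s".format(total_improved))
print("speedup:  {:.2f}x".format(speedup))
```

    On these figures the totals come to roughly 18.4 s and 5.4 s, a speedup of about 3.4x.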
    

    Solution

    import datetime
    
    import xlrd
    from openpyxl import Workbook
    from openpyxl.utils.cell import get_column_letter
    
    
    def cvt_xls_to_xlsx(*args, **kw):
        """Open and convert XLS file to openpyxl.workbook.Workbook object
    
        @param args: args for xlrd.open_workbook
        @param kw: kwargs for xlrd.open_workbook
        @return: openpyxl.workbook.Workbook
        """
    
        book_xls = xlrd.open_workbook(*args, formatting_info=True, ragged_rows=True, **kw)
        book_xlsx = Workbook()
    
        def _get_xlrd_cell_value(cell):
            value = cell.value
            if cell.ctype == xlrd.XL_CELL_DATE:
                # use the workbook's own date system (1900 or 1904) instead of assuming 1900
                value = datetime.datetime(*xlrd.xldate_as_tuple(value, book_xls.datemode))
    
            return value
    
        sheet_names = book_xls.sheet_names()
        for sheet_index in range(len(sheet_names)):
            sheet_xls = book_xls.sheet_by_name(sheet_names[sheet_index])
    
            if sheet_index == 0:
                sheet_xlsx = book_xlsx.active
                sheet_xlsx.title = sheet_names[sheet_index]
            else:
                sheet_xlsx = book_xlsx.create_sheet(title=sheet_names[sheet_index])
    
            # write one whole row at a time, without the empty tail
            for row in range(sheet_xls.nrows):
                sheet_xlsx.append((
                    _get_xlrd_cell_value(cell)
                    for cell in sheet_xls.row_slice(row, end_colx=sheet_xls.row_len(row))
                ))
    
            # merge after the rows are written, so merging cannot shift where append() writes
            for crange in sheet_xls.merged_cells:
                rlo, rhi, clo, chi = crange
    
                sheet_xlsx.merge_cells(
                    start_row=rlo + 1, end_row=rhi,
                    start_column=clo + 1, end_column=chi,
                )
    
            # carry over hidden rows and columns; entries without stored info are skipped
            for rowx in range(sheet_xls.nrows):
                rowinfo = sheet_xls.rowinfo_map.get(rowx)
                if rowinfo is not None and rowinfo.hidden != 0:
                    print(sheet_names[sheet_index], rowx)
                    sheet_xlsx.row_dimensions[rowx+1].hidden = True
            for coly in range(sheet_xls.ncols):
                colinfo = sheet_xls.colinfo_map.get(coly)
                if colinfo is not None and colinfo.hidden != 0:
                    print(sheet_names[sheet_index], coly)
                    coly_letter = get_column_letter(coly+1)
                    sheet_xlsx.column_dimensions[coly_letter].hidden = True
    
        return book_xlsx
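
    The arguments passed to merge_cells above depend on how the two libraries describe a range: xlrd reports (rlo, rhi, clo, chi) with zero-based indexes and exclusive upper bounds, while openpyxl expects one-based indexes with both ends included. A small sketch of that mapping on its own; merge_bounds is a hypothetical helper name.

```python
def merge_bounds(crange):
    """Map an xlrd merged range to openpyxl merge_cells keyword arguments.

    xlrd reports (rlo, rhi, clo, chi): zero-based, with rhi and chi exclusive.
    openpyxl expects one-based bounds with both ends inclusive.
    """
    rlo, rhi, clo, chi = crange
    return {
        "start_row": rlo + 1, "end_row": rhi,
        "start_column": clo + 1, "end_column": chi,
    }
```

    With it, the merge could be written as sheet_xlsx.merge_cells(**merge_bounds(crange)). Only the lower bounds shift by one, because an exclusive zero-based upper bound equals the inclusive one-based bound.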
    
  • 2020-11-27 04:55

    Try using win32com (from the pywin32 package). Install it on your machine; it needs Windows with Excel installed.

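    import os
    import win32com.client
    
    directory = 'C:\\Users\\folder\\'
    # start Excel once and reuse it for every file
    App = win32com.client.Dispatch("Excel.Application")
    App.Visible = True
    try:
        for file in os.listdir(directory):
            base, ext = os.path.splitext(file)
            if ext.lower() != '.xls':
                continue  # skip anything that is not an .xls file
            # Excel needs full paths for both Open and SaveAs
            InFile = os.path.join(directory, file)
            OutFile = os.path.join(directory, base + ".xlsx")
            workbook = App.Workbooks.Open(InFile)
            workbook.SaveAs(OutFile, 51)   #51 is for xlsx
            workbook.Close(SaveChanges=False)
    finally:
        App.Quit()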
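
    The literal 51 passed to SaveAs is a value of Excel's XlFileFormat enumeration (xlOpenXMLWorkbook). A small sketch that picks the code from the target extension instead of hard-coding it; the table and the function name file_format_for are illustrative and cover only a few common formats.

```python
import os

# Values of Excel's XlFileFormat enumeration for a few common targets.
XL_FILE_FORMATS = {
    ".xlsx": 51,  # xlOpenXMLWorkbook
    ".xlsm": 52,  # xlOpenXMLWorkbookMacroEnabled
    ".xlsb": 50,  # xlExcel12
    ".xls": 56,   # xlExcel8
}


def file_format_for(path):
    """Return the XlFileFormat code matching the extension of `path`."""
    extension = os.path.splitext(path)[1].lower()
    try:
        return XL_FILE_FORMATS[extension]
    except KeyError:
        raise ValueError("no known Excel format for {!r}".format(extension))
```

    The save call could then read workbook.SaveAs(OutFile, file_format_for(OutFile)).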
    

    Thank you.
