Openpyxl corrupts xlsx on save, even when no changes were made

Submitted by 牧云@^-^@ on 2021-02-11 14:28:49

Question


TL;DR

  • Saving a large Excel file with Openpyxl produces a corrupted xlsx file.
  • The Excel file has several tabs containing graphs, formulae, images, and tables.
  • A PowerShell script can save edits to the same xlsx file with no issues.
  • I can read cell values from the file with Openpyxl, and I can edit and save the xlsx file manually.
  • The Excel file is unprotected.
  • All errors and code snippets are provided below.

I'm unable to add data to an Excel file that one of our teams is using. The file is fairly big (over 3 MB), has several sheets, and contains formulas, graphs, and images.

Thankfully, the sheet I need to enter data into has none of that. However, when I try to save the workbook, I end up with these errors:

Traceback (most recent call last):
  File "test.py", line 5, in <module>
    wb.save("new.xlsx")
  File "C:\Python3\lib\site-packages\openpyxl\workbook\workbook.py", line 392, in save
    save_workbook(self, filename)
  File "C:\Python3\lib\site-packages\openpyxl\writer\excel.py", line 293, in save_workbook
    writer.save()
  File "C:\Python3\lib\site-packages\openpyxl\writer\excel.py", line 275, in save
    self.write_data()
  File "C:\Python3\lib\site-packages\openpyxl\writer\excel.py", line 78, in write_data
    self._write_charts()
  File "C:\Python3\lib\site-packages\openpyxl\writer\excel.py", line 124, in _write_charts
    self._archive.writestr(chart.path[1:], tostring(chart._write()))
  File "C:\Python3\lib\site-packages\openpyxl\chart\_chart.py", line 134, in _write
    return cs.to_tree()
  File "C:\Python3\lib\site-packages\openpyxl\chart\chartspace.py", line 193, in to_tree
    tree = super(ChartSpace, self).to_tree()
  File "C:\Python3\lib\site-packages\openpyxl\descriptors\serialisable.py", line 154, in to_tree
    node = obj.to_tree(child_tag)
  File "C:\Python3\lib\site-packages\openpyxl\descriptors\serialisable.py", line 154, in to_tree
    node = obj.to_tree(child_tag)
  File "C:\Python3\lib\site-packages\openpyxl\chart\plotarea.py", line 135, in to_tree
    return super(PlotArea, self).to_tree(tagname)
  File "C:\Python3\lib\site-packages\openpyxl\descriptors\serialisable.py", line 146, in to_tree
    for node in nodes:
  File "C:\Python3\lib\site-packages\openpyxl\descriptors\sequence.py", line 105, in to_tree
    el = v.to_tree(namespace=namespace)
  File "C:\Python3\lib\site-packages\openpyxl\chart\_chart.py", line 107, in to_tree
    return super(ChartBase, self).to_tree(tagname, idx)
  File "C:\Python3\lib\site-packages\openpyxl\descriptors\serialisable.py", line 146, in to_tree
    for node in nodes:
  File "C:\Python3\lib\site-packages\openpyxl\descriptors\sequence.py", line 39, in to_tree
    el = v.to_tree(tagname, idx)
  File "C:\Python3\lib\site-packages\openpyxl\chart\series.py", line 170, in to_tree
    return super(Series, self).to_tree(tagname)
  File "C:\Python3\lib\site-packages\openpyxl\descriptors\serialisable.py", line 154, in to_tree
    node = obj.to_tree(child_tag)
  File "C:\Python3\lib\site-packages\openpyxl\descriptors\serialisable.py", line 154, in to_tree
    node = obj.to_tree(child_tag)
AttributeError: 'str' object has no attribute 'to_tree'

This is the code I used to perform the "save as" procedure. It adds no data at all; the save action alone corrupts the file:

from openpyxl import load_workbook

wb=load_workbook("Production Monitoring Script.xlsx")
ws=wb['Prod Perf Script Data']
wb.save("new.xlsx")
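
The traceback above is raised while the archive is being written, so a partially written new.xlsx may be left on disk. A minimal sketch of one way to avoid that, assuming the save routine is any callable that takes a file path (such as `wb.save`): write to a temporary path first and move it into place only on success. `save_via_temp` is a hypothetical helper name, not part of openpyxl, and it does not fix the underlying error; it only keeps a failed save from overwriting the target.

```python
import os
import tempfile


def save_via_temp(save, target):
    """Call save(path) on a temporary path, then move the result onto target.

    `save` is any callable taking a file path, e.g. `wb.save`.
    If it raises, the temporary file is removed and `target` is left untouched.
    """
    folder = os.path.dirname(os.path.abspath(target))
    fd, tmp_path = tempfile.mkstemp(suffix=".xlsx", dir=folder)
    os.close(fd)  # the callable reopens the path itself
    try:
        save(tmp_path)
        os.replace(tmp_path, target)  # overwrites target in one step
    except BaseException:
        # Clean up the half-written temporary file and re-raise the error.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

With the code above, the call would look like `save_via_temp(wb.save, "new.xlsx")`.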

I tried an alternative solution with PowerShell and it worked.

$xl=New-Object -ComObject Excel.Application
$wb=$xl.WorkBooks.Open('<path here>\Production Monitoring Script.xlsx')
$ws=$wb.WorkSheets.item(1)
$xl.Visible=$true

$ws.Cells.Item(7, 618)=50

$wb.SaveAs('<path here>\New.xlsx')
$xl.Quit()

It was able to save the value "50" in that cell.


Answer 1:


As discussed in the comments on the original post, openpyxl does not support some graphics and other items, and they break the save even when they sit in worksheets your code never modifies. This is not a full workaround, but it works when the unsupported objects are only in other worksheets.
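
To see what kind of non-cell content a workbook carries before handing it to openpyxl, it can help to look inside the file itself, since an xlsx file is a ZIP package. The sketch below uses only the standard library. The folder prefixes are the ones Excel conventionally uses and are an assumption here, and `summarize_package` is a hypothetical helper name. It counts files per folder (including relationship and style files), not individual charts or images.

```python
import zipfile
from collections import Counter

# Folder prefixes inside the xlsx package that hold non-cell content.
# These follow Excel's usual layout; other writers may differ.
PART_PREFIXES = {
    "xl/charts/": "charts",
    "xl/drawings/": "drawings",
    "xl/media/": "media",
    "xl/embeddings/": "embeddings",
}


def summarize_package(path_or_file):
    """Count the files stored under each prefix in PART_PREFIXES."""
    counts = Counter()
    with zipfile.ZipFile(path_or_file) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            for prefix, label in PART_PREFIXES.items():
                if name.startswith(prefix):
                    counts[label] += 1
    return dict(counts)
```

A call such as `summarize_package("spreadsheet.xlsx")` returns a dictionary of counts; an empty dictionary suggests the workbook has none of these folders.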

I made an example .xlsx workbook with two worksheets, 'TWC' and 'UV240 Results'. This code assumes that any worksheet whose title ends in 'Results' contains the unsupported images, and creates two temporary files - imageoutput contains the unsupported images, and outputtemp contains the worksheets that may be modified without corruption by openpyxl. Then they're stitched together at the end.

It may be inefficient in parts; please edit or comment with improvements!

import os
import shutil
import win32com.client

from openpyxl import load_workbook

name = 'spreadsheet.xlsx'
outputfile = 'output.xlsx'
outputtemp = 'outputtemp.xlsx'

# Work on a copy so the original workbook is never touched
shutil.copyfile(name, outputfile)
wb = load_workbook(outputfile)
ws = wb['TWC']

# TWC doesn't have images. Anything ending with 'Results' has unsupported images etc

# Create new file with only openpyxl-unsupported worksheets
imageworksheets = [ws if ws.title.endswith('Results') else '' for ws in wb.worksheets]
if [ws for ws in wb if ws.title != 'TWC']:
    imageoutput = 'output2.xlsx'
    imagefilewritten = False
    while not imagefilewritten:
        try:
            shutil.copy(name, imageoutput)
        except PermissionError as error:
            # Catch an exception here - I usually have a GUI function
            pass
        else:
            imagefilewritten = True

    excel = win32com.client.Dispatch('Excel.Application')
    excel.Visible = False
    imagewb = excel.Workbooks.Open(os.path.join(os.getcwd(), imageoutput))
    excel.DisplayAlerts = False

    for i, ws in enumerate(imageworksheets[::-1]): # Go backwards to avoid reindexing
        if not ws:
            wsindex = len(imageworksheets) - i
            imagewb.Worksheets(wsindex).Delete()

    imagefileupdated = False
    while not imagefileupdated:
        try:
            imagewb.Save()
            imagewb.Close(SaveChanges = True)
            print('Temp image workbook saved.')
        except PermissionError as error:
            # Catch exception
            pass
        else:
            imagefileupdated = True

# Remove the unsupported worksheets in openpyxl
for ws in wb.worksheets:
    if ws in imageworksheets:
        wb.remove(ws)

# Do your desired openpyxl manipulations on the remaining supported worksheets here,
# before the temporary workbook is saved, so that the changes are written out.

wb.save(outputtemp)
print('Temp output workbook saved.')

# Merge the outputtemp and imageoutput into outputfile
wb1 = excel.Workbooks.Open(os.path.join(os.getcwd(), outputtemp))
wb2 = excel.Workbooks.Open(os.path.join(os.getcwd(), imageoutput))

for ws in wb1.Sheets:
    ws.Copy(wb2.Sheets(1))

wb2.SaveAs(os.path.join(os.getcwd(), outputfile))
wb1.Close(SaveChanges = True)
wb2.Close(SaveChanges = True)
print(f'Output workbook saved as {outputfile}.')

excel.Visible = True
excel.DisplayAlerts = True
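
The sheet selection and the backwards deletion in the script above can be sketched on plain lists of titles, without Excel or openpyxl. This is only an illustration under the answer's assumption that a title ending in 'Results' marks an unsupported sheet. `split_titles` and `deletion_order` are hypothetical helper names, and positions are 1-based because that is how the Excel COM `Worksheets` collection is indexed.

```python
def split_titles(titles, suffix="Results"):
    """Return (supported, unsupported) sheet titles, preserving order."""
    unsupported = [t for t in titles if t.endswith(suffix)]
    supported = [t for t in titles if not t.endswith(suffix)]
    return supported, unsupported


def deletion_order(titles, keep):
    """1-based positions of sheets not in `keep`, highest position first."""
    positions = [i for i, t in enumerate(titles, start=1) if t not in keep]
    return sorted(positions, reverse=True)


titles = ["TWC", "UV240 Results", "Notes", "UV300 Results"]
supported, unsupported = split_titles(titles)
# Sheets to delete from the image-only workbook, last position first.
to_delete = deletion_order(titles, keep=unsupported)
```

Deleting from the highest position down means each deletion leaves the positions of the sheets still waiting to be deleted unchanged.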


Source: https://stackoverflow.com/questions/65563297/openpyxl-corrupts-xlsx-on-save-even-when-no-changes-were-made
