Can Pandas read and modify a single Excel file worksheet (tab) without modifying the rest of the file?

前端 未结 6 924
清歌不尽
清歌不尽 2020-12-08 21:19

Many spreadsheets have formulas and formatting that Python tools for reading and writing Excel files cannot faithfully reproduce. That means that any file I want to create p

相关标签:
6条回答
  • 2020-12-08 21:53

    I'm adding an answer that uses openpyxl. As of version 2.5, you can preserve charts in existing files (further details on the issue are available here).

    For demonstration purposes, I create an xlsx file using pandas following the OPs guidelines. The tab named 'Sheet2' has formulas that reference 'Sheet3' and contains a chart.

    import pandas as pd
    
    df = pd.DataFrame({'col_a': [1,2,3],
                      'col_b': [4,5,6]})
    
    writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
    df.to_excel(writer, sheet_name='Sheet1', index=False)
    workbook=writer.book
    worksheet = writer.sheets['Sheet1']
    df.head(0).to_excel(writer, sheet_name='Sheet2', index=False)
    workbook=writer.book
    worksheet = writer.sheets['Sheet2']
    for i in range(2, len(df) + 2):
        worksheet.write_formula('A%d' % (i), "=Sheet3!A%d" % (i))
        worksheet.write_formula('B%d' % (i), "=Sheet3!B%d" % (i))
    chart = workbook.add_chart({'type': 'column'})
    
    chart.add_series({'values': '=Sheet2!$A$2:$A$4'})
    chart.add_series({'values': '=Sheet2!$B$2:$B$4'})
    
    worksheet.insert_chart('A7', chart)
    
    df.to_excel(writer, sheet_name='Sheet3', index=False)
    df.to_excel(writer, sheet_name='Sheet4', index=False)
    
    writer.save()
    

    Expected test.xlsx after running the code above:

    Then if we run the code below, using openpyxl, we can modify the data in 'Sheet3' while preserving formulas and chart in 'Sheet2' and the updated data is now in this file.

    from openpyxl import load_workbook
    
    wb = load_workbook('test.xlsx')
    ws = wb['Sheet3']
    ws['B2'] = 7
    ws['B3'] = 8
    ws['B4'] = 9
    wb.save('test.xlsx')
    

    Expected test.xlsx after running the second block of code:

    0 讨论(0)
  • 2020-12-08 21:56

    I'm 90% confident the answer to "can pandas do this" is no. Posting a negative is tough, because there always might be something clever that I've missed, but here's a case:

    Possible interface engines are xlrd/xlwt/xlutils, openpyxl, and xlsxwriter. None will work for your purposes, as xlrd/wt don't support all formulae, xlsxwriter can't modify existing xlsx files, and openpyxl loses images and charts.

    Since I often need to do this, I've taken to only writing simple output to a separate file and then calling the win32api directly to copy the data between the workbooks while preserving all of my colleague's shiny figures. It's annoying, because it means I have to do it under Windows instead of *nix, but it works.

    If you're working under Windows, you could do something similar. (I wonder if it makes sense to add a native insert option using this approach to help people in this situation, or if we should simply post a recipe.)


    P.S.: This very problem has annoyed me enough from time to time that I've thought of learning enough of the modern Excel format to add support for this to one of the libraries.

    P.P.S.: But since ignoring things you're not handling and returning them unmodified seems easy enough, the fact that no one seems to support it makes me think there are some headaches, and where Redmond's involved I'm willing to believe it. @john-machin would know the details, if he's about..

    0 讨论(0)
  • 2020-12-08 22:02

    As far as I know Pandas does not do that by itself.

    I wrote some small utility library pandasxltable (based on openpyxl) in order to facilitate the interaction between a excel template and pandas data-frames. The library allows you to fetch as data-frame and update Excel Data Tables (not really a tab but part of it)from dataframe.

    0 讨论(0)
  • 2020-12-08 22:08

    Required: call path to exist excels file.

    Input: List string.

    Output: append row.

    from datetime import datetime,timedelta
    from openpyxl import load_workbook,Workbook
           
       def write_log_excels(status):
                   try:
                       # Point
                       log_list = ["1","2","3","4","5,"6","7","8",  "9"]
                       date_n = datetime.now()
                       date_n = date_n.strftime("%Y-%m-%d %H:%M:%S")
                       sdate = date_n
       
                       wk = load_workbook('filename.xlsx')
                       wh = wk.active
                       lenth = wh.max_row
                       # wk.close()
                       pl = log_list
                       if lenth == 0:
                           # ws = Workbook()
                           # wb = ws.active
                           wh['A1'] = 'TITLE1'
                           wh['B1'] = 'TITLE2'
                           wh['C1'] = 'TITLE3'
                           wh['D1'] = 'TITLE4'
                           wh['E1'] = 'TITLE5'
                           wh['F1'] = 'TITLE6'
                           wh['G1'] = 'TITLE7'
                           wh['H1'] = 'TITLE8'
                           wh['I1'] = 'TITLE9'
                           lenth = 1
                       if pl is not None:
                           w = lenth + 1
                           wh['A{}'.format(w)] =  pl[0]
                           wh['B{}'.format(w)] =  pl[1]
                           wh['C{}'.format(w)] =  pl[2]
                           wh['D{}'.format(w)] =  pl[3]
                           wh['E{}'.format(w)] =  pl[4]
                           wh['F{}'.format(w)] =  pl[5]
                           wh['G{}'.format(w)] =  pl[3]
                           wh['H{}'.format(w)] =  pl[4]
                           wh['I{}'.format(w)] =  pl[5]
                       wk.save('filename.xlsx')
           
                       log_list.clear()
                   except Exception as e:
                       print('write_log_excels :' + str(e))
           write_log_excels('')
    

    Or using this for auto create col,row.

    def work_sheet(wsheet):
        data_sheet = []
        col = [] #column in sheet
        for c in range(wsheet.max_column):
            #got alphabels with max_(len)_column found in worksheet
            col.append(string.ascii_uppercase[c])
    
        for r in range(2,wsheet.max_row + 1):
            data_row = []
            for c in range(len(col)):
                #got values exactly with "sheet[colum-row]"
                data = wsheet['{}{}'.format(col[c],r)].value
                data_row.append(data)
            data_sheet.append(data_row)
        return data_sheet
    
    0 讨论(0)
  • 2020-12-08 22:09

    if you're talking about 'sheets' as 'tabs', then it is possible to modify just one of the tabs by accessing the particular one using the parse(sheet_name) function.

    an example is here: Reading an Excel file in python using pandas

    to write back to excel, (while controlling the sheets) use the to_excel function, here: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_excel.html

    0 讨论(0)
  • 2020-12-08 22:10

    I had a similar question regarding the interaction between excel and python (in particular, pandas), and I was referred to this question.

    Thanks to some pointers by stackoverflow community, I found a package called xlwings that seems to cover a lot of the functionalities HaPsantran required.

    To use the OP's example:

    Working with an existing excel file, you can drop an anchor in the data block (Sheet3) you want to import to pandas by naming it in excel and do:

    # opened an existing excel file
    

    wb = Workbook(Existing_file)

    # Find in the excel file a named cell and reach the boundary of the cell block (boundary defined by empty column / row) and read the cell 
    

    df = Range(Anchor).table.value

    # import pandas and manipulate the data block
    df = pd.DataFrame(df) # into Pandas DataFrame
    df['sum'] = df.sum(axis= 1)
    
    # write back to Sheet3
    Range(Anchor).value = df.values
    

    tested that this implementation didn't temper existing formula in the excel file

    Let me know if this solves your problem and if there's anything I can help.

    Big kudos to the developer of xlwings, they made this possible.


    Below is an update to my earlier answer after further question from @jamzsabb, and to reflect a changed API after xlwings updated to >= 0.9.0.

    import xlwings as xw
    import pandas as pd
    target_df = xw.Range('A7').options(pd.DataFrame, expand='table').value # only do this if the 'A7' cell (the cell within area of interest) is in active worksheet
    #otherwise do:
    #sht = xw.Book(r'path to your xlxs file\name_of_file.xlsx`).sheets['name of sheet']
    #target_df = sht.Range('A7').options(pd.DataFrame, expand='table').value # you can also change 'A7' to any name that you've given to a cell like 'interest_table`
    
    0 讨论(0)
提交回复
热议问题