Insert row into Excel spreadsheet using openpyxl in Python

匿名 (未验证) 提交于 2019-12-03 02:11:02

问题:

I'm looking for the best approach for inserting a row into a spreadsheet using openpyxl.

Effectively, I have a spreadsheet (Excel 2007) which has a header row, followed by (at most) a few thousand rows of data. I'm looking to insert the row as the first row of actual data, so after the header. My understanding is that the append function is suitable for adding content to the end of the file.

Reading the documentation for both openpyxl and xlrd (and xlwt), I can't find any clear cut ways of doing this, beyond looping through the content manually and inserting into a new sheet (after inserting the required row).

Given my so far limited experience with Python, I'm trying to understand if this is indeed the best option to take (the most pythonic!), and if so could someone provide an explicit example. Specifically can I read and write rows with openpyxl or do I have to access cells? Additionally can I (over)write the same file(name)?

回答1:

== Updated to a fully functional version, based on feedback here: groups.google.com/forum/#!topic/openpyxl-users/wHGecdQg3Iw. ==

As the others have pointed out, openpyxl does not provide this functionality, but I have extended the Worksheet class as follows to implement inserting rows. Hope this proves useful to others.

def insert_rows(self, row_idx, cnt, above=False, copy_style=True, fill_formulae=True):     """Inserts new (empty) rows into worksheet at specified row index.      :param row_idx: Row index specifying where to insert new rows.     :param cnt: Number of rows to insert.     :param above: Set True to insert rows above specified row index.     :param copy_style: Set True if new rows should copy style of immediately above row.     :param fill_formulae: Set True if new rows should take on formula from immediately above row, filled with references new to rows.      Usage:      * insert_rows(2, 10, above=True, copy_style=False)      """     CELL_RE  = re.compile("(?P<col>\$?[A-Z]+)(?P<row>\$?\d+)")      row_idx = row_idx - 1 if above else row_idx      def replace(m):         row = m.group('row')         prefix = "$" if row.find("$") != -1 else ""         row = int(row.replace("$",""))         row += cnt if row > row_idx else 0         return m.group('col') + prefix + str(row)      # First, we shift all cells down cnt rows...     old_cells = set()     old_fas   = set()     new_cells = dict()     new_fas   = dict()     for c in self._cells.values():          old_coor = c.coordinate          # Shift all references to anything below row_idx         if c.data_type == Cell.TYPE_FORMULA:             c.value = CELL_RE.sub(                 replace,                 c.value             )             # Here, we need to properly update the formula references to reflect new row indices             if old_coor in self.formula_attributes and 'ref' in self.formula_attributes[old_coor]:                 self.formula_attributes[old_coor]['ref'] = CELL_RE.sub(                     replace,                     self.formula_attributes[old_coor]['ref']                 )          # Do the magic to set up our actual shift             if c.row > row_idx:             old_coor = c.coordinate             old_cells.add((c.row,c.col_idx))             c.row += cnt             new_cells[(c.row,c.col_idx)] = c             if old_coor in self.formula_attributes:                 old_fas.add(old_coor)                 fa = self.formula_attributes[old_coor].copy()                 new_fas[c.coordinate] = fa      for coor in old_cells:         del self._cells[coor]     self._cells.update(new_cells)      for fa in old_fas:         del self.formula_attributes[fa]     self.formula_attributes.update(new_fas)      # Next, we need to shift all the Row Dimensions below our new rows down by cnt...     for row in range(len(self.row_dimensions)-1+cnt,row_idx+cnt,-1):         new_rd = copy.copy(self.row_dimensions[row-cnt])         new_rd.index = row         self.row_dimensions[row] = new_rd         del self.row_dimensions[row-cnt]      # Now, create our new rows, with all the pretty cells     row_idx += 1     for row in range(row_idx,row_idx+cnt):         # Create a Row Dimension for our new row         new_rd = copy.copy(self.row_dimensions[row-1])         new_rd.index = row         self.row_dimensions[row] = new_rd         for col in range(1,self.max_column):             col = get_column_letter(col)             cell = self.cell('%s%d'%(col,row))             cell.value = None             source = self.cell('%s%d'%(col,row-1))             if copy_style:                 cell.number_format = source.number_format                 cell.font      = source.font.copy()                 cell.alignment = source.alignment.copy()                 cell.border    = source.border.copy()                 cell.fill      = source.fill.copy()             if fill_formulae and source.data_type == Cell.TYPE_FORMULA:                 s_coor = source.coordinate                 if s_coor in self.formula_attributes and 'ref' not in self.formula_attributes[s_coor]:                     fa = self.formula_attributes[s_coor].copy()                     self.formula_attributes[cell.coordinate] = fa                 # print("Copying formula from cell %s%d to %s%d"%(col,row-1,col,row))                 cell.value = re.sub(                     "(\$?[A-Z]{1,3}\$?)%d"%(row - 1),                     lambda m: m.group(1) + str(row),                     source.value                 )                    cell.data_type = Cell.TYPE_FORMULA      # Check for Merged Cell Ranges that need to be expanded to contain new cells     for cr_idx, cr in enumerate(self.merged_cell_ranges):         self.merged_cell_ranges[cr_idx] = CELL_RE.sub(             replace,             cr         )  Worksheet.insert_rows = insert_rows 


回答2:

Answering this with the code that I'm now using to achieve the desired result. Note that I am manually inserting the row at position 1, but that should be easy enough to adjust for specific needs. You could also easily tweak this to insert more than one row, and simply populate the rest of the data starting at the relevant position.

Also, note that due to downstream dependencies, we are manually specifying data from 'Sheet1', and the data is getting copied to a new sheet which is inserted at the beginning of the workbook, whilst renaming the original worksheet to 'Sheet1.5'.

EDIT: I've also added (later on) a change to the format_code to fix issues where the default copy operation here removes all formatting: new_cell.style.number_format.format_code = 'mm/dd/yyyy'. I couldn't find any documentation that this was settable, it was more of a case of trial and error!

Lastly, don't forget this example is saving over the original. You can change the save path where applicable to avoid this.

    import openpyxl      wb = openpyxl.load_workbook(file)     old_sheet = wb.get_sheet_by_name('Sheet1')     old_sheet.title = 'Sheet1.5'     max_row = old_sheet.get_highest_row()     max_col = old_sheet.get_highest_column()     wb.create_sheet(0, 'Sheet1')      new_sheet = wb.get_sheet_by_name('Sheet1')      # Do the header.     for col_num in range(0, max_col):         new_sheet.cell(row=0, column=col_num).value = old_sheet.cell(row=0, column=col_num).value      # The row to be inserted. We're manually populating each cell.     new_sheet.cell(row=1, column=0).value = 'DUMMY'     new_sheet.cell(row=1, column=1).value = 'DUMMY'      # Now do the rest of it. Note the row offset.     for row_num in range(1, max_row):         for col_num in range (0, max_col):             new_sheet.cell(row = (row_num + 1), column = col_num).value = old_sheet.cell(row = row_num, column = col_num).value      wb.save(file) 


回答3:

Openpyxl Worksheets have limited functionality when it comes to doing row or column level operations. The only properties a Worksheet has that relates to rows/columns are the properties row_dimensions and column_dimensions, which store "RowDimensions" and "ColumnDimensions" objects for each row and column, respectively. These dictionaries are also used in function like get_highest_row() and get_highest_column().

Everything else operates on a cell level, with Cell objects being tracked in the dictionary, _cells (and their style tracked in the dictionary _styles). Most functions that look like they're doing anything on a row or column level are actually operating on a range of cells (such as the aforementioned append()).

The simplest thing to do would be what you suggested: create a new sheet, append your header row, append your new data rows, append your old data rows, delete the old sheet, then rename your new sheet to the old one. Problems that may be presented with this method is the loss of row/column dimensions attributes and cell styles, unless you specifically copy them, too.

Alternatively, you could create your own functions that insert rows or columns.

I had a large number of very simple worksheets that I needed to delete columns from. Since you asked for explicit examples, I'll provide the function I quickly threw together to do this:

from openpyxl.cell import get_column_letter  def ws_delete_column(sheet, del_column):      for row_num in range(1, sheet.get_highest_row()+1):         for col_num in range(del_column, sheet.get_highest_column()+1):              coordinate = '%s%s' % (get_column_letter(col_num),                                    row_num)             adj_coordinate = '%s%s' % (get_column_letter(col_num + 1),                                        row_num)              # Handle Styles.             # This is important to do if you have any differing             # 'types' of data being stored, as you may otherwise get             # an output Worksheet that's got improperly formatted cells.             # Or worse, an error gets thrown because you tried to copy             # a string value into a cell that's styled as a date.              if adj_coordinate in sheet._styles:                 sheet._styles[coordinate] = sheet._styles[adj_coordinate]                 sheet._styles.pop(adj_coordinate, None)             else:                 sheet._styles.pop(coordinate, None)              if adj_coordinate in sheet._cells:                 sheet._cells[coordinate] = sheet._cells[adj_coordinate]                 sheet._cells[coordinate].column = get_column_letter(col_num)                 sheet._cells[coordinate].row = row_num                 sheet._cells[coordinate].coordinate = coordinate                  sheet._cells.pop(adj_coordinate, None)             else:                 sheet._cells.pop(coordinate, None)          # sheet.garbage_collect() 

I pass it the worksheet that I'm working with, and the column number I want deleted, and away it goes. I know it isn't exactly what you wanted, but I hope this information helped!

EDIT: Noticed someone gave this another vote, and figured I should update it. The co-ordinate system in Openpyxl experienced some changes sometime in the passed couple years, introducing a coordinate attribute for items in _cell. This needs to be edited, too, or the rows will be left blank (instead of deleted), and Excel will throw an error about problems with the file. This works for Openpyxl 2.2.3 (untested with later versions)



回答4:

I took Dallas solution and added support for merged cells:

    def insert_rows(self, row_idx, cnt, above=False, copy_style=True, fill_formulae=True):         skip_list = []         try:             idx = row_idx - 1 if above else row_idx             for (new, old) in zip(range(self.max_row+cnt,idx+cnt,-1),range(self.max_row,idx,-1)):                 for c_idx in range(1,self.max_column):                   col = self.cell(row=1, column=c_idx).column #get_column_letter(c_idx)                   print("Copying %s%d to %s%d."%(col,old,col,new))                   source = self["%s%d"%(col,old)]                   target = self["%s%d"%(col,new)]                   if source.coordinate in skip_list:                       continue                    if source.coordinate in self.merged_cells:                       # This is a merged cell                       for _range in self.merged_cell_ranges:                           merged_cells_list = [x for x in cells_from_range(_range)][0]                           if source.coordinate in merged_cells_list:                               skip_list = merged_cells_list                               self.unmerge_cells(_range)                               new_range = re.sub(str(old),str(new),_range)                               self.merge_cells(new_range)                               break                    if source.data_type == Cell.TYPE_FORMULA:                     target.value = re.sub(                       "(\$?[A-Z]{1,3})%d"%(old),                       lambda m: m.group(1) + str(new),                       source.value                     )                   else:                     target.value = source.value                   target.number_format = source.number_format                   target.font   = source.font.copy()                   target.alignment = source.alignment.copy()                   target.border = source.border.copy()                   target.fill   = source.fill.copy()             idx = idx + 1             for row in range(idx,idx+cnt):                 for c_idx in range(1,self.max_column):                   col = self.cell(row=1, column=c_idx).column #get_column_letter(c_idx)                   #print("Clearing value in cell %s%d"%(col,row))                   cell = self["%s%d"%(col,row)]                   cell.value = None                   source = self["%s%d"%(col,row-1)]                   if copy_style:                     cell.number_format = source.number_format                     cell.font      = source.font.copy()                     cell.alignment = source.alignment.copy()                     cell.border    = source.border.copy()                     cell.fill      = source.fill.copy()                   if fill_formulae and source.data_type == Cell.TYPE_FORMULA:                     #print("Copying formula from cell %s%d to %s%d"%(col,row-1,col,row))                     cell.value = re.sub(                       "(\$?[A-Z]{1,3})%d"%(row - 1),                       lambda m: m.group(1) + str(row),                       source.value                     ) 


回答5:

Unfortunately there isn't really a better way to do in that read in the file, and use a library like xlwt to write out a new excel file (with your new row inserted at the top). Excel doesn't work like a database that you can read and and append to. You unfortunately just have to read in the information and manipulate in memory and write out to what is essentially a new file.



回答6:

Edited Nick's solution, this version takes a starting row, the number of rows to insert, and a filename, and inserts the necessary number of blank rows.

#! python 3  import openpyxl, sys  my_start = int(sys.argv[1]) my_rows = int(sys.argv[2]) str_wb = str(sys.argv[3])  wb = openpyxl.load_workbook(str_wb) old_sheet = wb.get_sheet_by_name('Sheet') mcol = old_sheet.max_column mrow = old_sheet.max_row old_sheet.title = 'Sheet1.5' wb.create_sheet(index=0, title='Sheet')  new_sheet = wb.get_sheet_by_name('Sheet')  for row_num in range(1, my_start):     for col_num in range(1, mcol + 1):         new_sheet.cell(row = row_num, column = col_num).value = old_sheet.cell(row = row_num, column = col_num).value  for row_num in range(my_start + my_rows, mrow + my_rows):     for col_num in range(1, mcol + 1):         new_sheet.cell(row = (row_num + my_rows), column = col_num).value = old_sheet.cell(row = row_num, column = col_num).value  wb.save(str_wb) 


回答7:

Adding an answer applicable to more recent releases, v2.5+, of openpyxl:

There's now an insert_rows() and insert_cols().

insert_rows(idx, amount=1)

Insert row or rows before row==idx



标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!