Openpyxl compare cells

Submitted by 岁酱吖の on 2020-01-06 14:01:29

Question


I have two workbooks with about 18k rows each and need to check whether each value from source.xlsx exists in target.xlsx. The rows in the source file should be unique. If a cell from the source file exists in a specific column of the target file, the next column in the target file must be filled with the value from the corresponding column of the source file. It is a bit tricky, so here is an example:

target.xlsx

<table><tbody><tr><th>Data</th><th>price</th><th> </th></tr><tr><td>1234grt   </td><td> </td><td> </td></tr><tr><td>7686tyug  </td><td> </td><td> </td></tr><tr><td>9797tyu   </td><td>   </td><td> </td></tr><tr><td>9866yyy   </td><td> </td><td> </td></tr><tr><td>98845r  </td><td> </td><td> </td></tr><tr><td>4567yut  </td><td> </td><td> </td></tr><tr><td>1234grt</td><td> </td><td> </td></tr><tr><td>98845r </td><td> </td><td> </td></tr></tbody></table>

source.xlsx

<table><tbody><tr><th>Data</th><th>price</th><th> </th></tr><tr><td>98845r    </td><td>$50</td><td> </td></tr><tr><td>7686tyug  </td><td>$67</td><td> </td></tr><tr><td>9797tyu   </td><td>$56</td><td> </td></tr><tr><td>4567yut   </td><td>$67</td><td> </td></tr><tr><td>9866yyy   </td><td>$76</td><td> </td></tr><tr><td>98845r    </td><td>$56</td><td> </td></tr><tr><td>1234grt</td><td>$34</td><td> </td></tr></tbody></table>

for i in range(1, source_sheet_max_rows, 1):
    print(i)
    if source_wb[temp_sheet_name].cell(row=i, column=1).value in target_values:
        for j in range(1, target_sheet_max_rows, 1):
            if target_wb[temp_sheet_name].cell(row=j, column=1).value == source_wb[temp_sheet_name].cell(row=i, column=1).value:
                target_wb[temp_sheet_name].cell(row=j, column=2).value = source_wb[temp_sheet_name].cell(row=i, column=2).value
                target_wb.save(str(temp_sheet_name))

target_values contains the values from column 1 of the target sheet.

The code above works, but it is very slow, and I think there must be a better way to do it. The files contain more than 18k rows, so comparing the data takes ages. The tricky part is that I need to know which row of the target file holds the cell from the source file, so that I can fill the adjacent column with the corresponding value. I am using openpyxl, but I could use pandas if that is easier.

Thx


Answer 1:


Question: check if value from source.xlsx exists in a target.xlsx file.

Implement it like the following example:
Documentation: OpenPyXl - accessing-many-cells
Python - Mapping Types — dict, Python - object.__init__

class SourceSheet:
    def __init__(self, ws):
        self.ws = ws

    def __iter__(self):
        """
        Implement iterRows or iterRange
        :return: yield a tuple (value_to_search, value_to_fill)
        """
        # Example iterRange
        for row in range(1, self.ws.max_row + 1):
            yield (self.ws.cell(row=row, column=1).value, self.ws.cell(row=row, column=2).value)

class TargetSheet:
    def __init__(self, ws):
        self.ws = ws

        # Build a dict from all values in column A, so that a cell value
        # can be looked up directly to get its row index:
        #   key   == cell value
        #   value == cell row index
        # Note: if a value occurs more than once in column A, only its
        # last row is kept.
        self._columnA = {c.value: c.row for c in ws['A']}

    def find(self, value):
        """
        Implement either linear Search using iterRows over one Column or
                         search in dict to find 'value'
        :param value: The value to find
        :return: The Cell, to write the 'value_to_fill'
        """
        # Example using dict
        if value in self._columnA:
            return self.ws.cell(row=self._columnA[value], column=2)


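from openpyxl import load_workbook

# Assumption: file names are taken from the question; the first sheet of each workbook is used.
source_wb = load_workbook('source.xlsx')
target_wb = load_workbook('target.xlsx')
ws1 = source_wb.active
ws2 = target_wb.active

sourceSheet = SourceSheet(ws1)
targetSheet = TargetSheet(ws2)

for value_to_search, value_to_fill in sourceSheet:
    print("SourceSheet:{}".format((value_to_search, value_to_fill)))
    targetCell = targetSheet.find(value_to_search)

    if targetCell is not None:
        print("Match: Write value '{}' to TargetSheet:'{}'".format(value_to_fill, targetCell))
        targetCell.value = value_to_fill
    else:
        print("Value '{}' not found in TargetSheet!".format(value_to_search))

# Save once, after all cells have been written.
target_wb.save('target.xlsx')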

Output:

SourceSheet:('cell.A1.value', 'cell.B1.value')
Match: Write value 'cell.B1.value' to TargetSheet:'Cell.B1:'
SourceSheet:('cell.A2.value', 'cell.B2.value')
Match: Write value 'cell.B2.value' to TargetSheet:'Cell.B2:'
SourceSheet:('cell.A3.value', 'cell.B3.value')
Match: Write value 'cell.B3.value' to TargetSheet:'Cell.B3:'

Tested with Python: 3.5
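
The dict in TargetSheet keeps only one row per value, while the target sheet in the question repeats some values (1234grt and 98845r each appear twice). Below is a minimal sketch of one way to handle that, assuming the keys are in column A and the prices in column B as in the question. It records every row for each value instead of a single row. The names index_rows and fill_target are hypothetical helpers, not part of openpyxl, and stripping whitespace from the keys is an assumption about the data.

```python
from collections import defaultdict


def index_rows(values, first_row=1):
    """Map each non-empty value to every row number where it occurs."""
    index = defaultdict(list)
    for row, value in enumerate(values, start=first_row):
        if value is None:
            continue
        # Assumption: surrounding whitespace in a key is not significant.
        index[str(value).strip()].append(row)
    return dict(index)


def fill_target(source_ws, target_ws):
    """Copy column B of source_ws into every matching row of target_ws.

    Both arguments are expected to be openpyxl worksheets.
    """
    index = index_rows([cell.value for cell in target_ws["A"]])
    for key, price in source_ws.iter_rows(min_col=1, max_col=2, values_only=True):
        if key is None:
            continue
        # Write the price into column B of every row that holds this key.
        for row in index.get(str(key).strip(), []):
            target_ws.cell(row=row, column=2).value = price
```

With the worksheets from the example above, this would be called as fill_target(ws1, ws2), followed by a single save of the target workbook. Header rows are treated like any other row in this sketch.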




Answer 2:


As I understand your question, the rows in the target file are not arranged in the same order as the rows in the source file.

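for i in range(1, source_sheet_max_rows):
    for j in range(1, target_sheet_max_rows):
        if target_wb[temp_sheet_name].cell(row=j, column=1).value == source_wb[temp_sheet_name].cell(row=i, column=1).value:
            # Assign with '='; a '==' here would only compare the two values and discard the result.
            target_wb[temp_sheet_name].cell(row=j, column=2).value = source_wb[temp_sheet_name].cell(row=i, column=2).value
            # No break here: the target sheet in the question contains some values more than once.
# Save once, after both loops have finished.
target_wb.save(temp_sheet_name)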
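
The nested loops above compare every source row with every target row, which is roughly 18k × 18k comparisons for the files in the question. Below is a sketch of a single-pass alternative, assuming keys in column A and prices in column B of the first sheet of each workbook. It builds a dict from the source once, walks the target once, and saves once. build_lookup and copy_prices are hypothetical names, and the whitespace stripping is an assumption about the data. load_workbook, iter_rows and save are the openpyxl calls used.

```python
def build_lookup(pairs):
    """Build {key: value} from (key, value) pairs, skipping empty keys."""
    lookup = {}
    for key, value in pairs:
        if key is not None:
            # If the source repeats a key, the later row wins in this sketch.
            lookup[str(key).strip()] = value
    return lookup


def copy_prices(source_path, target_path, output_path):
    """Fill column B of the target workbook from the source workbook."""
    # Third-party import kept local so build_lookup works without openpyxl.
    from openpyxl import load_workbook

    source_ws = load_workbook(source_path).active
    target_wb = load_workbook(target_path)
    target_ws = target_wb.active

    # One pass over the source: key in column A, price in column B.
    lookup = build_lookup(
        source_ws.iter_rows(min_col=1, max_col=2, values_only=True)
    )

    # One pass over the target: each dict lookup takes constant time on average.
    for (cell,) in target_ws.iter_rows(min_col=1, max_col=1):
        key = None if cell.value is None else str(cell.value).strip()
        if key in lookup:
            target_ws.cell(row=cell.row, column=2).value = lookup[key]

    # Save once, after the loop, instead of once per matching cell.
    target_wb.save(output_path)
```

A call such as copy_prices("source.xlsx", "target.xlsx", "target_filled.xlsx") would write the result to a new file. The output file name here is only an example.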


Source: https://stackoverflow.com/questions/53725514/openpyxl-compare-cells
