How can I merge CSV rows that have the same value in the first cell?

别说谁变了你拦得住时间么 提交于 2019-12-13 09:46:27

问题


This is the file: https://drive.google.com/file/d/0B5v-nJeoVouHc25wTGdqaDV1WW8/view?usp=sharing

As you can see, there are duplicates in the first column, but if I were to combine the duplicate rows, no data would get overridden in the other columns. Is there any way I can combine the rows with duplicate values in the first column?

For example, turn "1,A,A,," and "1,,,T,T" into "1,A,A,T,T".


回答1:


Plain Python:

import csv

reader = csv.Reader(open('combined.csv'))
result = {}

for row in reader:
    idx = row[0]
    values = row[1:]
    if idx in result:
        result[idx] = [result[idx][i] or v for i, v in enumerate(values)]
    else:
        result[idx] = values

How this magic works:

  • iterate over rows in the CSV file
  • for every record, we check if there was a record with the same index before
  • if this is the first time we see this index, just copy the row values
  • if this is a duplicate, assign row values to empty cells.

The last step is done via or trick: None or value will return value. value or anything will return value. So, result[idx][i] or v will return existing value if it is not empty, or row value.

To output this without loosing the duplicated rows, we need to keep index, then iterate and output corresponding result entries:

indices = []
for row in reader:
    # ...
    indices.append(idx)

writer = csv.writer(open('outfile.csv', 'w'))
for idx in indices:
    writer.writerow([idx] + result[idx])


来源:https://stackoverflow.com/questions/41915138/how-can-i-merge-csv-rows-that-have-the-same-value-in-the-first-cell

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!