Extract data from a TSV file in Python

Posted by 心已入冬 on 2019-12-04 06:29:00

Question


I have a TSV file that looks like this:

A   B   C   D   D=1;E=2
S   D   F   G   H=2;B=4

I'd like to write the contents to another TSV file like this:

A   B   C   D   D   1
A   B   C   D   E   2
S   D   F   G   H   2
S   D   F   G   B   4

I'd really appreciate it if anyone could help or give me a hint on splitting column 5 as desired.


Answer 1:


If you are sure the file uses only tabs between columns and semicolons between the items in the last column, you can use split.

with open('/tmp/test.tsv') as infile, open('/tmp/test2.tsv', 'w') as outfile:
    for line in infile:
        tsplit = line.split("\t")
        firstcolumns = tsplit[:-1]
        lastitems = tsplit[-1].strip().split(";")
        for item in lastitems:
            allcolumns = firstcolumns + item.split("=")
            outfile.write("\t".join(allcolumns) + "\n")

(Updated to make it easier to compare with the other answer.)

This will work regardless of the number of semicolon-separated items you have in the last column. However, this is sensitive to small changes in the format (e.g. added spaces).
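As a rough illustration of that sensitivity, here is a minimal sketch of a more tolerant variant. The helper name expand_row is hypothetical, and the choices it makes (stripping whitespace around every field, ignoring empty pieces such as those left by a trailing semicolon) are assumptions about what "added spaces" might look like, not something the question specifies.

```python
def expand_row(line):
    """Return one output row per key=value item in the last column of a TSV line."""
    # Strip the line ending first, then whitespace around each tab-separated column.
    columns = [col.strip() for col in line.rstrip("\r\n").split("\t")]
    leading, last = columns[:-1], columns[-1]
    rows = []
    for item in last.split(";"):
        item = item.strip()
        if not item:
            continue  # skip empty pieces, e.g. from a trailing semicolon
        # partition splits on the first "=" only; an item without "=" gets an empty value
        key, _, value = item.partition("=")
        rows.append(leading + [key.strip(), value.strip()])
    return rows


# A line with stray spaces and a trailing semicolon (hypothetical input).
sample = "A\tB\tC\tD\tD = 1; E=2 ;\n"
for row in expand_row(sample):
    print("\t".join(row))
```

Whether stripping is appropriate depends on the data: if leading or trailing spaces inside a field are meaningful, they should be kept.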




Answer 2:


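import csv

# newline='' lets the csv module handle line endings itself
with open('path/to/input', newline='') as infile, open('path/to/output', 'w', newline='') as outfile:
    writer = csv.writer(outfile, delimiter='\t')
    for line in csv.reader(infile, delimiter='\t'):
        vals = line[-1]      # last column, e.g. "D=1;E=2"
        headers = line[:-1]  # the columns repeated on every output row
        for val in vals.split(';'):
            # split "D=1" into ["D", "1"] so key and value get their own columns
            writer.writerow(headers + val.split('='))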
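A csv-based approach like this can be checked against the two sample rows from the question without touching the disk. The sketch below is one way to do that, under some assumptions: the function name expand_tsv is hypothetical, io.StringIO objects stand in for the real files, and lineterminator="\n" is set only so the output is easy to compare (csv.writer defaults to "\r\n").

```python
import csv
import io


def expand_tsv(infile, outfile):
    """Copy TSV rows, writing one output row per key=value item in the last column."""
    reader = csv.reader(infile, delimiter="\t")
    writer = csv.writer(outfile, delimiter="\t", lineterminator="\n")
    for row in reader:
        if not row:
            continue  # ignore blank lines
        leading, last = row[:-1], row[-1]
        for pair in last.split(";"):
            # "D=1" becomes ["D", "1"]; split at most once in case the value contains "="
            writer.writerow(leading + pair.split("=", 1))


# The two sample rows from the question, held in memory instead of a file.
source = io.StringIO("A\tB\tC\tD\tD=1;E=2\nS\tD\tF\tG\tH=2;B=4\n")
result = io.StringIO()
expand_tsv(source, result)
print(result.getvalue(), end="")
```

Passing file objects rather than paths keeps the function usable with real files as well: open them with newline='' and hand them to expand_tsv.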


Source: https://stackoverflow.com/questions/25516332/extract-data-from-tsv-file-python
