I have a folder containing several thousand .txt files. I'd like to combine them in a big .csv according to the following model:
I found an R script suppose
This can be written slightly more compactly using pathlib:
>>> import os
>>> import csv
>>> os.chdir('c:/scratch/folder to process')
>>> from pathlib import Path
>>> with open('big.csv', 'w', newline='') as out_file:
...     csv_out = csv.writer(out_file)
...     csv_out.writerow(['FileName', 'Content'])
...     for fileName in Path('.').glob('*.txt'):
...         csv_out.writerow([str(fileName), open(str(fileName.absolute())).read().strip()])
The items yielded by this glob provide access to both the full pathname and the bare file name, so there is no need for string concatenation.
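For illustration, each item yielded by Path('.').glob('*.txt') is a Path object, so the relative path, the bare file name and the absolute path are all available as attributes rather than having to be assembled by hand. A minimal sketch; it only prints what each form gives you:

from pathlib import Path

for p in Path('.').glob('*.txt'):
    # str(p) is the relative path, p.name the bare file name, p.absolute() the full path
    print(str(p), p.name, p.absolute())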
EDIT: I've examined one of the text files and found that one of the characters that chokes processing looks like 'fi' but is actually those two letters combined into a single ligature character. Given the likely practical use to which this csv will be put, I suggest the following processing, which ignores odd characters like that one. I also strip out line endings, because I suspect embedded newlines would complicate csv processing; that is perhaps a topic for another question.
import csv
from pathlib import Path

with open('big.csv', 'w', encoding='Latin-1', newline='') as out_file:
    csv_out = csv.writer(out_file)
    csv_out.writerow(['FileName', 'Content'])
    for fileName in Path('.').glob('*.txt'):
        lines = []
        with open(str(fileName.absolute()), 'rb') as one_text:
            for line in one_text.readlines():
                # decode each line as Latin-1, ignoring undecodable bytes, and trim whitespace
                lines.append(line.decode(encoding='Latin-1', errors='ignore').strip())
        # one row per file: its name plus the whitespace-joined content
        csv_out.writerow([str(fileName), ' '.join(lines)])
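To sanity-check the result, big.csv can be read back with the standard csv module. A minimal sketch under the same Latin-1 assumption; it just prints each file name and a short preview of its content:

import csv

with open('big.csv', newline='', encoding='Latin-1') as f:
    for row in csv.DictReader(f):
        # DictReader maps the header names written above to each row's values
        print(row['FileName'], row['Content'][:40])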