How can I split a text file into multiple text files using Python?

佛祖请我去吃肉 2020-12-22 10:22

I have a text file with the following contents. I want to split it into multiple files (1.txt, 2.txt, 3.txt, ...). Each new output file should look as follows:

5 answers
  •  难免孤独
    2020-12-22 11:01

    Try the re.findall() function:

    import re
    
    with open('input.txt', 'r') as f:
        data = f.read()
    
    found = re.findall(r'\n*(A.*?\n\$\$)\n*', data, re.M | re.S)
    
    # write each block to its own numbered file, closing each file explicitly
    for i, block in enumerate(found, start=1):
        with open(str(i) + '.txt', 'w') as fout:
            fout.write(block)
    

    Minimalistic approach for the first 3 occurrences:

    import re
    
    found = re.findall(r'\n*(A.*?\n\$\$)\n*', open('input.txt', 'r').read(), re.M | re.S)
    
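    # enumerate() numbers the blocks by position, so identical blocks get distinct files
    for i, f in enumerate(found[:3], start=1):
        with open(str(i) + '.txt', 'w') as fout:
            fout.write(f)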
    

    Some explanations:

    found = re.findall(r'\n*(A.*?\n\$\$)\n*', data, re.M | re.S)
    

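    finds every occurrence matching the regular expression and returns them as a list, stored in found. The re.S flag lets . match newlines, so a block can span several lines, and the lazy .*? ends each block at the first $$ line that follows.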
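
    To see what the pattern captures, here is a small sketch run against an in-memory string instead of input.txt. The sample text is an assumption (blocks that open with a line "A" and close with a line "$$"); the original input file is not shown in the question.

```python
import re

# Hypothetical sample input: each block starts with a line "A" and ends with a line "$$".
sample = "A\nline 1\nline 2\n$$\n\nA\nline 3\n$$\n"

# re.S lets "." match newlines, so ".*?" can span a multi-line block.
# The lazy "?" stops each match at the first "\n$$" that follows.
# The surrounding "\n*" swallows blank lines between blocks without capturing them.
pattern = r'\n*(A.*?\n\$\$)\n*'
found = re.findall(pattern, sample, re.M | re.S)

for block in found:
    print(repr(block))
```

    Because the pattern has one capturing group, re.findall() returns only the text inside the parentheses, without the surrounding blank lines.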

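    for i, f in enumerate(found, start=1):
        with open(str(i) + '.txt', 'w') as fout:
            fout.write(f)
    

    iterates over every element of the found list and, for each one, creates a text file named after the element's 1-based position (1.txt, 2.txt, ...) and writes that element (occurrence) to it. enumerate() supplies the position directly, so identical blocks still get distinct file names, and the with statement closes each file as soon as it is written.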
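
    The writing step can be tried in isolation with a sketch like the one below. The found list here is made-up data standing in for the result of re.findall(), and the temporary directory is only there so the demo does not create files in the current directory.

```python
import os
import tempfile

# Hypothetical blocks, standing in for the list returned by re.findall().
# The first and third blocks are deliberately identical.
found = ['A\nfirst\n$$', 'A\nsecond\n$$', 'A\nfirst\n$$']

out_dir = tempfile.mkdtemp()  # scratch directory for the demo output

# enumerate(..., start=1) yields (1, block), (2, block), ... by position,
# so every block gets its own file even when two blocks have the same text.
for i, block in enumerate(found, start=1):
    path = os.path.join(out_dir, str(i) + '.txt')
    with open(path, 'w') as fout:  # the with block closes the file after writing
        fout.write(block)

written = sorted(os.listdir(out_dir))
print(written)
```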

    Another version, without regular expressions:

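    blocks_to_read = 3   # stop after this many blocks
    blk_begin = 'A'      # line that opens a block
    blk_end = '$$'       # line that closes a block
    
    with open('35916503.txt', 'r') as f:
        fn = 1               # number of the next output file
        data = []            # lines of the block being collected
        write_block = False  # True while inside a block
        for line in f:
            if fn > blocks_to_read:
                break
            line = line.strip()
            if line == blk_begin:
                write_block = True
            if write_block:
                data.append(line)
            if line == blk_end:
                write_block = False
                # block complete: write it out and start collecting the next one
                with open(str(fn) + '.txt', 'w') as fout:
                    fout.write('\n'.join(data))
                data = []
                fn += 1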
    

    P.S. Personally, I don't like this version and would use the regex one.
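
    One caveat with the regex version: the pattern starts a match at any "A" character, not only at a line consisting of "A". If that matters for your data, a line-anchored variant along the following lines may help. This is only a sketch under the assumption that the markers always sit alone on their lines; the sample text and the names loose and strict are made up for illustration.

```python
import re

# Hypothetical input with a stray "A" inside an ordinary line before the first block.
sample = "header with an A in it\nA\nline 1\n$$\nA\nline 2\n$$\n"

# The pattern from the answer: starts at the first "A" character it sees.
loose = re.findall(r'\n*(A.*?\n\$\$)\n*', sample, re.M | re.S)

# Line-anchored variant: with re.M, "^" and "$" match at line boundaries,
# so "^A$" only matches a line that is exactly "A" and "^\$\$$" a line that is exactly "$$".
strict = re.findall(r'^A$.*?^\$\$$', sample, re.M | re.S)

print(loose[0])   # begins in the middle of the header line
print(strict[0])  # begins at the real block marker
```

    Here re.M actually does something, because the anchored pattern relies on ^ and $ matching at each line boundary.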
