How can I split a file in Python?

失恋的感觉 2020-12-03 07:29

Is it possible to split a file? For example, I have a huge wordlist and want to split it into several smaller files. How can I do this?

9 answers
  • 2020-12-03 07:59

    This is a late answer, but a new question was linked here and none of the answers mentioned itertools.groupby.

    Assuming you have a (huge) file file.txt that you want to split into chunks of MAXLINES lines each (file_part1.txt, ..., file_partn.txt), you could do:

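    import itertools

    MAXLINES = 3  # lines per output file

    with open("file.txt") as fdin:
        for i, sub in itertools.groupby(enumerate(fdin), lambda x: 1 + x[0] // MAXLINES):
            # Each group holds the (index, line) pairs for one output part.
            with open("file_part{}.txt".format(i), "w") as fdout:
                for _, line in sub:
                    fdout.write(line)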
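
    To see why the key function produces the chunks, here is a minimal in-memory sketch of the same grouping. The helper name chunk_lines is made up for illustration and is not part of itertools; it only assumes that groupby starts a new group whenever the integer division of the line index changes.

```python
import itertools

def chunk_lines(lines, max_lines):
    """Group an iterable of lines into consecutive lists of at most max_lines."""
    # enumerate() pairs every line with its index; index // max_lines stays
    # constant for max_lines consecutive lines, so groupby cuts a new group there.
    keyed = itertools.groupby(enumerate(lines), lambda pair: pair[0] // max_lines)
    return [[line for _, line in group] for _, group in keyed]

print(chunk_lines(["a\n", "b\n", "c\n", "d\n", "e\n"], 2))
# prints [['a\n', 'b\n'], ['c\n', 'd\n'], ['e\n']]
```

    Because groupby is lazy, the file-based version never holds more than one line in memory at a time.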
    
  • 2020-12-03 08:00

    Sure it's possible:

    open input file
    open output file 1
    count = 0
    for each line in file:
        write to output file
        count = count + 1
        if count >= maxlines:
            close output file
            open next output file
            count = 0
    close output file
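
    The outline above could be turned into Python roughly like this. It is only a sketch: the function name split_by_lines and the out_pattern parameter are assumptions made for the example, and the next output file is opened lazily so that no empty file is left at the end.

```python
def split_by_lines(src_path, max_lines, out_pattern="part{}.txt"):
    """Write src_path into numbered files of at most max_lines lines each.

    Returns the list of file names that were created.
    """
    created = []
    out = None
    try:
        with open(src_path) as src:
            for count, line in enumerate(src):
                # Start a new output file every max_lines lines, and only
                # when there is actually a line to put into it.
                if count % max_lines == 0:
                    if out is not None:
                        out.close()
                    name = out_pattern.format(len(created) + 1)
                    out = open(name, "w")
                    created.append(name)
                out.write(line)
    finally:
        # Close the last output file even if an error interrupted the loop.
        if out is not None:
            out.close()
    return created
```

    For example, split_by_lines("wordlist.txt", 1000) would write part1.txt, part2.txt and so on into the current directory.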
    
  • 2020-12-03 08:00

    You can use the filesplit module from PyPI.

  • 2020-12-03 08:07
    def split_file(file, prefix, max_size, buffer=1024):
        """
        file: the input file
        prefix: prefix of the output files that will be created
        max_size: maximum size of each created file in bytes
        buffer: buffer size in bytes
    
        Returns the number of parts created.
        """
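        with open(file, 'rb') as src:
            suffix = 0
            while True:
                with open(prefix + '.%s' % suffix, 'wb') as tgt:
                    written = 0
                    while written < max_size:
                        # Never read past the size limit of the current part.
                        data = src.read(min(buffer, max_size - written))
                        if data:
                            tgt.write(data)
                            written += len(data)
                        else:
                            # End of input; the last part may be empty.
                            return suffix + 1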
                    suffix += 1
    
    
    def cat_files(infiles, outfile, buffer=1024):
        """
        infiles: a list of files
        outfile: the file that will be created
        buffer: buffer size in bytes
        """
        with open(outfile, 'wb') as tgt:
            for infile in sorted(infiles):
                with open(infile, 'rb') as src:
                    while True:
                        data = src.read(buffer)
                        if data:
                            tgt.write(data)
                        else:
                            break
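
    One caveat when rejoining: sorted() compares file names as strings, so with ten or more parts 'prefix.10' comes before 'prefix.2'. A possible workaround is to order the parts by their numeric suffix before passing them to cat_files. The helper sort_parts below is hypothetical and assumes every name ends in '.<number>', as the names produced by split_file do.

```python
def sort_parts(names):
    """Order names like 'prefix.0', 'prefix.1', ... by their numeric suffix."""
    # rsplit keeps any dots inside the prefix intact and isolates the suffix.
    return sorted(names, key=lambda name: int(name.rsplit(".", 1)[1]))

print(sort_parts(["words.10", "words.2", "words.0"]))
# prints ['words.0', 'words.2', 'words.10']
```

    Since sorting an already sorted list keeps its order whenever the keys compare the same way, passing the result of sort_parts to cat_files only helps as long as there are fewer than ten parts; beyond that, the sorted() call inside cat_files would need the same key.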
    
  • 2020-12-03 08:14

    This one splits a file up by newlines and writes it back out. You can change the delimiter easily. It also handles uneven amounts, for when the input file does not contain a multiple of splitLen lines (20 in this example).

    splitLen = 20         # 20 lines per file
    outputBase = 'output' # output1.txt, output2.txt, etc.

    # This is shorthand and not friendly with memory
    # on very large files (Sean Cavanagh), but it works.
    with open('input.txt', 'r') as source:
        lines = source.read().split('\n')

    at = 1
    for start in range(0, len(lines), splitLen):
        # First, get the list slice
        outputData = lines[start:start+splitLen]

        # Now open the output file, join the new slice with newlines
        # and write it out. The with block closes the file.
        with open(outputBase + str(at) + '.txt', 'w') as output:
            output.write('\n'.join(outputData))

        # Increment the counter
        at += 1
    
  • 2020-12-03 08:14

    A better loop for sli's example that does not hog memory:

    splitLen = 20         # 20 lines per file
    outputBase = 'output' # output0.txt, output1.txt, etc.

    count = 0
    at = 0
    dest = None
    with open('input.txt', 'r') as source:
        for line in source:
            if count % splitLen == 0:
                if dest: dest.close()
                dest = open(outputBase + str(at) + '.txt', 'w')
                at += 1
            dest.write(line)
            count += 1
    # Close the last output file, if any was opened.
    if dest: dest.close()
    