Python append multiple files in given order to one big file

时光说笑 2020-11-29 06:00

I have up to 8 separate Python processes creating temp files in a shared folder. Then I'd like the controlling process to append all the temp files, in a certain order, into one big file.

10 Answers
  • 2020-11-29 06:32

    In this code, you can indicate the path and name of the input/output files, and it will create the final big file in that path:

    import os
    
    dir_name = "Your_Desired_Folder/Goes_Here"    # path
    input_files_names = ["File1.txt", "File2.txt", "File3.txt"]     # input files
    file_name_out = "Big_File.txt"     # choose a name for the output file
    file_output = os.path.join(dir_name, file_name_out)
    
    # "with" blocks close every file, including each input file as soon as
    # it has been copied (the original only closed the last input file)
    with open(file_output, "w") as fout:
        for tempfile in input_files_names:
            inputfile = os.path.join(dir_name, tempfile)
            with open(inputfile, "r") as fin:
                for line in fin:
                    fout.write(line)
    
  • 2020-11-29 06:32

    A simple and efficient way to copy data from multiple files into one big file. Before running the copy step, rename your files to integers, e.g. 1, 2, 3, 4, ... Code:

    # Rename the files to 1.txt, 2.txt, 3.txt, ... first
    
    import os
    
    path = 'directory_name'
    files = sorted(os.listdir(path))   # sort so the numbering is deterministic
    i = 1
    for file in files:
        os.rename(os.path.join(path, file), os.path.join(path, str(i) + '.txt'))
        i = i + 1

    # Code for copying data from the renamed files into one big file
    
    import os
    
    path = 'directory_name'
    number_of_files = len(os.listdir(path))
    
    with open("output_filename", "a") as fout:
        for i in range(1, number_of_files + 1):
            # %s is the numeric filename, .txt is the file extension
            with open(os.path.join(path, "%s.txt" % i), "r") as f:
                for line in f:
                    fout.write(line)
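
    If you'd rather not rename anything at all, here is a sketch of an alternative that assumes the original filenames already contain a number (e.g. part1.txt, part2.txt, ..., part10.txt): sort the directory listing numerically and append in that order.

    import os
    import re
    
    path = 'directory_name'
    
    def numeric_key(name):
        # take the first run of digits in the filename; files without digits sort first
        match = re.search(r'\d+', name)
        return int(match.group()) if match else 0
    
    with open("output_filename", "a") as fout:
        for name in sorted(os.listdir(path), key=numeric_key):
            with open(os.path.join(path, name), "r") as f:
                for line in f:
                    fout.write(line)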

  • 2020-11-29 06:33

    I feel a bit silly adding another answer after 8 years and so many other answers, but I arrived here via the "append to file" title and didn't see the right solution for appending to an existing binary file with buffered read/write.

    So here is the basic way to do that:

    def append_file_to_file(_from, _to):
        block_size = 1024*1024
        with open(_to, "ab") as outfile, open(_from, "rb") as infile:
            while True:
                input_block = infile.read(block_size)
                if not input_block:
                    break
                outfile.write(input_block)
    

    Given this building block, you can use:

    for filename in ['a.bin','b.bin','c.bin']:
        append_file_to_file(filename, 'outfile.bin')
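
    If you'd rather lean on the standard library for the buffered copy loop, shutil.copyfileobj does the same block-wise copy; a roughly equivalent sketch:

    import shutil
    
    def append_file_to_file(_from, _to):
        # copyfileobj runs the same read/write loop in fixed-size chunks
        with open(_to, "ab") as outfile, open(_from, "rb") as infile:
            shutil.copyfileobj(infile, outfile, 1024 * 1024)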
    
  • 2020-11-29 06:34

    Use fileinput:

    with open("bigfile.txt", "w") as big_file:
        with fileinput.input(files=tempfiles) as inputs:
            for line in inputs:
                big_file.write(line)
    

    This is more memory-efficient than @RafeKettler's answer because it doesn't need to read a whole file into memory before writing to big_file.
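
    For contrast, here is a sketch of the whole-file-read pattern being referred to (just an illustration of the pattern, not @RafeKettler's exact code):

    with open("bigfile.txt", "w") as big_file:
        for tempfile in tempfiles:
            with open(tempfile) as infile:
                # .read() pulls each entire file into memory at once
                big_file.write(infile.read())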

  • 2020-11-29 06:35

    Try this. It's very fast (much faster than going line by line, and shouldn't cause VM thrashing for large files), and it should run on just about anything, including CPython 2.x, CPython 3.x, PyPy, PyPy3 and Jython. It should also be highly OS-agnostic, and it makes no assumptions about file encodings.

    #!/usr/local/cpython-3.4/bin/python3
    
    '''Cat 3 files to one: example code'''
    
    import os
    
    def main():
        '''Main function'''
        input_filenames = ['a', 'b', 'c']
    
        block_size = 1024 * 1024
    
        # O_BINARY only exists on Windows; default to 0 elsewhere
        o_binary = getattr(os, 'O_BINARY', 0)
        # O_CREAT so the output file is created if it doesn't already exist
        output_file = os.open('output-file', os.O_WRONLY | os.O_CREAT | o_binary)
        for input_filename in input_filenames:
            input_file = os.open(input_filename, os.O_RDONLY | o_binary)
            while True:
                input_block = os.read(input_file, block_size)
                if not input_block:
                    break
                os.write(output_file, input_block)
            os.close(input_file)
        os.close(output_file)
    
    main()
    

    There is one (nontrivial) optimization I've left out: it's better not to assume anything about a good block size; instead, use a bunch of random ones and slowly back off the randomization to focus on the good ones (sometimes called "simulated annealing"). But that's a lot of extra complexity for little actual performance benefit.

    You could also make os.write keep track of its return value and restart partial writes, but that's only really necessary if you're expecting to receive (nonterminal) *ix signals.
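
    If you did want to guard against partial writes, a minimal sketch of such a wrapper (write_all is a made-up helper name, not part of the answer above) could look like this:

    import os
    
    def write_all(fd, data):
        # keep calling os.write until every byte of "data" has been accepted,
        # resuming after any partial write
        view = memoryview(data)
        while view:
            written = os.write(fd, view)
            view = view[written:]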

  • 2020-11-29 06:37

    There's also the fileinput module in Python 3, which is perfect for this sort of situation.
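
    A minimal sketch of what that looks like (the file names here are placeholders for the ordered list of temp files):

    import fileinput
    
    tempfiles = ["temp1.txt", "temp2.txt", "temp3.txt"]  # placeholder names
    
    with open("bigfile.txt", "w") as big_file, fileinput.input(files=tempfiles) as fin:
        big_file.writelines(fin)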
