How to merge several .csv files horizontally with Python?

渐次进展 2021-01-14 16:11

I have several .csv files (~10) and need to merge them horizontally into a single file. Each file has the same number of rows (~300) and 4 header lines which are not

6 answers
  • 2021-01-14 16:22

    If you don't necessarily have to use Python, you can use shell tools such as paste and gawk. Note that paste joins lines with a tab unless you pass a delimiter with -d:

    $ paste -d, file1 file2 file3 file4 .. | awk 'NR>4'
    

    The above puts the files side by side without the 4 header lines. If you want the headers, just take them from file1:

    $ ( head -4 file1 ; paste -d, file[1-4] | awk 'NR>4' ) > output
    
  • 2021-01-14 16:27

    You learn by doing (and trying, even). So, I'll just give you a few hints. Use the following functions:

    • To open a file: open()
    • To read all the lines in a file: IOBase.readlines()
    • To split a string on a separator: str.split()
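
    Put together, those three calls might look like the sketch below. The helper name `merge_lines` is made up for illustration, and the code assumes plain comma-separated files with no quoted fields that contain commas or newlines.

```python
def merge_lines(paths, out_path, sep=","):
    """Join line i of every input file into line i of the output file."""
    columns = []
    for path in paths:
        with open(path) as f:          # open()
            lines = f.readlines()      # readlines()
        # str.split() turns each line into a list of fields
        columns.append([line.rstrip("\n").split(sep) for line in lines])

    with open(out_path, "w") as out:
        # zip() stops at the shortest file; the question says all files
        # have the same number of rows
        for rows in zip(*columns):
            fields = [field for row in rows for field in row]
            out.write(sep.join(fields) + "\n")
```

    Header lines are treated like any other line here, so they end up merged side by side as well.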

    If you really don't know what to do, I recommend you read the tutorial and Dive Into Python 3. (Depending on how much Python you know, you'll either have to read through the first few chapters or cut straight to the file IO chapters.)

  • 2021-01-14 16:35

    Purely for learning purposes

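    A simple approach that does not take advantage of the csv module. Note that it stacks the data rows of the files one after another and writes the 4 header lines only once: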

    # open file to write
    file_to_write = open(filename, 'w')
    # your list of csv files
    csv_files = [file1, file2, ...]

    # True until the 4 header lines of the first file have been written
    headers = True
    # iterate through your list
    for filex in csv_files:
        # index of the current line within this file
        header_count = 0
        # open the csv file and read line by line
        filex_f = open(filex, 'r')
        for line in filex_f:
            if header_count < 4:
                # header line: write it only for the first file
                if headers:
                    file_to_write.write(line)
            else:
                # data line: always write it (line already ends with "\n")
                file_to_write.write(line)
            # count lines
            header_count = header_count + 1
        # the headers are done once the first file has been processed
        headers = False
        # close file
        filex_f.close()
    file_to_write.close()
    
  • 2021-01-14 16:36

    The csv module is your friend.

  • 2021-01-14 16:42

    You don't need to use the csv module for this. You can just use

    file1 = open(file1)
    

    After opening all your files you can do this

    from itertools import zip_longest  # Python 3 name; Python 2 called it izip_longest

    foo=[]
    for new_line in zip_longest(file1,file2,file3....,fillvalue=''):
        foo.append(new_line)
    

    This gives you the following structure (as kon has already described). It also works if the files have different numbers of lines:

    [ ("line10", "line20", "line30", "line40"),
      ("line11", "line21", "line31", "line41"),
      ... 
    ]
    

    After this you can write it to a new file, one tuple at a time:

    for listx in foo:
        # each line still ends with "\n": strip it before joining, then end the row
        new_file.write(','.join(j.rstrip('\n') for j in listx) + '\n')
    

    PS: for more about izip_longest (named zip_longest in Python 3), see the itertools documentation.
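
    For reference, the whole idea in Python 3 could look like the following sketch. `merge_files` is a hypothetical helper name, and `ExitStack` is used only so that a variable number of files can be opened and closed together.

```python
from contextlib import ExitStack
from itertools import zip_longest


def merge_files(paths, out_path):
    """Write line i of every input file, comma-joined, as line i of out_path."""
    with ExitStack() as stack:
        files = [stack.enter_context(open(p)) for p in paths]
        out = stack.enter_context(open(out_path, "w"))
        # zip_longest keeps going until the longest file is exhausted,
        # substituting '' for files that have already run out of lines
        for lines in zip_longest(*files, fillvalue=""):
            out.write(",".join(line.rstrip("\n") for line in lines) + "\n")
```

    A file that runs out early contributes a single empty field to each remaining row, so those rows will be short if that file had more than one column.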

  • 2021-01-14 16:44

    You can load the CSV files using the csv module in Python. Please refer to the module's documentation for the exact loading code; I cannot remember it precisely, but it is really easy. Something like:

    import csv
    # Python 3: open in text mode with newline='' (Python 2 used "rb")
    with open("some.csv", newline="") as f:
        csvContent = list(csv.reader(f))
    

    After that, when you have the CSV files loaded in such a form (a list of rows; csv.reader returns each row as a list of strings):

    [ ["header1", "header2", "header3", "header4"],
      ["value01", "value02", "value03", "value04"],
      ["value11", "value12", "value13", "value14"],
      ... 
    ]
    

    You can merge two such lists line-by-line:

    result = [a+b for (a,b) in zip(csvList1, csvList2)]
    

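    To save such a result, you can use:

    # Python 3: text mode with newline='' (Python 2 used "wb")
    # "merged.csv" is only an example name; it avoids overwriting the input file
    with open("merged.csv", "w", newline="") as f:
        csv.writer(f).writerows(result)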
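
    The same idea extends to any number of files. This is only a sketch: `merge_csv` and its arguments are names made up for illustration, it is written for Python 3, and it treats the header lines like any other row.

```python
import csv


def merge_csv(paths, out_path):
    """Concatenate row i of every input CSV into row i of the output CSV."""
    tables = []
    for path in paths:
        # newline='' lets the csv module handle line endings itself
        with open(path, newline="") as f:
            tables.append(list(csv.reader(f)))

    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        # zip(*tables) yields one row from each file at a time
        for rows in zip(*tables):
            writer.writerow([field for row in rows for field in row])
```

    Because zip() stops at the shortest input, this relies on all files having the same number of rows, as stated in the question.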
    