Is there a way to use multithreading to write to different columns of the same csv file?

Submitted by 送分小仙女 on 2020-01-16 08:58:15

Question


My code uses a text file as input and multithreading to process the lines concurrently. I need to output a CSV file in which every thread fills a different column. I can't find a way to do that — is there one?

My code:

import random
import threading

threads = []

def generate_values(min, max, amount):
    arr = [None] * amount
    for i in range(amount):
        arr[i] = random.uniform(min, max)
        if thread.is_alive():
            output_csv.write(str(arr[i]))
    return arr


input_file = open("max_min.txt", "r")
output_csv = open("uniform_values.csv", "w")
for lines in input_file:
    line = lines.split("\n")
    for fields in line:
        if "\n" in line:
            line.remove("\n")
        for i in fields:
            i = fields.split(",")
        Emin = float(i[0])
        Emax = float(i[1])
        Eamount = int(i[2])
        thread = threading.Thread(target=generate_values, args=(Emin, Emax, Eamount))
        threads.append(thread)

for thread in threads:
    thread.start()

for thread in threads:
    thread.join()

The input file is:

2,5,1000
1,7,1000
4,25,1000

The output should have each thread's numbers in a different column, something like:

3.4 ; 5.6 ; 21.4
4.2 ; 5.8 ; 31.2
.
.
.
etc. (the semicolons separate the columns)

If not through multithreading, how can I make the data written to the CSV file go into a different column?


Answer 1:


You can't have multiple threads write to different parts of different lines of a text file, because the order in which threads run is inherently nondeterministic — so writing different columns of data concurrently isn't feasible.

Plus, as I said in a comment, I don't think multithreading is going to speed up what you're doing anyway. Threads only actually run in parallel when they do I/O or call Python extensions written in other languages; otherwise they cooperatively multitask, taking turns executing in the interpreter.
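As a rough illustration of that point (a sketch, not a benchmark — actual timings vary by machine and interpreter), a pure-Python CPU-bound function gains little from being split across threads, because CPython's GIL lets only one thread execute Python bytecode at a time:

```python
import time
from threading import Thread

def busy(n):
    # Pure-Python CPU work; holds the GIL while running.
    total = 0
    for i in range(n):
        total += i * i
    return total

N = 2_000_000

# Run the work twice serially.
start = time.perf_counter()
busy(N)
busy(N)
serial = time.perf_counter() - start

# Run the same work in two threads.
start = time.perf_counter()
threads = [Thread(target=busy, args=(N,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - start

# On CPython, 'threaded' typically comes out close to 'serial',
# not half of it, because the threads can't run the loop in parallel.
```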

Nevertheless, here's how to do it with threads. To work around the limitation on simultaneous writes to a file, each thread instead puts its column data, along with the column index, into a shared Queue. When all threads have finished, the data in the Queue is written out one row at a time.

The code also uses the csv module to handle reading and writing the files, because that's the format of both the input and output files.

import csv
from queue import Queue
from random import uniform
from threading import Thread


def generate_values(index, min, max, amount):
    col = tuple(uniform(min, max) for _ in range(amount))
    cols.put((index, col))


with open("max_min.txt", "r", newline='') as input_file:

    # Generate the columns of values.
    cols = Queue()  # FIFO
    kinds = float, float, int  # Data types of input fields.
    threads = []

    for i, fields in enumerate(csv.reader(input_file)):
        # Convert fields to proper type and assign them to named variables.
        e_min, e_max, e_amount = (kind(field) for kind, field in zip(kinds, fields))
#        e_amount = 5  # Limit number of rows for testing.
        thread = Thread(target=generate_values, args=(i, e_min, e_max, e_amount))
        threads.append(thread)
        thread.start()

    # Wait for all the threads to finish.
    for thread in threads:
        thread.join()

    # Store column data into a dictionary keyed by column index number.
    results = {index: col for index, col in sorted(cols.queue)}

# Write the results to a csv file.
with open("uniform_values.csv", "w", newline='') as output_csv:
    writer = csv.writer(output_csv, delimiter=';')
    writer.writerows(zip(*results.values()))

print('done')
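A side note on the final step: `zip(*results.values())` transposes the per-thread columns into rows. A minimal sketch of that transposition, using made-up columns, including `itertools.zip_longest` for the case (not present in this question, where every column has 1000 values) where the columns have unequal lengths:

```python
from itertools import zip_longest

# Hypothetical columns, keyed by column index, as each thread would produce them.
results = {0: (3.4, 4.2), 1: (5.6, 5.8), 2: (21.4,)}

# zip() truncates to the shortest column...
rows = list(zip(*results.values()))
# ...while zip_longest pads shorter columns with a fill value.
rows_padded = list(zip_longest(*results.values(), fillvalue=''))
```

With equal-length columns, plain `zip` (as in the answer's code) is all that's needed.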



Source: https://stackoverflow.com/questions/59617170/is-there-a-way-to-use-multithreading-to-write-to-different-columns-of-the-same-c
