Periodically call deferToThread

Question

I have a list of strings and I want to process the strings in a periodic manner.

The period of starting processing a new string is 1 second, and it takes 3 seconds to process a string.

What I expect to observe is that from the 3rd second on, I will see a new result every second until all the strings are processed.

However, what I actually saw was that all the results showed up together, only after all of them had been generated. So the question is: how can I modify the code to get the behavior I expect?

from twisted.internet import reactor, threads
import json
import time


def process(string):
    print "Processing " + string + "\n"
    time.sleep(3)  # simulate computation time

    # write result to file; result is mocked by string*3
    file_name = string + ".txt"
    with open(file_name, "w") as fp:
        json.dump(string*3, fp)

    print string + " processed\n"

string_list = ["AAAA", "BBBB", "CCCC", "XXXX", "YYYY", "ZZZZ"]

for s in string_list:
    # start a new thread every second
    time.sleep(1)
    threads.deferToThread(process, s)

reactor.run()

Meanwhile, it looks like the order in which the results are generated isn't consistent with the order in which the strings are processed. My guess is that the output is merely printed out of order and the strings are actually processed in order. How can I verify this guess?

Another trivial thing I noticed is that "Processing YYYY" is not printed in the right place. Why is that? (There should be an empty line between it and the previous line of output.)

Processing AAAA

Processing BBBB

Processing CCCC

Processing XXXX
Processing YYYY


Processing ZZZZ

YYYY processed

CCCC processed

AAAA processed

BBBB processed

XXXX processed

ZZZZ processed

Answer 1:

What this part of your code does:

for s in string_list:
    # start a new thread every second
    time.sleep(1)
    threads.deferToThread(process, s)

reactor.run()

is schedule each chunk of work, with a delay of one second between one scheduling operation and the next. Then, finally, it starts the reactor, which allows processing to begin. No processing happens until reactor.run() is called.

The use of time.sleep(1) also means your delays are blocking: nothing else can run in that thread while it sleeps. This will become a problem once you solve the above.

One solution is to replace the for loop and the time.sleep(1) with a LoopingCall.

from twisted.internet.task import LoopingCall, react
from twisted.internet.threads import deferToThread

string_list = [...]
def process(string):
    ...

def process_strings(the_strings, f):
    def dispatch(s):
        d = deferToThread(f, s)
        # Add callback / errback to d here to process the
        # result or report any problems.
        # Do _not_ return `d` though.  LoopingCall will
        # wait on it before running the next iteration if
        # we do.

    string_iter = iter(the_strings)
    c = LoopingCall(lambda: dispatch(next(string_iter)))
    d = c.start(1)
    d.addErrback(lambda err: err.trap(StopIteration))
    return d

def main(reactor):
    return process_strings(string_list, process)

react(main, [])

This code uses react to start and stop the reactor (it stops when the Deferred returned by main fires). It uses LoopingCall started with a period of 1 to run f(next(string_iter)) in the threadpool until StopIteration (or some other error) is encountered.

(LoopingCall and deferToThread both take *args and **kwargs to pass on to their callable, so if you prefer (it's a matter of style) you can also write that expression as LoopingCall(lambda: deferToThread(f, next(string_iter))). Be aware that this form returns the Deferred to LoopingCall, which, as the comment in dispatch notes, makes it wait for each job to finish before starting the next iteration. You cannot "unwrap" the remaining lambda, because that would result in LoopingCall(deferToThread, f, next(string_iter)), which evaluates next(string_iter) only once, at the time LoopingCall is called, so you would end up processing the first string forever.)
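The difference between passing next(string_iter) as an argument and wrapping it in a lambda can be seen without Twisted at all. The following is a minimal sketch, assuming a hypothetical helper repeat_call that stands in for any scheduler which stores a callable and its arguments once and then calls it repeatedly; it is not LoopingCall itself.

```python
def repeat_call(times, func, *args, **kwargs):
    # Hypothetical stand-in for a repeating scheduler: func and its
    # arguments are captured once, then func is called `times` times.
    return [func(*args, **kwargs) for _ in range(times)]


strings = ["AAAA", "BBBB", "CCCC"]

# Eager: next(it) is evaluated once, before repeat_call is entered,
# so every call receives the same first string.
it = iter(strings)
eager = repeat_call(3, lambda s: s, next(it))

# Lazy: the lambda postpones next(it) until each individual call,
# so every call receives a fresh string.
it = iter(strings)
lazy = repeat_call(3, lambda: next(it))

print(eager)  # ['AAAA', 'AAAA', 'AAAA']
print(lazy)   # ['AAAA', 'BBBB', 'CCCC']
```

The same reasoning applies to any API that accepts a callable plus arguments: arguments are evaluated at the call site, so anything that must change per invocation has to live inside the callable.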

There are other possible approaches to scheduling as well. For example, you could use cooperate to run exactly 3 processing threads at a time - starting a new one as soon as an older one completes.

from twisted.internet.defer import gatherResults
from twisted.internet.task import cooperate
from twisted.internet.threads import deferToThread

def process_strings(the_strings, f):
    # Define a generator of all of the jobs to be accomplished.
    work_iter = (
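        # Pass a_string as an argument rather than closing over it in a
        # lambda, so each job is bound to its own string even if the
        # generator advances before the thread starts.
        deferToThread(f, a_string)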
        for a_string
        in the_strings
    )
    # Consume jobs from the generator in parallel until done.
    tasks = list(cooperate(work_iter) for i in range(3))

    # Return a Deferred that fires when all three tasks have
    # finished consuming all available jobs.
    return gatherResults(list(
        t.whenDone()
        for t
        in tasks
    ))

In both cases, notice there's no use of time.sleep.
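As for checking whether the strings are really processed in order or merely printed out of order, one option is to record a timestamp when each job starts and ends and compare those instead of the interleaved print output. This is a minimal sketch, assuming plain threading.Thread as a stand-in for the reactor's thread pool and a shortened sleep; record and events are illustrative names.

```python
import threading
import time

events = []                    # (timestamp, label, string) tuples
events_lock = threading.Lock()


def record(label, string):
    # Illustrative helper: note when something happened, thread-safely.
    with events_lock:
        events.append((time.time(), label, string))


def process(string):
    record("start", string)
    time.sleep(0.05)  # short stand-in for the 3-second computation
    record("end", string)


string_list = ["AAAA", "BBBB", "CCCC"]
workers = [threading.Thread(target=process, args=(s,)) for s in string_list]
for w in workers:
    w.start()
for w in workers:
    w.join()

# Print the events in chronological order, independent of print interleaving.
for stamp, label, string in sorted(events):
    print("%.3f %-5s %s" % (stamp, label, string))
```

Sorting by timestamp shows the actual start and finish order of each job. The same record calls could be placed inside the process function that is handed to deferToThread.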

Source: https://stackoverflow.com/questions/44869485/periodically-call-defertothread

Tags

python

twisted