Question
I have a list of strings and I want to process the strings in a periodic manner.
A new string should start processing every second, and each string takes 3 seconds to process.
What I expect to observe is that from the 3rd second on, I will see a new result every second until all the strings are processed.
However, what I actually saw was that all the results showed up together, once all of them had been generated. So the question is: how do I modify the code to get the behavior I expect?
from twisted.internet import reactor, threads
import json
import time

def process(string):
    print "Processing " + string + "\n"
    time.sleep(3)  # simulate computation time
    # write result to file; result is mocked by string*3
    file_name = string + ".txt"
    with open(file_name, "w") as fp:
        json.dump(string*3, fp)
    print string + " processed\n"

string_list = ["AAAA", "BBBB", "CCCC", "XXXX", "YYYY", "ZZZZ"]
for s in string_list:
    # start a new thread every second
    time.sleep(1)
    threads.deferToThread(process, s)
reactor.run()
Meanwhile, it looks like the order in which the results are generated isn't consistent with the order in which the strings are processed. My guess is that the output is just printed out of order and the strings are actually processed in order. How can I verify this guess?
Another minor thing I noticed is that "Processing YYYY" is not printed in the right place. Why is that? (There should be an empty line between it and the previous result.)
Processing AAAA
Processing BBBB
Processing CCCC
Processing XXXX
Processing YYYY
Processing ZZZZ
YYYY processed
CCCC processed
AAAA processed
BBBB processed
XXXX processed
ZZZZ processed
Answer 1:
What this part of your code does:
for s in string_list:
    # start a new thread every second
    time.sleep(1)
    threads.deferToThread(process, s)
reactor.run()
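is schedule each chunk of work, with a delay of one second between each scheduling operation. Then, finally, it starts the reactor, which allows processing to begin. There is no processing until reactor.run() is called. With six strings, the loop therefore blocks for about six seconds, and only then do all six jobs start at roughly the same time, which is why the results appear together.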
The use of time.sleep(1) also means your delays are blocking, and this will be a problem once you solve the above.
One solution is to replace the for loop and the time.sleep(1) with a LoopingCall.
from twisted.internet.task import LoopingCall, react
from twisted.internet.threads import deferToThread

string_list = [...]

def process(string):
    ...

def process_strings(the_strings, f):
    def dispatch(s):
        d = deferToThread(f, s)
        # Add callback / errback to d here to process the
        # result or report any problems.
        # Do _not_ return `d` though. LoopingCall will
        # wait on it before running the next iteration if
        # we do.

    string_iter = iter(the_strings)
    c = LoopingCall(lambda: dispatch(next(string_iter)))
    d = c.start(1)
    # StopIteration just means the strings are exhausted; treat it as success.
    d.addErrback(lambda err: err.trap(StopIteration))
    return d

def main(reactor):
    return process_strings(string_list, process)

react(main, [])
This code uses react to start and stop the reactor (it stops when the Deferred returned by main fires). It uses LoopingCall started with a period of 1 to run f(next(string_iter)) in the threadpool until StopIteration (or some other error) is encountered.
(LoopingCall and deferToThread both take *args and **kwargs to pass on to their callable, so if you prefer (it's a matter of style), you can also write that expression as LoopingCall(lambda: deferToThread(f, next(string_iter))). Note that this form returns the Deferred to LoopingCall, which will then wait on it before running the next iteration. You cannot "unwrap" the remaining lambda because that would result in LoopingCall(deferToThread, f, next(string_iter)), which only evaluates next(string_iter) once, at the time LoopingCall is called, so you would end up processing the first string forever.)
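The point about the lambda is ordinary Python argument evaluation and can be seen without Twisted at all. In this sketch, call_repeatedly is a hypothetical stand-in for LoopingCall (it simply calls its function a fixed number of times) and identity stands in for the real work; neither is part of Twisted.

```python
def call_repeatedly(times, fn, *args):
    """Hypothetical stand-in for LoopingCall: call fn(*args) `times` times."""
    return [fn(*args) for _ in range(times)]

def identity(value):
    return value

# next(string_iter) is evaluated once, before call_repeatedly even runs,
# so the same string is passed on every call.
string_iter = iter(["AAAA", "BBBB", "CCCC"])
eager = call_repeatedly(3, identity, next(string_iter))

# Wrapping the call in a lambda defers next() to each invocation.
string_iter = iter(["AAAA", "BBBB", "CCCC"])
lazy = call_repeatedly(3, lambda: identity(next(string_iter)))

print(eager)  # ['AAAA', 'AAAA', 'AAAA']
print(lazy)   # ['AAAA', 'BBBB', 'CCCC']
```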
There are other possible approaches to scheduling as well. For example, you could use cooperate to run exactly 3 processing threads at a time, starting a new one as soon as an older one completes.
from twisted.internet.defer import gatherResults
from twisted.internet.task import cooperate
from twisted.internet.threads import deferToThread

def process_strings(the_strings, f):
    # Define a generator of all of the jobs to be accomplished.
    # Pass a_string as an argument (rather than closing over it in a
    # lambda) so each job is bound to its own string.
    work_iter = (
        deferToThread(f, a_string)
        for a_string
        in the_strings
    )

    # Consume jobs from the generator in parallel until done.
    tasks = list(cooperate(work_iter) for i in range(3))

    # Return a Deferred that fires when all three tasks have
    # finished consuming all available jobs.
    return gatherResults(list(
        t.whenDone()
        for t
        in tasks
    ))
In both cases, notice there's no use of time.sleep.
Source: https://stackoverflow.com/questions/44869485/periodically-call-defertothread