Question
I have a custom keras.utils.Sequence which generates batches in a specific (and critical) order.
However, I need to parallelise batch generation across multiple cores. Does the name 'OrderedEnqueuer' imply that the order of batches in the resulting queue is guaranteed to be the same as the order of the original keras.utils.Sequence?
My reasons for thinking that this order is not guaranteed:
- OrderedEnqueuer uses Python multiprocessing's apply_async internally.
- Keras' docs explicitly say that OrderedEnqueuer is guaranteed not to duplicate batches - but not that the order is guaranteed.
My reasons for thinking that it is:
- The name!
- I understand that keras.utils.Sequence objects are indexable.
- I found test scripts on Keras' GitHub which appear to be designed to verify order - although I could not find any documentation about whether these were passed, or whether they are truly conclusive.
If the order here is not guaranteed, I would welcome any suggestions on how to parallelise batch preparation while maintaining a guaranteed order, with the proviso that it must be able to parallelise arbitrary Python code - I believe e.g. the tf.data.Dataset API does not allow this (tf.py_function calls back into the original Python process).
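(For reference, a guaranteed-order parallel pipeline over arbitrary Python code can also be built directly on the standard library; the sketch below uses concurrent.futures.ProcessPoolExecutor.map, which yields results in input order regardless of completion order. make_batch is a hypothetical placeholder for whatever batch preparation is needed.)

from concurrent.futures import ProcessPoolExecutor
import time, random

def make_batch(i):
    # hypothetical stand-in for arbitrary batch-preparation code
    time.sleep(random.random())
    return i

if __name__ == '__main__':
    with ProcessPoolExecutor(max_workers=8) as pool:
        # map runs make_batch in worker processes but yields the results
        # in the order of the inputs, not the order of completion
        for batch in pool.map(make_batch, range(10)):
            print(batch)  # prints 0..9 in order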
Answer 1:
Yes, it's ordered.
Check it yourself with the following test.
First, let's create a dummy Sequence that returns just the batch index after waiting a random time (the random wait ensures that the batches will not finish in order):
import time, random, datetime
import numpy as np
import tensorflow as tf

class DataLoader(tf.keras.utils.Sequence):
    def __len__(self):
        return 10

    def __getitem__(self, i):
        time.sleep(random.randint(1, 2))
        # you could add a print here to see that batches finish out of order
        return i
Now let's create a test function that creates the enqueuer and uses it. The function takes the number of workers, pulls 30 batches from the enqueuer (the generator from enq.get() cycles over the Sequence indefinitely, so this is three passes over the 10 indices), and prints the time taken along with the batches in the order they were received.
def test(workers):
    enq = tf.keras.utils.OrderedEnqueuer(DataLoader())
    enq.start(workers=workers)
    gen = enq.get()

    results = []
    start = datetime.datetime.now()
    for i in range(30):
        results.append(next(gen))
    enq.stop()

    print('test with', workers, 'workers took', datetime.datetime.now() - start)
    print("results:", results)
Results:
test(1)
test(8)
test with 1 workers took 0:00:45.093122
results: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
test with 8 workers took 0:00:09.127771
results: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Notice that:
- 8 workers is much faster than 1 worker -> the batches are indeed being generated in parallel
- the results are in order in both cases
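This is consistent with how the enqueuer is described above: the __getitem__ calls are dispatched to workers asynchronously via apply_async, but (as I understand it) the pending results are kept in submission order and the consumer waits on them in index order. A minimal sketch of that pattern - just the idea, not the actual Keras source:

import time, random
from multiprocessing import Pool

def work(i):
    # simulate a batch that takes a variable amount of time
    time.sleep(random.randint(1, 2))
    return i

if __name__ == '__main__':
    with Pool(processes=8) as pool:
        # submit all jobs asynchronously, keeping the futures in submission order
        futures = [pool.apply_async(work, (i,)) for i in range(10)]
        # waiting on .get() in that same order restores the original ordering,
        # no matter which worker finished first
        print([f.get() for f in futures])  # [0, 1, ..., 9]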
Source: https://stackoverflow.com/questions/59213040/is-the-order-of-batches-guaranteed-in-keras-orderedenqueuer