Question
I have a custom keras.utils.Sequence which generates batches in a specific (and critical) order. However, I need to parallelise batch generation across multiple cores. Does the name 'OrderedEnqueuer' imply that the order of batches in the resulting queue is guaranteed to be the same as the order of the original keras.utils.Sequence?
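For concreteness, the kind of Sequence I mean looks roughly like this (the data arrays and batch size are just placeholders):

import numpy as np
import tensorflow as tf

class OrderedBatches(tf.keras.utils.Sequence):
    """Generates batches whose order matters (placeholder example)."""
    def __init__(self, x, y, batch_size=32):
        self.x, self.y, self.batch_size = x, y, batch_size

    def __len__(self):
        # number of batches per epoch
        return int(np.ceil(len(self.x) / self.batch_size))

    def __getitem__(self, i):
        # batch i must correspond exactly to slice i of the data
        sl = slice(i * self.batch_size, (i + 1) * self.batch_size)
        return self.x[sl], self.y[sl]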
My reasons for thinking that this order is not guaranteed:
- OrderedEnqueuer uses Python multiprocessing's apply_async internally (see the sketch after this list).
- Keras' docs explicitly say that OrderedEnqueuer is guaranteed not to duplicate batches, but not that the order is guaranteed.
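Here is the sketch mentioned above: a standalone multiprocessing example (not Keras code, the names are my own) showing that apply_async tasks can finish in any order, yet reading the AsyncResult handles in submission order restores the original order, which is presumably the pattern OrderedEnqueuer relies on:

import time, random
import multiprocessing as mp

def slow_identity(i):
    # simulate uneven batch-preparation times
    time.sleep(random.random())
    return i

if __name__ == '__main__':
    with mp.Pool(4) as pool:
        # tasks are submitted in order but may finish in any order...
        handles = [pool.apply_async(slow_identity, (i,)) for i in range(10)]
        # ...reading the handles in submission order gives [0, 1, ..., 9]
        print([h.get() for h in handles])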
My reasons for thinking that it is:
- The name!
- I understand that keras.utils.Sequence objects are indexable.
- I found test scripts on Keras' GitHub which appear to be designed to verify order, although I could not find any documentation about whether these were passed, or whether they are truly conclusive.
If the order here is not guaranteed, I would welcome any suggestions on how to parallelise batch preparation while maintaining a guaranteed order, with the proviso that it must be able to parallelise arbitrary Python code. I believe, for example, that the tf.data.Dataset API does not allow this (tf.py_function calls back to the original Python process).
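Failing that, one fallback I am considering (a sketch, not Keras-specific; prepare_batch is a placeholder) is multiprocessing.Pool.imap, which runs arbitrary Python code in worker processes and yields results in input order:

import multiprocessing as mp

def prepare_batch(i):
    # placeholder for arbitrary Python batch-preparation code
    return i * 2

if __name__ == '__main__':
    with mp.Pool(4) as pool:
        # imap parallelises the calls but yields results in input order
        for batch in pool.imap(prepare_batch, range(10)):
            print(batch)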
Answer 1:
Yes, it's ordered.
Check it yourself with the following test.
First, let's create a dummy Sequence that returns just the batch index after waiting a random time (the random wait ensures that the batches will not finish in order):
import time, random, datetime
import numpy as np
import tensorflow as tf

class DataLoader(tf.keras.utils.Sequence):
    def __len__(self):
        return 10

    def __getitem__(self, i):
        time.sleep(random.randint(1, 2))
        # you could add a print here to see that execution is out of order
        return i
Now let's create a test function that creates an enqueuer for this Sequence and consumes it. The function takes the number of workers and prints the time taken as well as the results in the order they were received.
def test(workers):
    enq = tf.keras.utils.OrderedEnqueuer(DataLoader())
    enq.start(workers=workers)
    gen = enq.get()

    results = []
    start = datetime.datetime.now()
    for i in range(30):
        results.append(next(gen))
    enq.stop()

    print('test with', workers, 'workers took', datetime.datetime.now() - start)
    print("results:", results)
Results:
test(1)
test(8)
test with 1 workers took 0:00:45.093122
results: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
test with 8 workers took 0:00:09.127771
results: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Notice that:
- 8 workers is much faster than 1 worker, so batch generation is indeed being parallelised
- the results are in order in both cases
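In practice you rarely build the enqueuer yourself. In TF 2.x you can hand the Sequence straight to model.fit with workers and use_multiprocessing, which as far as I know drives the same ordered enqueuing machinery internally. A sketch with a made-up XYLoader and a toy model (note that these arguments were removed in Keras 3):

import numpy as np
import tensorflow as tf

# a minimal (x, y) Sequence just for illustration
class XYLoader(tf.keras.utils.Sequence):
    def __len__(self):
        return 10
    def __getitem__(self, i):
        x = np.full((4, 3), i, dtype='float32')  # 4 samples, 3 features
        y = np.full((4, 1), i, dtype='float32')
        return x, y

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(3,))])
model.compile(optimizer='adam', loss='mse')

# TF 2.x: prepare batches in 8 worker processes, buffering up to 10 of them
model.fit(XYLoader(), epochs=1,
          workers=8, use_multiprocessing=True, max_queue_size=10)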
Source: https://stackoverflow.com/questions/59213040/is-the-order-of-batches-guaranteed-in-keras-orderedenqueuer