Question
Checkpoint snippet:
checkpointer = ModelCheckpoint(
    filepath=os.path.join(savedir, "mid/weights.{epoch:02d}.hd5"),
    monitor='val_loss', verbose=1,
    save_best_only=False, save_weights_only=False)
hist = model.fit_generator(
    gen.generate(batch_size=batch_size, nb_classes=nb_classes),
    samples_per_epoch=593920, nb_epoch=nb_epoch, verbose=1,
    callbacks=[checkpointer],
    validation_data=gen.vld_generate(VLD_PATH, batch_size=64, nb_classes=nb_classes),
    nb_val_samples=10000)
I trained my model on a multi-GPU host, which writes the intermediate ("mid") checkpoint files in HDF5 format. When I loaded one of them on a single-GPU machine with model.load_weights(), the following error was raised:
Using TensorFlow backend.
Traceback (most recent call last):
  File "server.py", line 171, in <module>
    model = load_model_and_weights('zhch.yml', '7_weights.52.hd5')
  File "server.py", line 16, in load_model_and_weights
    model.load_weights(os.path.join('model', weights_name))
  File "/home/lz/code/ProjectGo/meta/project/libpolicy-server/.virtualenv/lib/python3.5/site-packages/keras/engine/topology.py", line 2701, in load_weights
    self.load_weights_from_hdf5_group(f)
  File "/home/lz/code/ProjectGo/meta/project/libpolicy-server/.virtualenv/lib/python3.5/site-packages/keras/engine/topology.py", line 2753, in load_weights_from_hdf5_group
    str(len(flattened_layers)) + ' layers.')
ValueError: You are trying to load a weight file containing 1 layers into a model with 21 layers.
Is there any way to load checkpoint weights generated on multiple GPUs on a single-GPU machine? I could not find a Keras issue that discusses this problem, so any help would be appreciated.
Answer 1:
You can load your model on a single GPU like this:
from keras.models import load_model
# 'mid' stands for the path to a checkpoint saved on the multi-GPU host
multi_gpus_model = load_model('mid')
# The original model is nested inside the multi-GPU model as one of its layers;
# use multi_gpus_model.summary() to confirm which index it has
origin_model = multi_gpus_model.layers[-2]
origin_model.save_weights('single_gpu_model.hdf5')
'single_gpu_model.hdf5' is the file that you can then load into the model on the single-GPU machine.
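The index -2 depends on how the multi-GPU wrapper was built, so it may be safer to look the nested model up by its shape rather than by position. The following is only a sketch: find_wrapped_model is a hypothetical helper, not part of Keras, and it assumes that the nested model is the only top-level layer exposing its own `layers` attribute. The stand-in objects exist so the lookup can be tried without Keras or a GPU.

```python
from types import SimpleNamespace


def find_wrapped_model(multi_gpu_model):
    """Return the first top-level layer that has its own `layers` list.

    Assumption: only the nested (original) model exposes `layers`;
    ordinary layers such as Lambda or Concatenate do not.
    """
    for layer in multi_gpu_model.layers:
        if hasattr(layer, "layers"):
            return layer
    raise ValueError("no nested model found among the top-level layers")


# Hypothetical stand-ins that mimic the layout of a multi-GPU wrapper.
inner = SimpleNamespace(name="model_1", layers=[SimpleNamespace(name="conv1")])
wrapper = SimpleNamespace(layers=[
    SimpleNamespace(name="input_1"),
    SimpleNamespace(name="lambda_1"),
    inner,
    SimpleNamespace(name="concatenate_1"),
])

found = find_wrapped_model(wrapper)
print(found.name)  # prints "model_1"
```

With a real checkpoint, the object returned by load_model would be passed in place of wrapper, and save_weights would then be called on the result.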
Answer 2:
Try this function:
def keras_model_reassign_weights(model_cpu, model_gpu):
    weights_temp = {}
    print('_'*5, 'Collecting weights from GPU model', '_'*5)
    for layer in model_gpu.layers:
        try:
            # only the nested (original) model has a .layers attribute
            for layer_unw in layer.layers:
                #print('Weights extracted for: ', layer_unw.name)
                weights_temp[layer_unw.name] = layer_unw.get_weights()
            break  # stop after the first nested model
        except AttributeError:
            print('Skipped: ', layer.name)
    print('_'*5, 'Writing weights to CPU model', '_'*5)
    for layer in model_cpu.layers:
        try:
            # layers are matched by name, so both models must use the same layer names
            layer.set_weights(weights_temp[layer.name])
            #print(layer.name, 'Done!')
        except (KeyError, ValueError):
            print(layer.name, 'weights were not set for this layer!')
    return model_cpu
Before calling it, you need to load the weights into your multi-GPU model:
# load or initialize your Keras multi-GPU model (placeholder: replace None)
model_gpu = None
# load or initialize a Keras model with the same structure, built without the multi-GPU wrapper (placeholder: replace None)
model_cpu = None
# load the checkpoint weights into the multi-GPU model
model_gpu.load_weights(r'gpu_model_best_checkpoint.hdf5')
# copy the weights over
model_cpu = keras_model_reassign_weights(model_cpu, model_gpu)
# save the resulting weights for the CPU model
model_cpu.save_weights(r'CPU_model.hdf5')
After the transfer, you can use the saved weights with a single-GPU or CPU model.
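Because the function above matches layers by name and only prints a message when a layer is skipped, it can help to list the mismatches before transferring anything. The sketch below is an illustration, not part of the original answer: unmatched_layer_names is a hypothetical helper, and the stand-in objects replace real Keras models so that it runs without Keras. It relies only on the `layers` and `name` attributes, which Keras models and layers also expose.

```python
from types import SimpleNamespace


def unmatched_layer_names(model_cpu, model_gpu):
    """List layers of model_cpu that have no same-named layer nested in model_gpu."""
    nested_names = set()
    for layer in model_gpu.layers:
        # Top-level layers without a `layers` attribute contribute nothing.
        for sub_layer in getattr(layer, "layers", []):
            nested_names.add(sub_layer.name)
    return [layer.name for layer in model_cpu.layers
            if layer.name not in nested_names]


# Hypothetical stand-ins: the CPU model was rebuilt with a different
# name for its last layer, so that layer would be skipped.
nested = SimpleNamespace(name="model_1", layers=[
    SimpleNamespace(name="conv_a"),
    SimpleNamespace(name="dense_out"),
])
model_gpu = SimpleNamespace(layers=[SimpleNamespace(name="input_1"), nested])
model_cpu = SimpleNamespace(layers=[
    SimpleNamespace(name="conv_a"),
    SimpleNamespace(name="dense_1"),
])

missing = unmatched_layer_names(model_cpu, model_gpu)
print(missing)  # prints ['dense_1']
```

An empty list means every layer of the CPU model has a counterpart. Names reported for layers that hold no weights, such as input or activation layers, are harmless.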
Source: https://stackoverflow.com/questions/41342098/keras-load-checkpoint-weights-hdf5-generated-by-multiple-gpus