For a network architecture like this:
          +-----+
input1--->| CNN |-----+
          +-----+     |
                      |
          +-----+     |   +--------+       +-----+
input2--->| CNN |-----+-->| Concat |---+-->| VGG |---> Main_out
          +-----+     |   +--------+   |   +-----+
                      |                |
          +-----+     |                +--> Aux_out
input3--->| CNN |-----+
          +-----+
How does the backpropagation flow go? Are there two backpropagation steps, or does only the one coming from Main_out update the weights?
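In Keras functional-API terms, the wiring would look roughly like this (the branch CNNs, the "VGG" block, and all layer sizes below are just placeholders; only the topology follows the diagram):

# Minimal sketch of the wiring above. The small conv branches, the "VGG"
# stand-in, and all layer sizes are placeholders; only the topology
# (three branches -> Concat -> Aux_out, Concat -> VGG -> Main_out) matters.
from tensorflow import keras
from tensorflow.keras import layers

def cnn_branch(inp):
    # Placeholder branch CNN: one conv layer plus global pooling.
    x = layers.Conv2D(32, 3, activation="relu")(inp)
    return layers.GlobalAveragePooling2D()(x)

input1 = keras.Input(shape=(64, 64, 3), name="input1")
input2 = keras.Input(shape=(64, 64, 3), name="input2")
input3 = keras.Input(shape=(64, 64, 3), name="input3")

merged = layers.Concatenate()(
    [cnn_branch(input1), cnn_branch(input2), cnn_branch(input3)])

# The auxiliary head branches off directly after the concat...
aux_out = layers.Dense(10, activation="softmax", name="aux_output")(merged)

# ...while the main head sits behind the "VGG" block (a Dense layer here).
x = layers.Dense(256, activation="relu")(merged)
main_out = layers.Dense(10, activation="softmax", name="main_output")(x)

model = keras.Model(inputs=[input1, input2, input3],
                    outputs=[main_out, aux_out])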
I am using loss weights for each output:

model.compile(loss="categorical_crossentropy", optimizer=OPT, metrics=["accuracy"],
              loss_weights={'main_output': 1., 'aux_output': 0.2})
The losses for the different outputs are combined into a single final loss according to loss_weights:

final_loss = loss_main + 0.2 * loss_aux

and the parameters are updated with respect to this combined loss by one backpropagation step at each iteration.
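Spelled out by hand with tf.GradientTape, this is roughly what happens per training step (model, the x/y placeholders, and the output names follow the hypothetical sketch in the question above):

# One combined scalar loss, one gradient computation, one update.
# x1, x2, x3, y_main, y_aux are placeholder batches; `model` is the
# two-output model from the question.
import tensorflow as tf

loss_fn = tf.keras.losses.CategoricalCrossentropy()
optimizer = tf.keras.optimizers.Adam()

with tf.GradientTape() as tape:
    main_pred, aux_pred = model([x1, x2, x3], training=True)
    loss_main = loss_fn(y_main, main_pred)
    loss_aux = loss_fn(y_aux, aux_pred)
    # loss_weights={'main_output': 1., 'aux_output': 0.2} boils down to:
    final_loss = loss_main + 0.2 * loss_aux

# A single backward pass w.r.t. the combined loss updates all parameters.
grads = tape.gradient(final_loss, model.trainable_variables)
optimizer.apply_gradients(zip(grads, model.trainable_variables))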
(I cannot post a comment as I don't have enough reputation, so I'm posting my question as an answer. Sorry for that, but I'm struggling to find information on this subject.)

As I asked the same question here, I also have trouble understanding how this works; like JulesR, I get better "main output" accuracy when adding "aux_out", using a different network architecture.

If I understand dontloo's response correctly (please correct me if I'm wrong), there is only one backpropagation step despite the multiple outputs, but the loss that is used is weighted according to the outputs. So for JulesR's network, the update of the VGG weights during backpropagation is also influenced by this weighted loss (and therefore by the "intermediate output")? If so, isn't that strange, given that the VGG network comes after this output?
Also, @JulesR mentioned that auxiliary outputs can help with the vanishing gradient problem. Do you have any links to articles discussing the effects of auxiliary outputs?
Source: https://stackoverflow.com/questions/57213471/how-does-keras-handle-backpropagation-in-multiple-outputs