When looking at the architecture of YOLO as presented in the original paper by J. Redmon, I do not understand the depth of each filter indicated in their figure. Fair enoug