Understanding the output of a Dense layer for higher-dimensional input


This is tricky, but it does fit with the Keras documentation on Dense layers:

Output shape

nD tensor with shape: (batch_size, ..., units). For instance, for a 2D input with shape (batch_size, input_dim), the output would have shape (batch_size, units)

Note that the documentation is not the clearest, but what it is saying with the ... is that the final dimension of the input shape will be replaced by the number of dense units. Basically, for each item along the final dimension, a connection is created to each of the requested nodes in the coming Dense layer.

In your case you have something which is 2 x 3 x 1. So there is "one thing" (the 2 x 3 thing) to be connected to each of the 5 Dense-layer nodes, hence 2 x 3 x 5. You can think of it like the channels of a CNN layer in this particular case: there is a distinct 2 x 3 sheet of outputs for each of the 5 output "nodes".
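A minimal sketch of that shape behaviour (assuming tf.keras; the batch size and zero-valued tensor are made up purely for illustration):

```python
import tensorflow as tf

x = tf.zeros((4, 2, 3, 1))              # batch of 4 samples, each 2 x 3 x 1
y = tf.keras.layers.Dense(5)(x)         # Dense only touches the last axis
print(y.shape)                          # (4, 2, 3, 5): one 2 x 3 sheet per unit
```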

In the purely 2-D case, (batch_size, input_dim), each item you iterate over along the final dimension is itself a scalar value, so you end up with an output of exactly the size of the number of Dense nodes requested: (batch_size, units).
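For contrast, a hedged sketch of the 2-D case (shapes here are again illustrative):

```python
import tensorflow as tf

x = tf.zeros((4, 7))                    # (batch_size, input_dim)
y = tf.keras.layers.Dense(5)(x)
print(y.shape)                          # (4, 5): (batch_size, units)
```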

But in a higher-dimensional case, each item you iterate over along the final dimension of the input is itself still a higher-dimensional thing, so the output is k distinct "clones" of those higher-dimensional things, where k is the requested Dense-layer size, and by "clone" we mean that the output for a single dense connection has the same shape as the items in the final dimension of the input.

The Dense-ness then means that each specific element of the output has a connection to each element of the corresponding set of inputs. But be careful here: Dense layers are defined by having one weight between each item of the output and each item along the final dimension of the input. So even though you have 5 "2 x 3 things" in your output, each of them has just one solitary weight governing how it is connected to the 2 x 3 thing that is the input. Keras also defaults to using a bias vector (not a bias tensor), so if the Dense layer has dimension k and the final dimension of the previous layer is n, you should expect (n + 1) * k trainable parameters. These are always applied with numpy-like broadcasting, so that the lower-dimensional weight and bias vectors are conformable with the actual shapes of the input tensors.
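You can check that parameter count with a small sketch (assuming tf.keras; with the shapes from the question, n = 1 and k = 5, so we expect (1 + 1) * 5 = 10 trainable parameters):

```python
import tensorflow as tf

layer = tf.keras.layers.Dense(5)
layer.build(input_shape=(None, 2, 3, 1))     # kernel: (1, 5), bias: (5,)
print(layer.kernel.shape, layer.bias.shape)  # (1, 5) (5,)
print(layer.count_params())                  # 10 == (n + 1) * k with n = 1, k = 5
```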

It is customary to use Flatten, as in your first example, if you want to enforce the exact size of the coming Dense layer. You would use a multidimensional Dense layer when you want distinct "(n - 1)-D" groups of connections to each Dense node. This is probably rare for higher-dimensional inputs, because you would typically want a CNN-type operation instead, but I could imagine cases where a model predicts pixel-wise values, or where you are generating a full nD output (like the decoder portion of an encoder-decoder network), in which you might want a dense array of cells matching the dimensions of some expected structured output such as an image or video.
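A rough comparison of the two patterns (the input shape 2 x 3 x 1 is taken from the question; everything else is illustrative):

```python
import tensorflow as tf

inp = tf.keras.Input(shape=(2, 3, 1))

# Flatten first: all 6 input values feed a single vector of 5 outputs
flat_out = tf.keras.layers.Dense(5)(tf.keras.layers.Flatten()(inp))
print(flat_out.shape)    # (None, 5)

# Dense applied directly: leading dimensions kept, one 2 x 3 sheet per unit
dense_out = tf.keras.layers.Dense(5)(inp)
print(dense_out.shape)   # (None, 2, 3, 5)
```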
