What is the difference between an Embedding Layer and a Dense Layer?

Asked by 广开言路 on 2020-12-05 07:09

The docs for an Embedding Layer in Keras say:

Turns positive integers (indexes) into dense vectors of fixed size. eg. [[4], [20]] -> [[0.25, 0.1], [0.6, -0.2]]

2 Answers
  • 2020-12-05 07:30

    Mathematically, the difference is this:

    • An embedding layer performs a select (lookup) operation. In Keras, this layer is equivalent to:

      K.gather(self.embeddings, inputs)      # just one matrix
      
    • A dense layer performs a dot-product operation, plus an optional activation:

      outputs = matmul(inputs, self.kernel)  # a kernel matrix
      outputs = bias_add(outputs, self.bias) # a bias vector
      return self.activation(outputs)        # an activation function
      

    You can emulate an embedding layer with a fully-connected layer via one-hot encoding, but the whole point of a dense embedding is to avoid the one-hot representation. In NLP, the vocabulary size can be on the order of 100k (sometimes even a million). On top of that, sequences of words usually need to be processed in batches. Processing a batch of sequences of word indices is much more efficient than processing a batch of sequences of one-hot vectors. In addition, the gather operation itself is faster than a matrix dot-product, in both the forward and backward pass.
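
    A rough back-of-envelope sketch of the memory argument, using hypothetical (illustrative, not from the answer) batch and vocabulary sizes:

    ```python
    # Hypothetical sizes for illustration: a 100k-word vocabulary and a
    # batch of 32 sequences, each 100 tokens long.
    vocab_size = 100_000
    batch, seq_len = 32, 100

    # Index representation: one int32 (4 bytes) per token.
    idx_bytes = batch * seq_len * 4

    # One-hot representation: one float32 (4 bytes) per vocabulary entry per token.
    one_hot_bytes = batch * seq_len * vocab_size * 4

    print(idx_bytes)      # 12800 bytes (~12.5 KB)
    print(one_hot_bytes)  # 1280000000 bytes (~1.28 GB)
    ```

    The index batch fits in a few kilobytes, while the equivalent one-hot batch would not even fit in many GPUs' memory, before any arithmetic is done.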

  • 2020-12-05 07:37

    An embedding layer is faster, because it is essentially the equivalent of a dense layer that makes simplifying assumptions.

    Imagine a word-to-embedding layer with these weights:

    w = [[0.1, 0.2, 0.3, 0.4],
         [0.5, 0.6, 0.7, 0.8],
         [0.9, 0.0, 0.1, 0.2]]
    

    A Dense layer will treat these as actual weights with which to perform matrix multiplication. An embedding layer will simply treat these weights as a list of vectors, each vector representing one word; the 0th word in the vocabulary is w[0], the 1st is w[1], etc.


    As an example, take the weights above and this sentence:

    [0, 2, 1, 2]
    

    A naive Dense-based net needs to convert that sentence to a one-hot encoding

    [[1, 0, 0],
     [0, 0, 1],
     [0, 1, 0],
     [0, 0, 1]]
    

    then do a matrix multiplication

    [[1 * 0.1 + 0 * 0.5 + 0 * 0.9, 1 * 0.2 + 0 * 0.6 + 0 * 0.0, 1 * 0.3 + 0 * 0.7 + 0 * 0.1, 1 * 0.4 + 0 * 0.8 + 0 * 0.2],
     [0 * 0.1 + 0 * 0.5 + 1 * 0.9, 0 * 0.2 + 0 * 0.6 + 1 * 0.0, 0 * 0.3 + 0 * 0.7 + 1 * 0.1, 0 * 0.4 + 0 * 0.8 + 1 * 0.2],
     [0 * 0.1 + 1 * 0.5 + 0 * 0.9, 0 * 0.2 + 1 * 0.6 + 0 * 0.0, 0 * 0.3 + 1 * 0.7 + 0 * 0.1, 0 * 0.4 + 1 * 0.8 + 0 * 0.2],
     [0 * 0.1 + 0 * 0.5 + 1 * 0.9, 0 * 0.2 + 0 * 0.6 + 1 * 0.0, 0 * 0.3 + 0 * 0.7 + 1 * 0.1, 0 * 0.4 + 0 * 0.8 + 1 * 0.2]]
    

    =

    [[0.1, 0.2, 0.3, 0.4],
     [0.9, 0.0, 0.1, 0.2],
     [0.5, 0.6, 0.7, 0.8],
     [0.9, 0.0, 0.1, 0.2]]
    

    However, an Embedding layer simply looks at [0, 2, 1, 2] and takes the weights of the layer at indices zero, two, one, and two to immediately get

    [w[0],
     w[2],
     w[1],
     w[2]]
    

    =

    [[0.1, 0.2, 0.3, 0.4],
     [0.9, 0.0, 0.1, 0.2],
     [0.5, 0.6, 0.7, 0.8],
     [0.9, 0.0, 0.1, 0.2]]
    

    So it's the same result, just obtained in a hopefully faster way.
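
    The two paths can be checked in a few lines of NumPy, using the same weights and sentence as above:

    ```python
    import numpy as np

    w = np.array([[0.1, 0.2, 0.3, 0.4],
                  [0.5, 0.6, 0.7, 0.8],
                  [0.9, 0.0, 0.1, 0.2]])
    sentence = np.array([0, 2, 1, 2])

    # Dense-style path: one-hot encode the sentence, then matrix-multiply.
    one_hot = np.eye(3)[sentence]      # shape (4, 3)
    dense_out = one_hot @ w            # shape (4, 4)

    # Embedding-style path: plain row lookup (gather).
    embed_out = w[sentence]            # shape (4, 4)

    print(np.allclose(dense_out, embed_out))  # True
    ```

    Both produce the same (4, 4) matrix of word vectors; the lookup just skips the multiplications by zero.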


    The Embedding layer does have limitations:

    • The input needs to be integers in [0, vocab_length).
    • No bias.
    • No activation.

    However, none of those limitations should matter if you just want to convert an integer-encoded word into an embedding.
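
    Those three limitations can be made concrete with a minimal sketch in NumPy. The class name `SimpleEmbedding` and its interface are hypothetical, not Keras API:

    ```python
    import numpy as np

    class SimpleEmbedding:
        """Hypothetical minimal embedding layer: a lookup table, nothing more."""

        def __init__(self, vocab_size, dim, seed=0):
            rng = np.random.default_rng(seed)
            self.weights = rng.normal(size=(vocab_size, dim))

        def __call__(self, indices):
            indices = np.asarray(indices)
            # Inputs must be integers; fancy indexing also enforces the
            # [0, vocab_size) range by raising IndexError on bad indices.
            if not np.issubdtype(indices.dtype, np.integer):
                raise TypeError("embedding inputs must be integer indices")
            # No bias, no activation: just return the selected rows.
            return self.weights[indices]
    ```

    Calling `SimpleEmbedding(3, 4)` on `[0, 2, 1, 2]` returns a (4, 4) array of the selected rows, and passing an index of 3 or higher raises `IndexError`, mirroring the constraints listed above.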
