What is the difference between an Embedding Layer and a Dense Layer?

广开言路 2020-12-05 07:09

The docs for an Embedding Layer in Keras say:

Turns positive integers (indexes) into dense vectors of fixed size. e.g. [[4], [20]] -> [[0.25, 0.1], [0.6, -0.2]]

2 Answers
  •  自闭症患者
    2020-12-05 07:37

    An Embedding layer is faster because it is essentially equivalent to a Dense layer that makes simplifying assumptions.

    Imagine a word-to-embedding layer with these weights:

    w = [[0.1, 0.2, 0.3, 0.4],
         [0.5, 0.6, 0.7, 0.8],
         [0.9, 0.0, 0.1, 0.2]]
    

    A Dense layer will treat these as actual weights with which to perform matrix multiplication. An Embedding layer will simply treat these weights as a lookup table of vectors, one per word: the vector for the 0th word in the vocabulary is w[0], for the 1st word it is w[1], and so on.


    As an example, take the weights above and this integer-encoded sentence:

    [0, 2, 1, 2]
    

    A naive Dense-based network first needs to convert that sentence to a one-hot encoding

    [[1, 0, 0],
     [0, 0, 1],
     [0, 1, 0],
     [0, 0, 1]]
    

    then do a matrix multiplication

    [[1 * 0.1 + 0 * 0.5 + 0 * 0.9, 1 * 0.2 + 0 * 0.6 + 0 * 0.0, 1 * 0.3 + 0 * 0.7 + 0 * 0.1, 1 * 0.4 + 0 * 0.8 + 0 * 0.2],
     [0 * 0.1 + 0 * 0.5 + 1 * 0.9, 0 * 0.2 + 0 * 0.6 + 1 * 0.0, 0 * 0.3 + 0 * 0.7 + 1 * 0.1, 0 * 0.4 + 0 * 0.8 + 1 * 0.2],
     [0 * 0.1 + 1 * 0.5 + 0 * 0.9, 0 * 0.2 + 1 * 0.6 + 0 * 0.0, 0 * 0.3 + 1 * 0.7 + 0 * 0.1, 0 * 0.4 + 1 * 0.8 + 0 * 0.2],
     [0 * 0.1 + 0 * 0.5 + 1 * 0.9, 0 * 0.2 + 0 * 0.6 + 1 * 0.0, 0 * 0.3 + 0 * 0.7 + 1 * 0.1, 0 * 0.4 + 0 * 0.8 + 1 * 0.2]]
    

    =

    [[0.1, 0.2, 0.3, 0.4],
     [0.9, 0.0, 0.1, 0.2],
     [0.5, 0.6, 0.7, 0.8],
     [0.9, 0.0, 0.1, 0.2]]
    

    However, an Embedding layer simply looks at [0, 2, 1, 2] and picks out the rows of the weight matrix at indices zero, two, one, and two to immediately get

    [w[0],
     w[2],
     w[1],
     w[2]]
    

    =

    [[0.1, 0.2, 0.3, 0.4],
     [0.9, 0.0, 0.1, 0.2],
     [0.5, 0.6, 0.7, 0.8],
     [0.9, 0.0, 0.1, 0.2]]
    

    So it's the same result, just obtained in a hopefully faster way.
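
    To make the equivalence concrete, here is a minimal NumPy sketch (not part of the original answer) that reproduces the arithmetic above; the names w, sentence, one_hot and so on are purely illustrative:

    import numpy as np

    # Values from the example above.
    w = np.array([[0.1, 0.2, 0.3, 0.4],
                  [0.5, 0.6, 0.7, 0.8],
                  [0.9, 0.0, 0.1, 0.2]])
    sentence = np.array([0, 2, 1, 2])

    # Dense-style path: one-hot encode, then matrix-multiply.
    one_hot = np.eye(3)[sentence]        # shape (4, 3)
    dense_out = one_hot @ w              # shape (4, 4)

    # Embedding-style path: plain row lookup, no multiplication.
    embed_out = w[sentence]              # shape (4, 4)

    print(np.allclose(dense_out, embed_out))  # True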


    The Embedding layer does have limitations:

    • The input needs to be integers in [0, vocab_length).
    • No bias.
    • No activation.

    However, none of those limitations should matter if you just want to convert an integer-encoded word into an embedding.
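
    For completeness, here is a small tf.keras sketch of that use case, assuming the standard Keras API; the sizes (vocabulary of 3 words, embedding dimension 4) simply mirror the toy example above:

    import numpy as np
    import tensorflow as tf

    # Vocabulary of 3 words, each mapped to a 4-dimensional vector.
    emb = tf.keras.layers.Embedding(input_dim=3, output_dim=4)

    # Input must be integer indices in [0, vocab_length); no one-hot encoding needed.
    sentence = np.array([[0, 2, 1, 2]])   # batch containing one sentence
    vectors = emb(sentence)               # shape (1, 4, 4)
    print(vectors.shape)

    The equivalent Dense route would need one-hot inputs plus a Dense(4, use_bias=False, activation=None) layer to produce the same mapping.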
