【Deep Learning】Pointer Networks

Submitted by 安稳与你 on 2020-02-01 03:01:51

Pointer Networks

TLDR; The authors propose a new architecture called the "Pointer Network". A Pointer Network is a seq2seq architecture with an attention mechanism in which the output vocabulary is the set of input indices. Since the output vocabulary varies with the input sequence length, a Pointer Network can generalize to variable-length inputs. The attention mechanism through which this is achieved is O(n^2) and is only a slight variation of the standard seq2seq attention mechanism. The authors evaluate the architecture on tasks where the outputs correspond to positions in the input: Convex Hull, Delaunay Triangulation, and the Traveling Salesman Problem. The architecture performs well on these tasks and generalizes to sequences longer than those found in the training data.
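The attention variation described above can be sketched as follows. This is a minimal NumPy sketch, not the paper's implementation: the parameter names (W1, W2, v) follow the paper's scoring function u_i = v^T tanh(W1 e_i + W2 d), but the shapes and random inputs here are illustrative assumptions.

```python
import numpy as np

def pointer_attention(enc_states, dec_state, W1, W2, v):
    """Pointer-network attention: score each of the n input positions
    against the current decoder state, then softmax the scores into a
    distribution over input indices (rather than blending encoder states).
    """
    # enc_states: (n, d) encoder hidden states e_1..e_n
    # dec_state:  (d,)   current decoder hidden state
    scores = np.tanh(enc_states @ W1 + dec_state @ W2) @ v  # (n,)
    exp = np.exp(scores - scores.max())                     # stable softmax
    return exp / exp.sum()                                  # distribution over inputs

# Toy dimensions (assumptions, not the paper's sizes)
rng = np.random.default_rng(0)
n, d, h = 5, 8, 16
enc = rng.standard_normal((n, d))
dec = rng.standard_normal(d)
W1 = rng.standard_normal((d, h))
W2 = rng.standard_normal((d, h))
v = rng.standard_normal(h)

p = pointer_attention(enc, dec, W1, W2, v)
print(p.shape)  # one probability per input position, summing to ~1
```

Computing one score per input position at each of n decoding steps is what makes the mechanism O(n^2) overall.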

Key Points

  • Similar to standard attention, but don't blend the encoder states; use the attention vector directly as the output distribution.
  • Softmax probabilities of outputs can be interpreted as a fuzzy pointer.
  • We could solve the same problem with a plain seq2seq model that outputs coordinates directly, but that ignores the constraint that outputs must be elements of the input, and would be less efficient.
  • 512 unit LSTM, SGD with LR 1.0, batch size of 128, L2 gradient clipping of 2.0.
  • In the case of TSP, the "student" network outperforms the "teacher" algorithm.
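The "fuzzy pointer" bullet above can be made concrete: the softmax over input positions is soft during training, and at inference a hard pointer is recovered, e.g. by taking the argmax at each decoding step. The distributions below are made-up toy values for illustration.

```python
import numpy as np

def greedy_decode(probs_per_step):
    """Turn per-step pointer distributions into hard input indices.
    probs_per_step: (T, n) array, one softmax over the n inputs per
    decoding step; argmax converts each fuzzy pointer into a hard one.
    """
    return probs_per_step.argmax(axis=1)

# Toy example: 3 decoding steps over a 4-element input sequence.
probs = np.array([
    [0.10, 0.70, 0.10, 0.10],  # step 1 points mostly at input 1
    [0.60, 0.10, 0.20, 0.10],  # step 2 points mostly at input 0
    [0.05, 0.05, 0.10, 0.80],  # step 3 points mostly at input 3
])
print(greedy_decode(probs))  # -> [1 0 3]
```

For Convex Hull or TSP the resulting index sequence is read directly as an ordering of the input points, which is why no separate output vocabulary is needed.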

Notes/ Questions

  • Seems like this architecture could be applied to generating spans (as in the newer "Text Processing From Bytes" paper), e.g. for POS tagging. That would require outputting classes in addition to input pointers. How?