Training a CNN with pre-trained word embeddings is very slow (TensorFlow)

梦想与她 提交于 2019-12-13 01:51:05

问题


I'm using TensorFlow (0.6) to train a CNN on text data. I'm using a method similar to the second option specified in this SO thread (with the exception that the embeddings are trainable). My dataset is pretty small and the vocabulary is around 12,000 words. When I train using random word embeddings everything works nicely. However, when I switch to the pre-trained embeddings from the word2vec site, the vocabulary grows to over 3,000,000 words and training iterations become over 100 times slower. I'm also seeing this warning:

UserWarning: Converting sparse IndexedSlices to a dense Tensor with 900482700 elements

I saw the discussion on this TensorFlow issue, but I'm still not sure if the slowdown I'm experiencing is expected or if it's a bug. I'm using the Adam optimizer but it's pretty much the same thing with Adagrad.

One workaround I guess I could try is to train using a minimal embedding matrix with only the ~12,000 words in my dataset, serialize the resulting embeddings and at runtime merge them with the remaining words from the pre-trained embeddings. I think this should work but it sounds hacky.

Is that currently the best solution or am I missing something?


回答1:


So there were two issues here:

  1. As mrry pointed out in his comment to the question, the warning was not a result of a conversion during the updates. Rather, I was calculating summary statistics (sparsity and histogram) on the embeddings gradient and that caused the conversion.
  2. Interestingly, removing the summaries made the message go away, but the code remained slow. Per the TensorFlow issue referenced in the question, I had to also replace the AdamOptimizer with the AdagradOptimizer and once I did that the runtime was back on par with the one obtained from a small vocabulary.


来源:https://stackoverflow.com/questions/35828037/training-a-cnn-with-pre-trained-word-embeddings-is-very-slow-tensorflow

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!