NLP Recurrent Neural Network always gives constant values

Submitted by 不想你离开。 on 2019-12-12 04:22:28

Question


I've written a simple recurrent network in TensorFlow based on this video that I watched: https://youtu.be/vq2nnJ4g6N0?t=8546

In the video the RNN is demonstrated to produce Shakespeare plays by having the network produce words one character at a time. The output of the network is fed back into the input on the next iteration.

Here's a diagram of my network:

+--------------------------------+
|                                |
|    In:  H E L L O   W O R L <--+-----+
|         | | | | | | | | | |    |     |
|         V V V V V V V V V V    |     | Recursive feed
|         +-----------------+    |     |
+-> Hin ->|  RNN + Softmax  |-> Hout   |
          +-----------------+          |
          | | | | | | | | | |          |
     Out: V V V V V V V V V V          |
          E L L O   W O R L D ---------+
                            ^ Character predicted by the network
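
For concreteness, here is a minimal sketch of that recursive sampling loop. The session and model names are hypothetical stand-ins, not code from the repo linked below: model is assumed to expose an input tensor x, state tensors h_in/h_out, a softmax output probs, and a state_size attribute.

    import numpy as np

    def generate(session, model, seed_char, num_chars, vocab_size=128):
        # `model` is a hypothetical object exposing: x (input char ids),
        # h_in/h_out (recurrent state), probs (softmax output), state_size.
        h = np.zeros([1, model.state_size])           # Hin: zero initial state
        c = ord(seed_char)
        chars = [seed_char]
        for _ in range(num_chars):
            probs, h = session.run([model.probs, model.h_out],
                                   feed_dict={model.x: [[c]], model.h_in: h})
            c = np.random.choice(vocab_size, p=probs[0])  # predicted character...
            chars.append(chr(c))                          # ...fed back next step
        return "".join(chars)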

I expect the network to at least learn the copying part correctly. Unfortunately, my network always outputs 32 (the ASCII code for the space character) at every position. I'm not sure what is causing the issue...

Please help me get my network producing poetry!

My code is here: https://github.com/calebh/namepoet/blob/03f112ced94c3319055fbcc74a2acdb4a9b0d41c/main.py

The corpus can be replaced by a few paragraphs of Lorem Ipsum to speed up training (the network has the same bad behavior).


Answer 1:


Sounds like your activations might be saturating (i.e., the activation function is operating way out at the flat end of its range, where the gradient is very small, so the weights get stuck). You might want to try initializing your network's parameters with a different method.
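
For instance, with the TF1-style API used in the video, the softmax layer's variables could be created with an explicit small-variance initializer. This is a sketch with illustrative shapes, not the code from the linked main.py:

    import tensorflow as tf

    internal_size, vocab_size = 512, 128  # illustrative sizes

    # A small truncated normal keeps the softmax pre-activations away from
    # the saturated ends of the activation; tf.contrib.layers.xavier_initializer()
    # is another common choice.
    W = tf.get_variable("softmax_W", shape=[internal_size, vocab_size],
                        initializer=tf.truncated_normal_initializer(stddev=0.1))
    b = tf.get_variable("softmax_b", shape=[vocab_size],
                        initializer=tf.zeros_initializer())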

Also, is there a particular reason you're using a GRU? In my experience, LSTM units are more reliable, if a bit less efficient.
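
Swapping the cell is nearly a one-line change. A sketch; depending on your TensorFlow version these classes live under tf.nn.rnn_cell or tf.contrib.rnn, and the size is illustrative:

    import tensorflow as tf

    internal_size = 512  # illustrative hidden size

    cell = tf.nn.rnn_cell.GRUCell(internal_size)   # as in the video
    cell = tf.nn.rnn_cell.LSTMCell(internal_size)  # the suggested swap

Note that an LSTMCell's state is a (c, h) tuple rather than the single tensor a GRU uses, so the Hin/Hout plumbing has to change accordingly (or pass state_is_tuple=False on older versions).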




Answer 2:


I'd try running the code for longer. With batch_size = 10, sequence_size = 30, and 20 iterations, your network has seen only 6,000 characters in total; with a learning rate of 0.001, that may simply not be enough to move away from your initialization.

Hence, I'd try raising the learning rate to a very high value (e.g. 1 or 100) and seeing whether the network starts outputting different letters, which would at least confirm that your implementation is somewhat correct. A network trained with such a high learning rate is usually not going to be accurate at all.
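
One way to run that sanity check, assuming a hypothetical train_and_sample wrapper around the training loop in the linked main.py:

    # Hypothetical sanity check: `train_and_sample` stands in for the
    # training loop in main.py and returns text sampled from the trained net.
    for lr in [0.001, 0.01, 0.1, 1.0, 100.0]:
        sample = train_and_sample(learning_rate=lr, iterations=20)
        print("lr=%g -> %r" % (lr, sample))
    # Constant output at every rate points to a bug in the graph or the
    # feed/fetch plumbing rather than merely slow optimization.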



Source: https://stackoverflow.com/questions/41557405/nlp-recurrent-neural-network-always-gives-constant-values
