My machine has the following spec:
CPU: Xeon E5-1620 v4
GPU: Titan X (Pascal)
Ubuntu 16.04
Nvidia driver 375.26
CUDA toolkit 8.0
If you use Keras, use CuDNNLSTM in place of LSTM, or CuDNNGRU in place of GRU. In my case (2 Tesla M60), I am seeing a 10x performance boost. By the way, I am using batch size 128, as suggested by @Alexey Golyshev.
It's just a tip.
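A minimal sketch of the swap, assuming Keras 2.x with a TensorFlow GPU backend (the model shape here is just an illustrative IMDB-style classifier, not the exact example script):

```python
from keras.models import Sequential
from keras.layers import Embedding, Dense, CuDNNLSTM  # drop-in for LSTM on GPU

# Same architecture as a plain-LSTM model, but the recurrent layer
# runs on cuDNN's fused kernel instead of the generic implementation.
model = Sequential()
model.add(Embedding(20000, 128))
model.add(CuDNNLSTM(128))  # replaces LSTM(128); note it does not take recurrent_dropout
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
```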
Using a GPU pays off when
1. your neural network model is big, and
2. the batch size is big.
That's what I found from googling.
The batch size is too small. Try increasing it.
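For example, in the Keras example scripts this is a one-line change to the `fit` call (a sketch, assuming the model and the `x_train`/`y_train`/`x_test`/`y_test` arrays are already set up as in imdb_lstm.py):

```python
# Larger batches keep the GPU busier; 128 instead of the script's default of 32.
model.fit(x_train, y_train,
          batch_size=128,
          epochs=15,
          validation_data=(x_test, y_test))
```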
Results for my GTX1050Ti:

imdb_bidirectional_lstm.py

    batch_size     time (s)
    32 (default)   252
    64             131
    96             87
    128            66

imdb_lstm.py

    batch_size     time (s)
    32 (default)   108
    64             50
    96             34
    128            25
I ran into similar issues here:

CPU: Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz
Ubuntu 14.04
imdb_bidirectional_lstm.py: 155s

GPU: GTX 860m
Nvidia Driver: 369.30
CUDA Toolkit: v8.0
cuDNN: v6.0
imdb_bidirectional_lstm.py: 450s
When I observed the GPU load curve, I found one interesting thing:

[figure: GPU load over time]
This is mainly due to the sequential computation in the LSTM layer. Remember that an LSTM processes its input sequentially, computing the hidden state one timestep at a time; in other words, you must wait for the hidden state at time t-1 before you can compute the hidden state at time t.
That's not a good fit for GPU cores: a GPU is made up of many small cores that excel at parallel computation, and sequential computation can't fully utilize them. That's why we see the GPU load sitting around 10%-20% most of the time.
But during backpropagation the GPU can run the derivative computations in parallel, so we see the GPU load peak at around 80%.
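To make the sequential dependence concrete, here is a toy NumPy sketch of a recurrent forward pass (a plain RNN cell rather than a full LSTM, but the dependence pattern is the same; all names are illustrative):

```python
import numpy as np

def rnn_forward(x, W_x, W_h, b):
    """x: (T, input_dim); W_x: (input_dim, hidden); W_h: (hidden, hidden); b: (hidden,)."""
    T, _ = x.shape
    h = np.zeros(W_h.shape[0])
    hs = []
    for t in range(T):
        # Step t needs h from step t-1, so this loop cannot be parallelized;
        # only the matrix products *within* a single step run in parallel on the GPU.
        h = np.tanh(x[t] @ W_x + h @ W_h + b)
        hs.append(h)
    return np.stack(hs)
```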