I implemented a multi-layer LSTM, but its results differ from nn.LSTM when init_state is not None. I suspect I am doing something wrong in the forward function. Any help will be appreciated.
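Since the original forward isn't shown, the following is only a guess at the cause. With nn.LSTM, h_0 and c_0 each have shape (num_layers, batch, hidden_size), and layer k starts from h_0[k] and c_0[k]. A custom stack that reuses one slice for every layer, or passes layer k's final state into layer k+1, matches nn.LSTM only when the initial state is all zeros. Below is a minimal sketch built from nn.LSTMCell. The class name MultiLayerLSTM and the helper load_from are hypothetical. It assumes batch_first=False, no dropout, and unidirectional layers. The sketch copies nn.LSTM's weights so you can compare the two modules directly.

```python
import torch
import torch.nn as nn


class MultiLayerLSTM(nn.Module):
    """Hypothetical stacked LSTM: one nn.LSTMCell per layer (sketch, not the poster's code)."""

    def __init__(self, input_size, hidden_size, num_layers):
        super().__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.cells = nn.ModuleList(
            nn.LSTMCell(input_size if k == 0 else hidden_size, hidden_size)
            for k in range(num_layers)
        )

    def load_from(self, ref: nn.LSTM):
        # Hypothetical helper: copy nn.LSTM's per-layer parameters into the cells.
        # Both modules store gates in the same (i, f, g, o) order.
        with torch.no_grad():
            for k, cell in enumerate(self.cells):
                cell.weight_ih.copy_(getattr(ref, f"weight_ih_l{k}"))
                cell.weight_hh.copy_(getattr(ref, f"weight_hh_l{k}"))
                cell.bias_ih.copy_(getattr(ref, f"bias_ih_l{k}"))
                cell.bias_hh.copy_(getattr(ref, f"bias_hh_l{k}"))

    def forward(self, x, init_state=None):
        # x: (seq_len, batch, input_size), i.e. nn.LSTM with batch_first=False
        seq_len, batch, _ = x.shape
        if init_state is None:
            zeros = x.new_zeros(self.num_layers, batch, self.hidden_size)
            init_state = (zeros, zeros)
        h0, c0 = init_state  # each: (num_layers, batch, hidden_size)

        layer_input = x
        h_n, c_n = [], []
        for k, cell in enumerate(self.cells):
            # Layer k starts from its own slice of the initial state,
            # not from the previous layer's final state.
            h, c = h0[k], c0[k]
            outputs = []
            for t in range(seq_len):
                h, c = cell(layer_input[t], (h, c))
                outputs.append(h)
            # This layer's full hidden sequence is the next layer's input.
            layer_input = torch.stack(outputs)
            h_n.append(h)
            c_n.append(c)

        # Output holds only the last layer's hidden states, as in nn.LSTM.
        return layer_input, (torch.stack(h_n), torch.stack(c_n))
```

Usage: build nn.LSTM(4, 8, num_layers=3) and MultiLayerLSTM(4, 8, 3), then call load_from to copy the weights. Pass the same random (h0, c0) to both modules and compare them with torch.allclose. If your version matches with zero initial states but not with random ones, check how it indexes init_state.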