import torch
import torch.autograd as autograd
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
The answer by cdo256 is almost correct, but he is mistaken about what hidden_size means. He explains it as:
hidden_size - the number of LSTM blocks per layer.
but a better explanation is:
Each sigmoid, tanh, or hidden-state layer in the cell is actually a set of nodes whose number equals the hidden size. Each of the "nodes" in an LSTM cell diagram is therefore a cluster of ordinary neural-network nodes, like a layer of a densely connected network. So if you set hidden_size = 10, each of your LSTM blocks, or cells, will contain neural-network layers with 10 nodes each. The total number of LSTM blocks in your LSTM model then equals your sequence length, since the same cell is unrolled once per time step.
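For example, here is a minimal sketch (the dimensions are illustrative, not taken from the original question) showing that hidden_size sets the width of the gate layers, which you can verify from the shapes of the weights and outputs:

import torch
import torch.nn as nn

# Illustrative sizes only.
seq_len, batch, input_size, hidden_size = 5, 3, 7, 10

lstm = nn.LSTM(input_size=input_size, hidden_size=hidden_size)

# The four gate layers (input, forget, cell, output) each have
# hidden_size nodes, so the stacked input-to-hidden weight matrix
# has shape (4 * hidden_size, input_size).
print(lstm.weight_ih_l0.shape)  # torch.Size([40, 7])

x = torch.randn(seq_len, batch, input_size)
output, (h_n, c_n) = lstm(x)
print(output.shape)  # torch.Size([5, 3, 10]) -- one hidden state per time step
print(h_n.shape)     # torch.Size([1, 3, 10])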
You can see this by comparing the example code in the docs for nn.LSTM and nn.LSTMCell:
https://pytorch.org/docs/stable/nn.html#torch.nn.LSTM
and
https://pytorch.org/docs/stable/nn.html#torch.nn.LSTMCell
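To make the unrolling explicit, here is a rough equivalent written with nn.LSTMCell (again with made-up dimensions): the same cell is applied once per time step, which is the sense in which the number of LSTM blocks equals the sequence length.

import torch
import torch.nn as nn

seq_len, batch, input_size, hidden_size = 5, 3, 7, 10

cell = nn.LSTMCell(input_size=input_size, hidden_size=hidden_size)
x = torch.randn(seq_len, batch, input_size)

h = torch.zeros(batch, hidden_size)
c = torch.zeros(batch, hidden_size)
for t in range(seq_len):       # one cell application per time step
    h, c = cell(x[t], (h, c))  # reuses the same weights at every step

print(h.shape)  # torch.Size([3, 10]) -- final hidden state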