I was going through this example of an LSTM language model on github (link). What it does in general is pretty clear to me. But I'm still struggling to understand what calling contiguous() does.
From the [pytorch documentation][1]:

contiguous() → Tensor
Returns a contiguous tensor containing the same data as self tensor. If self tensor is contiguous, this function returns the self tensor.

Where contiguous here means not only contiguous in memory, but also in the same order in memory as the index order: for example, doing a transposition doesn't change the data in memory, it simply changes the map from indices to memory pointers. If you then apply contiguous(), it will change the data in memory so that the map from indices to memory locations is the canonical one.
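For instance, here is a minimal sketch of that behaviour, using only stride() and data_ptr() to inspect the layout (the numbers in the comments assume this particular 2x3 example tensor):

import torch

x = torch.arange(6).view(2, 3)        # contiguous, stride (3, 1)
y = x.t()                             # same data, stride (1, 3): only the index-to-memory map changed
print(x.stride(), y.stride())         # (3, 1) (1, 3)
print(x.data_ptr() == y.data_ptr())   # True: no data was moved

z = y.contiguous()                    # rewrites the data in canonical (row-major) order for shape (3, 2)
print(z.stride())                     # (2, 1)
print(z.data_ptr() == x.data_ptr())   # False: z lives in new memory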
[1]: http://pytorch.org/docs/master/tensors.html
The accepted answer was so great, and I tried to reproduce the transpose() effect myself. I created two functions, samestorage() and contiguous(), that check whether two tensors share storage and whether a tensor is contiguous.
def samestorage(x, y):
    if x.storage().data_ptr() == y.storage().data_ptr():
        print("same storage")
    else:
        print("different storage")

def contiguous(y):
    if y.is_contiguous():
        print("contiguous")
    else:
        print("non contiguous")
I checked and got these results as a table:

             contiguous        storage
transpose    non contiguous    same storage
narrow       contiguous        same storage
permute      non contiguous    same storage
view         contiguous        same storage
reshape      contiguous        same storage
flip         contiguous        different storage
expand       non contiguous    same storage
You can review the checker code down below, but first let's give one example where the tensor is non-contiguous. We cannot simply call view() on that tensor; we would need to reshape() it, or we could also call .contiguous().view().
x = torch.randn(3,2)
y = x.transpose(0, 1)
y.view(6) # RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.

x = torch.randn(3,2)
y = x.transpose(0, 1)
y.reshape(6) # works: reshape() copies the data when a view is not possible

x = torch.randn(3,2)
y = x.transpose(0, 1)
y.contiguous().view(6) # works: contiguous() first creates a contiguous copy
Note further that some methods produce contiguous tensors in the end and some produce non-contiguous ones. Some methods can operate on the same storage, while others, such as flip(), create a new storage (read: clone the tensor) before returning.
The checker code:
import torch
x = torch.randn(3,2)
y = x.transpose(0, 1) # swaps two axes
print("\ntranspose")
print(x)
print(y)
contiguous(y)
samestorage(x,y)
print("\nnarrow")
x = torch.randn(3,2)
y = x.narrow(0, 1, 2) #dim, start, len
print(x)
print(y)
contiguous(y)
samestorage(x,y)
print("\npermute")
x = torch.randn(3,2)
y = x.permute(1, 0) # sets the axis order
print(x)
print(y)
contiguous(y)
samestorage(x,y)
print("\nview")
x = torch.randn(3,2)
y=x.view(2,3)
print(x)
print(y)
contiguous(y)
samestorage(x,y)
print("\nreshape")
x = torch.randn(3,2)
y = x.reshape(6,1)
print(x)
print(y)
contiguous(y)
samestorage(x,y)
print("\nflip")
x = torch.randn(3,2)
y = x.flip(0)
print(x)
print(y)
contiguous(y)
samestorage(x,y)
print("\nexpand")
x = torch.randn(3,2)
y = x.expand(2,-1,-1)
print(x)
print(y)
contiguous(y)
samestorage(x,y)
There are a few operations on Tensors in PyTorch that do not really change the content of the tensor, but only change how indices are converted to memory locations. These operations include:

narrow(), view(), expand() and transpose()

For example: when you call transpose(), PyTorch doesn't generate a new tensor with a new layout; it just modifies meta information in the Tensor object so that the offset and stride describe the new shape. The transposed tensor and the original tensor are indeed sharing the memory!
x = torch.randn(3,2)
y = torch.transpose(x, 0, 1)
x[0, 0] = 42
print(y[0,0])
# prints 42
This is where the concept of contiguous comes in. Above, x is contiguous but y is not, because its memory layout is different from that of a tensor of the same shape made from scratch. Note that the word "contiguous" is a bit misleading, because it's not that the content of the tensor is spread out around disconnected blocks of memory. Here the bytes are still allocated in one block of memory, but the order of the elements is different!

When you call contiguous(), it actually makes a copy of the tensor such that the order of its elements in memory is the same as if it had been created from scratch with the same data (if the tensor is already contiguous, it is returned unchanged).
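Continuing the snippet above, a quick sketch of what contiguous() changes (nothing here beyond the standard is_contiguous()/contiguous() calls):

print(y.is_contiguous())   # False: y's strides don't match a freshly created (2, 3) tensor
z = y.contiguous()         # copies the data into canonical (row-major) order
print(z.is_contiguous())   # True
x[0, 0] = 7                # y still sees this change (shared memory), but z does not
print(y[0, 0], z[0, 0])    # 7 and 42 -- z is an independent copy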
Normally you don't need to worry about this. If PyTorch expects a contiguous tensor but it isn't one, you will get RuntimeError: input is not contiguous, and then you just add a call to contiguous().
As noted in the previous answer, contiguous() allocates contiguous memory chunks. This is helpful when we pass a tensor to C or C++ backend code, where tensors are passed as pointers.
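As a rough illustration of why that matters (inspecting the raw storage order directly; the exact Storage API varies a bit between PyTorch versions):

import torch

x = torch.arange(6).view(2, 3)
y = x.t()                                  # logical element order: 0, 3, 1, 4, 2, 5

# The raw memory behind y is still in the original order, which is what C/C++ code
# reading sequentially from the data pointer would see:
print(y.storage().tolist())                # [0, 1, 2, 3, 4, 5]

# After contiguous(), the raw memory matches y's logical element order:
print(y.contiguous().storage().tolist())   # [0, 3, 1, 4, 2, 5]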
tensor.contiguous() will create a copy of the tensor, and the elements in the copy will be stored in memory in a contiguous way. The contiguous() function is usually required when we first transpose() a tensor and then reshape (view) it. First, let's create a contiguous tensor:
aaa = torch.Tensor( [[1,2,3],[4,5,6]] )
print(aaa.stride())
print(aaa.is_contiguous())
#(3,1)
#True
stride() returning (3, 1) means: when moving along the first dimension by one step (row by row), we need to move 3 steps in memory; when moving along the second dimension (column by column), we need to move 1 step in memory. This indicates that the elements in the tensor are stored contiguously.
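In other words, the stride tells you how an index maps to a position in the underlying storage; a quick sanity check using the same aaa as above:

i, j = 1, 2
flat = i * aaa.stride(0) + j * aaa.stride(1)   # 1*3 + 2*1 = 5
print(aaa.flatten()[flat] == aaa[i, j])        # tensor(True)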
Now let's apply some functions to the tensor:
bbb = aaa.transpose(0,1)
print(bbb.stride())
print(bbb.is_contiguous())
#(1, 3)
#False
ccc = aaa.narrow(1,1,2) ## equivalent to matrix slicing aaa[:,1:3]
print(ccc.stride())
print(ccc.is_contiguous())
#(3, 1)
#False
ffffd = aaa.repeat(2,1) # the first dimension is repeated twice, the second dimension once
print(ffffd.stride())
print(ffffd.is_contiguous())
#(3, 1)
#True
## expand is different from repeat.
## if a tensor has a shape [d1,d2,1], it can only be expanded using "expand(d1,d2,d3)", which
## means the singleton dimension is repeated d3 times
eee = aaa.unsqueeze(2).expand(2,3,3)
print(eee.stride())
print(eee.is_contiguous())
#(3, 1, 0)
#False
fff = aaa.unsqueeze(2).repeat(1,1,8).view(2,-1,2)
print(fff.stride())
print(fff.is_contiguous())
#(24, 2, 1)
#True
OK, so we find that transpose(), narrow(), tensor slicing, and expand() will make the generated tensor non-contiguous. Interestingly, repeat() and view() do not make it non-contiguous. So now the question is: what happens if I use a non-contiguous tensor?

The answer is that view() cannot be applied to a non-contiguous tensor. This is probably because view() requires the tensor to be stored contiguously so that it can do fast reshaping in memory. For example:
bbb.view(-1,3)
we will get the error:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-63-eec5319b0ac5> in <module>()
----> 1 bbb.view(-1,3)
RuntimeError: invalid argument 2: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Call .contiguous() before .view(). at /pytorch/aten/src/TH/generic/THTensor.cpp:203
To solve this, simply call contiguous() on the non-contiguous tensor to create a contiguous copy, and then apply view():
bbb.contiguous().view(-1,3)
# tensor([[1., 4., 2.],
#         [5., 3., 6.]])
From what I understand, this is a more summarized answer:

A tensor is contiguous when its memory layout lines up with its advertised meta-data (shape and strides); the "not contiguous" errors mean that this alignment has been broken.

In my opinion the word contiguous is a confusing/misleading term, since in normal contexts it means memory that is not spread around in disconnected blocks (i.e. it is "contiguous/connected/continuous").

Some operations might need this contiguous property for some reason (most likely efficiency on the GPU, etc.).

Note that .view is another operation that might cause this issue. Below is code I fixed by simply calling contiguous(); instead of the typical transpose() causing it, here it is a squeeze()/expand() that produces a hidden state the RNN is not happy with:
# normal lstm([loss, grad_prep, train_err]) = lstm(xn)
n_learner_params = xn_lstm.size(1)
(lstmh, lstmc) = hs[0] # previous hx from first (standard) lstm i.e. lstm_hx = (lstmh, lstmc) = hs[0]
if lstmh.size(1) != xn_lstm.size(1): # only true when prev lstm_hx is equal to decoder/controllers hx
    # make sure that h, c from decoder/controller has the right size to go into the meta-optimizer
    expand_size = torch.Size([1, n_learner_params, self.lstm.hidden_size])
    lstmh, lstmc = lstmh.squeeze(0).expand(expand_size).contiguous(), lstmc.squeeze(0).expand(expand_size).contiguous()
lstm_out, (lstmh, lstmc) = self.lstm(input=xn_lstm, hx=(lstmh, lstmc))
Error I used to get:
RuntimeError: rnn: hx is not contiguous
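A minimal, self-contained sketch of the same fix (the LSTM sizes below are made up for illustration): squeeze()/expand() yields a non-contiguous hidden state, and calling .contiguous() before handing it to the RNN avoids that error.

import torch

lstm = torch.nn.LSTM(input_size=4, hidden_size=4)
xn_lstm = torch.randn(5, 3, 4)              # (seq_len, batch, input_size) -- made-up sizes
h = torch.zeros(1, 1, 4).expand(1, 3, 4)    # expand() gives a non-contiguous tensor
c = torch.zeros(1, 1, 4).expand(1, 3, 4)
print(h.is_contiguous())                    # False
out, _ = lstm(xn_lstm, (h.contiguous(), c.contiguous()))   # contiguous() sidesteps the hx error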