PyTorch - contiguous() | 易学教程

I was going through this example of a LSTM language model on github (link). What it does in general is pretty clear to me. But I'm still struggling to understand what calling contiguous() does, which occurs several times in the code.

For example in line 74/75 of the code input and target sequences of the LSTM are created. Data (stored in ids) is 2-dimensional where first dimension is the batch size.

for i in range(0, ids.size(1) - seq_length, seq_length):
    # Get batch inputs and targets
    inputs = Variable(ids[:, i:i+seq_length])
    targets = Variable(ids[:, (i+1):(i+1)+seq_length].contiguous())

So as a simple example, when using batch size 1 and seq_length 10 inputs and targets looks like this:

inputs Variable containing:
0     1     2     3     4     5     6     7     8     9
[torch.LongTensor of size 1x10]

targets Variable containing:
1     2     3     4     5     6     7     8     9    10
[torch.LongTensor of size 1x10]

So in general my question is, what does contiguous() and why do I need it?

Further I don't understand why the method is called for the target sequence and but not the input sequence as both variables are comprised of the same data.

How could targets be uncontiguous and inputs still be contiguous?

EDIT: I tried to leave out calling contiguous(), but this leads to an error message when computing the loss.

RuntimeError: invalid argument 1: input is not contiguous at .../src/torch/lib/TH/generic/THTensor.c:231

So obviously calling contiguous() in this example is necessary.

(For keeping this readable I avoided posting the full code here, it can be found by using the GitHub link above.)

Thanks in advance!

Shital Shah

There are few operations on Tensor in PyTorch that do not really change the content of the tensor, but only how to convert indices in to tensor to byte location. These operations include:

narrow(), view(), expand() and transpose()

For example: when you call transpose(), PyTorch doesn't generate new tensor with new layout, it just modifies meta information in Tensor object so offset and stride are for new shape. The transposed tensor and original tensor are indeed sharing the memory!

x = torch.randn(3,2)
y = torch.transpose(x, 0, 1)
x[0, 0] = 42
print(y[0,0])
# prints 42

This is where the concept of contiguous comes in. Above x is contiguous but y is not because its memory layout is different than a tensor of same shape made from scratch. Note that the word "contiguous" is bit misleading because its not that the content of tensor is spread out around disconnected blocks of memory. Here bytes are still allocated in one block of memory but the order of the elements is different!

When you call contiguous(), it actually makes a copy of tensor so the order of elements would be same as if tensor of same shape created from scratch.

Normally you don't need to worry about this. If PyTorch expects contiguous tensor but if its not then you will get RuntimeError: input is not contiguous and then you just add a call to contiguous().

From the pytorch documentation:

contiguous() → Tensor
Returns a contiguous tensor containing the same data as self 
tensor. If self tensor is contiguous, this function returns the self tensor.

Where contiguous here means contiguous in memory. So the contiguous function doesn't affect your target tensor at all, it just makes sure that it is stored in a contiguous chunk of memory.

As in the previous answer contigous() allocates contigous memory chunks, it'll be helpful when we're passing tensor to c or c++ backend code where tensors are passed as pointers

tensor.contiguous() will create a copy of the tensor, and the element in the copy will be stored in the memory in a contiguous way. The contiguous() function is usually required when we first transpose() a tensor and then reshape (view) it. First, let's create a contiguous tensor:

aaa = torch.Tensor( [[1,2,3],[4,5,6]] )
print(aaa.stride())
print(aaa.is_contiguous())
#(3,1)
#True

The stride() return (3,1) means that: when moving along the first dimension by each step (row by row), we need to move 3 steps in the memory. When moving along the second dimension (column by column), we need to move 1 step in the memory. This indicates that the elements in the tensor are stored contiguously.

Now we try apply come functions to the tensor:

bbb = aaa.transpose(0,1)
print(bbb.stride())
print(bbb.is_contiguous())

ccc = aaa.narrow(1,1,2)   ## equivalent to matrix slicing aaa[:,1:3]
print(ccc.stride())
print(ccc.is_contiguous())


ddd = aaa.repeat(2,1 )   # The first dimension repeat once, the second dimension repeat twice
print(ddd.stride())
print(ddd.is_contiguous())

## expand is different from repeat  if a tensor has a shape [d1,d2,1], it can only be expanded using "expand(d1,d2,d3)", which
## means the singleton dimension is repeated d3 times
eee = aaa.unsqueeze(2).expand(2,3,3)
print(eee.stride())
print(eee.is_contiguous())

fff = aaa.unsqueeze(2).repeat(1,1,8).view(2,-1,2)
print(fff.stride())
print(fff.is_contiguous())

#(1, 3)
#False
#(3, 1)
#False
#(3, 1)
#True
#(3, 1, 0)
#False
#(24, 2, 1)
#True

Ok, we can find that transpose(), narrow() and tensor slicing, and expand() will make the generated tensor not contiguous. Interestingly, repeat() and view() does not make it discontiguous. So now the question is: what happens if I use a discontiguous tensor?

The answer is it the view() function cannot be applied to a discontiguous tensor. This is probably because view() requires that the tensor to be contiguously stored so that it can do fast reshape in memory. e.g:

bbb.view(-1,3)

we will get the error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-63-eec5319b0ac5> in <module>()
----> 1 bbb.view(-1,3)

RuntimeError: invalid argument 2: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Call .contiguous() before .view(). at /pytorch/aten/src/TH/generic/THTensor.cpp:203

To solve this, simply add contiguous() to a discontiguous tensor, to create contiguous copy and then apply view()

bbb.contiguous().view(-1,3)
#tensor([[1., 4., 2.],
        [5., 3., 6.]])

来源：https://stackoverflow.com/questions/48915810/pytorch-contiguous

标签

neural-network

deep-learning

lstm

pytorch

contiguous