PyTorch: How to implement attention for graph attention layer

Submitted by 天大地大妈咪最大 on 2019-12-11 17:53:07

Question


I have implemented the attention (Eq. 1) of https://arxiv.org/pdf/1710.10903.pdf, but it is clearly not memory efficient: it takes 7-10 GB, so I can run only a single model on my GPU.

Currently, I have

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F


class MyModule(nn.Module):

    def __init__(self, in_features, out_features):
        super(MyModule, self).__init__()
        self.in_features = in_features
        self.out_features = out_features

        # Allocate on the GPU when one is available, otherwise on the CPU.
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        self.W = nn.Parameter(torch.empty(in_features, out_features, device=device))
        self.a = nn.Parameter(torch.empty(2 * out_features, 1, device=device))
        nn.init.xavier_uniform_(self.W, gain=np.sqrt(2.0))
        nn.init.xavier_uniform_(self.a, gain=np.sqrt(2.0))

    def forward(self, input):
        h = torch.mm(input, self.W)  # (N, out_features)
        N = h.size()[0]

        # All concatenated pairs [h_i || h_j]: shape (N, N, 2 * out_features).
        a_input = torch.cat([h.repeat(1, N).view(N * N, -1), h.repeat(N, 1)], dim=1).view(N, -1, 2 * self.out_features)
        e = F.elu(torch.matmul(a_input, self.a).squeeze(2))  # (N, N)
        return e

My idea for computing all the e_ij terms is the following:

In [8]: import torch

In [9]: import numpy as np

In [10]: h = torch.LongTensor(np.array([[1,1], [2,2], [3,3]]))

In [11]: N=3

In [12]: h.repeat(1, N).view(N * N, -1)
Out[12]:

1     1
1     1
1     1
2     2
2     2
2     2
3     3
3     3
3     3

[torch.LongTensor of size 9x2]

In [13]: h.repeat(N, 1)
Out[13]:

1     1
2     2
3     3
1     1
2     2
3     3
1     1
2     2
3     3

[torch.LongTensor of size 9x2]

Finally, I concatenate the two repeated copies of h along the feature dimension and multiply the result by a.

Is there a more memory-friendly way to do this?
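One observation that may help here: a is applied linearly to the concatenation [h_i || h_j], so the product splits into a term that depends only on h_i and a term that depends only on h_j. The sketch below compares the two formulations, assuming h has shape (N, out_features) and a has shape (2 * out_features, 1) as in the module above. The function names are hypothetical and exist only for this illustration.

```python
import torch
import torch.nn.functional as F


def pairwise_scores_concat(h, a):
    """The question's approach: materialise every [h_i || h_j] pair."""
    N, F_out = h.shape
    a_input = torch.cat([h.repeat(1, N).view(N * N, -1), h.repeat(N, 1)], dim=1)
    return F.elu(torch.matmul(a_input.view(N, N, 2 * F_out), a).squeeze(2))


def pairwise_scores_broadcast(h, a):
    """Same scores without the (N, N, 2 * F_out) intermediate tensor."""
    F_out = h.shape[1]
    s_i = h @ a[:F_out]   # (N, 1): contribution of the first node of each pair
    s_j = h @ a[F_out:]   # (N, 1): contribution of the second node of each pair
    return F.elu(s_i + s_j.T)  # broadcasts to (N, N)


torch.manual_seed(0)
h = torch.randn(5, 4)   # 5 nodes, 4 features (made-up sizes)
a = torch.randn(8, 1)
e_concat = pairwise_scores_concat(h, a)
e_broadcast = pairwise_scores_broadcast(h, a)
same = torch.allclose(e_concat, e_broadcast, atol=1e-6)
```

The output is still a dense (N, N) matrix, but the largest intermediate shrinks from N * N * 2 * out_features elements to N * N, which is where most of the saving would come from.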


Answer 1:


Maybe you can use a sparse tensor to store adj_mat:

import numpy as np
import torch


def sparse_mx_to_torch_sparse_tensor(sparse_mx):
    """Convert a scipy sparse matrix to a torch sparse tensor."""
    sparse_mx = sparse_mx.tocoo().astype(np.float32)
    # Stack the COO row and column indices into a (2, nnz) int64 tensor.
    indices = torch.from_numpy(np.vstack((sparse_mx.row,
                                          sparse_mx.col))).long()
    values = torch.from_numpy(sparse_mx.data)
    shape = torch.Size(sparse_mx.shape)
    return torch.sparse_coo_tensor(indices, values, shape)
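Once adj_mat is a sparse COO tensor, its index list can be used to compute attention scores only for pairs that are actually connected. The following is a minimal sketch of that idea, not the answer author's method. It assumes h has shape (N, out_features) and a has shape (2 * out_features, 1), and it relies on a^T [h_i || h_j] being the sum of a per-node term for i and a per-node term for j. The name edge_scores and the toy graph are made up for illustration.

```python
import torch
import torch.nn.functional as F


def edge_scores(h, a, adj):
    """Return a sparse (N, N) tensor holding elu(a^T [h_i || h_j]) for edges (i, j) only."""
    adj = adj.coalesce()
    row, col = adj.indices()          # each of shape (num_edges,)
    F_out = h.shape[1]
    s_i = (h @ a[:F_out]).squeeze(1)  # (N,): term for the source node
    s_j = (h @ a[F_out:]).squeeze(1)  # (N,): term for the target node
    e = F.elu(s_i[row] + s_j[col])    # (num_edges,)
    return torch.sparse_coo_tensor(adj.indices(), e, adj.shape)


# Toy graph with 3 nodes and edges 0->1, 1->2, 2->0 (made-up data).
indices = torch.tensor([[0, 1, 2], [1, 2, 0]])
adj = torch.sparse_coo_tensor(indices, torch.ones(3), (3, 3))
h = torch.tensor([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
a = torch.ones(4, 1)
scores = edge_scores(h, a, adj)
```

Memory then scales with the number of edges rather than with N * N. In a dense view of the result, a zero means "no edge", not a score of zero, so a later softmax would need to be restricted to the stored entries.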


Source: https://stackoverflow.com/questions/49358396/pytorch-how-to-implement-attention-for-graph-attention-layer
