A PyTorch implementation of self attention

Anonymous (unverified), submitted 2019-12-03 00:03:02
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F


class SelfAttention(nn.Module):
    """
    Scores each element of the sequence with a linear layer and uses the
    normalized scores to compute a context vector over the sequence.
    """

    def __init__(self, d_hid, dropout=0.):
        super().__init__()
        self.scorer = nn.Linear(d_hid, 1)
        self.dropout = nn.Dropout(dropout)

    def forward(self, input_seq, lens):
        batch_size, seq_len, feature_dim = input_seq.size()
        input_seq = self.dropout(input_seq)
        # one scalar score per time step: [batch_size, seq_len]
        scores = self.scorer(input_seq.contiguous().view(-1, feature_dim)).view(batch_size, seq_len)
        # mask out padded positions so softmax assigns them zero weight
        max_len = max(lens)
        for i, l in enumerate(lens):
            if l < max_len:
                scores.data[i, l:] = -np.inf
        scores = F.softmax(scores, dim=1)
        # weighted sum over the sequence: [batch_size, feature_dim]
        context = scores.unsqueeze(2).expand_as(input_seq).mul(input_seq).sum(1)
        return context  # since it is named "context", it is a representation of the whole sentence

The input is [batch_size, seq_len, feature_dim].
The output is [batch_size, feature_dim].
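
A minimal usage sketch of the module above; the sizes and length values here are purely illustrative:

batch_size, seq_len, d_hid = 2, 5, 16
attn = SelfAttention(d_hid, dropout=0.1)

x = torch.randn(batch_size, seq_len, d_hid)    # [batch_size, seq_len, feature_dim]
lens = [5, 3]                                  # actual (unpadded) length of each example

context = attn(x, lens)
print(context.shape)                           # torch.Size([2, 16]) -> [batch_size, feature_dim]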

By contrast, the multihead_attention in the Transformer reduces to self attention when memory is None, and its output is [batch_size, seq_len, feature_dim]; see the sketch below.
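
For comparison, a minimal sketch using PyTorch's nn.MultiheadAttention with query = key = value, which is what self attention amounts to. This is a stand-in for the Transformer implementation mentioned above, not that code itself; sizes are illustrative, and batch_first requires a reasonably recent PyTorch. The point is that the output keeps the sequence dimension instead of pooling it into one vector:

import torch
import torch.nn as nn

batch_size, seq_len, d_hid, num_heads = 2, 5, 16, 4
mha = nn.MultiheadAttention(embed_dim=d_hid, num_heads=num_heads, batch_first=True)

x = torch.randn(batch_size, seq_len, d_hid)
# self attention: query, key and value are all the same sequence
out, attn_weights = mha(x, x, x)
print(out.shape)             # torch.Size([2, 5, 16]) -> [batch_size, seq_len, feature_dim]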
