Confusion in understanding the output of the BertForTokenClassification class from the Transformers library

Submitted by 旧巷老猫 on 2021-01-28 19:04:01

Question


This is the example given in the documentation of the Transformers PyTorch library:

from transformers import BertTokenizer, BertForTokenClassification
import torch

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
# Request all intermediate hidden states and attention weights
model = BertForTokenClassification.from_pretrained('bert-base-uncased',
                                                   output_hidden_states=True,
                                                   output_attentions=True)

input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute",
                                          add_special_tokens=True)).unsqueeze(0)  # Batch size 1
labels = torch.tensor([1] * input_ids.size(1)).unsqueeze(0)  # Batch size 1
outputs = model(input_ids, labels=labels)

# This tuple unpacking matches transformers v2.x; on v4+ pass
# return_dict=False, or read outputs.loss, outputs.logits,
# outputs.hidden_states and outputs.attentions instead.
loss, scores, hidden_states, attentions = outputs

Here hidden_states is a tuple of length 13, containing the hidden states of the model at the output of each of the 12 layers plus the initial embedding output. I would like to know whether hidden_states[0] or hidden_states[12] represents the final hidden-state vectors.
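
For context, all 13 entries share the same shape, so the ordering cannot be read off the tensors themselves. A quick inspection (a minimal sketch, assuming the snippet above has run) shows this:

print(len(hidden_states))       # 13: embedding output + 12 encoder layers
print(hidden_states[0].shape)   # torch.Size([1, 8, 768])
print(hidden_states[12].shape)  # torch.Size([1, 8, 768]), identical shape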


Answer 1:


If you check the source code, specifically BertEncoder, you can see that the returned hidden states are initialized as an empty tuple, and the current hidden state is appended on each iteration over the layers.
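
A paraphrased sketch of that collection loop (not the library's exact code, which differs slightly across versions) makes the ordering explicit:

# Simplified paraphrase of the hidden-state collection in BertEncoder.forward
all_hidden_states = ()
for layer_module in self.layer:                    # 12 layers in bert-base
    all_hidden_states = all_hidden_states + (hidden_states,)  # input to this layer
    hidden_states = layer_module(hidden_states)[0]            # output feeds the next layer
all_hidden_states = all_hidden_states + (hidden_states,)      # final layer, appended last

The first element collected is therefore the embedding output (the hidden_states entering layer 1), and the last element is the output of the final layer.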

The final layer's output is appended as the last element after this loop, so we can safely conclude that hidden_states[12] holds the final hidden-state vectors, while hidden_states[0] is the embedding output.
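
To double-check empirically, you can feed hidden_states[12] through the model's own linear classification head (model.classifier) and compare with the returned logits. A minimal sketch, assuming transformers v4+, where the forward pass returns a TokenClassifierOutput:

import torch
from transformers import BertTokenizer, BertForTokenClassification

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForTokenClassification.from_pretrained(
    'bert-base-uncased', output_hidden_states=True)
model.eval()  # disable dropout so the comparison is deterministic

input_ids = torch.tensor(
    tokenizer.encode("Hello, my dog is cute", add_special_tokens=True)
).unsqueeze(0)

with torch.no_grad():
    outputs = model(input_ids)

# Feeding the last hidden state through the classification head
# reproduces the logits, confirming hidden_states[12] is the final layer.
relogits = model.classifier(outputs.hidden_states[12])
print(torch.allclose(relogits, outputs.logits, atol=1e-6))  # True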



Source: https://stackoverflow.com/questions/60847291/confusion-in-understanding-the-output-of-bertfortokenclassification-class-from-t
