How does an LSTM convert character embedding vectors to a sentence vector for sentence classification?

Submitted by 痴心易碎 on 2019-12-13 03:49:34

Question


I want to build an LSTM model for sentence classification using character embeddings.

I know how to do this with word embeddings, where the model learns the embeddings from word indices, but I'm not sure how to do it with character embeddings.

For word embeddings:

sentence_list = ['this is a dog', 'the cat and the mouse']
label = [1,0]
word_dict = {'this':1,
             'is':2,
             'a':3,
             'dog':4,
             'the':5,
             'cat':6,
             'and':7,
             'mouse':8}

# set vector length = 9
vectors = [[1,2,3,4,0,0,0,0,0],
           [0,0,0,0,5,6,7,5,8]]
model.fit(vectors,label)

So this is ready to be fed into an LSTM model.

How do we do this for character-based vectors?

For example, if I have this character dictionary:

char_dict = {'t':1,
             'h':2,
             'i':3,
             's':4,
             'a':5,
             'd':6,
             'o':7,
             'g':8}

How do I format this so that the LSTM classification model can read it? More specifically, how do we combine multiple character vectors to feed into the LSTM model?


Answer 1:


It's exactly the same. No difference at all.

Transform the sentences into vectors of indices and go fit.

Important things:

Don't start sentences with 0. Put the padding zeros at the end, so your vectors should be:

vectors = [[1,2,3,4,0,0,0,0,0],
           [5,6,7,5,8,0,0,0,0]]
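
To illustrate the padding point, here is a minimal sketch in plain Python. `pad_post` is a hypothetical helper written for this example (not a library function); it pads with 0 on the right. If you use Keras, `pad_sequences` with `padding='post'` should pad the same way.

```python
def pad_post(seqs, maxlen, pad_value=0):
    """Right-pad (or cut) each index sequence to exactly `maxlen` items.

    Hypothetical helper for illustration only.
    """
    padded = []
    for seq in seqs:
        kept = list(seq[:maxlen])  # drop anything beyond maxlen
        padded.append(kept + [pad_value] * (maxlen - len(kept)))
    return padded


# Word indices for 'this is a dog' and 'the cat and the mouse'
encoded = [[1, 2, 3, 4], [5, 6, 7, 5, 8]]
vectors = pad_post(encoded, maxlen=9)
print(vectors)
```

Index 0 is never assigned to a real word or character here, so the model can treat it as "nothing".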

Have indices for spaces (at least) and punctuation:

char_dict = {'t':1,
             'h':2,
             'i':3,
             's':4,
             'a':5,
             'd':6,
             'o':7,
             'g':8,
             ' ':9,
             '.':10,
             'c':11}

sentences = ['this is a dog', 'that is a cat.']
vectors = [
              [char_dict[ch] for ch in sentence] for sentence in sentences
          ]

# result:
vectors = [
              [1, 2, 3, 4, 9, 3, 4, 9, 5, 9,  6, 7, 8],
              [1, 2, 5, 1, 9, 3, 4, 9, 5, 9, 11, 5, 1, 10]
          ]
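
Putting it together, here is a rough sketch of the whole pipeline, assuming TensorFlow/Keras is installed. The labels, the `encode` and `build_model` helpers, and the layer sizes are my own assumptions for illustration and are not from the question. The idea is that the `Embedding` layer turns each character index into a vector, and the final output of the `LSTM` layer acts as the sentence vector that the classifier sees.

```python
sentences = ['this is a dog', 'that is a cat.']
labels = [1, 0]  # made-up labels, for illustration only

# Build the character dictionary from the data; index 0 stays reserved for padding.
chars = sorted(set(''.join(sentences)))
char_dict = {ch: i + 1 for i, ch in enumerate(chars)}


def encode(sentence, maxlen):
    """Map characters to indices, then right-pad with 0 up to maxlen."""
    ids = [char_dict[ch] for ch in sentence][:maxlen]
    return ids + [0] * (maxlen - len(ids))


maxlen = max(len(s) for s in sentences)
vectors = [encode(s, maxlen) for s in sentences]
vocab_size = len(char_dict) + 1  # +1 for the padding index 0


def build_model(vocab_size, embed_dim=16, lstm_units=32):
    """Sketch of a character-level LSTM classifier (sizes are arbitrary)."""
    # Imported here so the encoding part above runs without TensorFlow.
    from tensorflow import keras

    model = keras.Sequential([
        # mask_zero=True tells the downstream layers to skip padded positions
        keras.layers.Embedding(vocab_size, embed_dim, mask_zero=True),
        # With the default return_sequences=False, the LSTM returns only its
        # last output: one vector per sentence
        keras.layers.LSTM(lstm_units),
        keras.layers.Dense(1, activation='sigmoid'),
    ])
    model.compile(optimizer='adam', loss='binary_crossentropy',
                  metrics=['accuracy'])
    return model
```

To train, something like `model = build_model(vocab_size)` followed by `model.fit(np.array(vectors), np.array(labels), epochs=5)` should work (with `numpy` imported as `np`). Two sentences are of course far too few to learn anything.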


Source: https://stackoverflow.com/questions/54904908/how-does-lstm-convert-character-embedding-vectors-to-sentence-vector-for-sentenc
