Keras Tokenizer num_words doesn't seem to work

日久生厌 2020-12-15 05:17
>>> t = Tokenizer(num_words=3)
>>> l = ["Hello, World! This is so&#$ fantastic!", "There is no other world like this one"]
>>> t.f         


        
3 Answers
  •  佛祖请我去吃肉
    2020-12-15 05:49

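    Limiting num_words to a small number (e.g., 3) has no effect on the fit_on_texts outputs such as word_index, word_counts, and word_docs. It does have an effect on texts_to_matrix: the resulting matrix has num_words (3) columns. Column 0 is reserved, so only the num_words - 1 most frequent words (here 'world' and 'this') are counted.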

    >>> t = Tokenizer(num_words=3)
    >>> l = ["Hello, World! This is so&#$ fantastic!", "There is no other world like this one"]
    >>> t.fit_on_texts(l)
    >>> print(t.word_index)
    {'world': 1, 'this': 2, 'is': 3, 'hello': 4, 'so': 5, 'fantastic': 6, 'there': 7, 'no': 8, 'other': 9, 'like': 10, 'one': 11}
    
    >>> t.texts_to_matrix(l, mode='count')
    array([[0., 1., 1.],       
           [0., 1., 1.]])
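
To see why word_index stays complete while the matrix shrinks, here is a rough plain-Python sketch of the filtering rule. It is not the Keras implementation: build_word_index and count_matrix are hypothetical helpers, and the regex tokenizer is only an approximation of the Tokenizer's default filters that happens to agree on these two sentences.

```python
import re
from collections import Counter

def tokenize(text):
    # Assumption: lowercase and keep runs of letters, a rough stand-in
    # for the Tokenizer's default punctuation filtering.
    return re.findall(r"[a-z']+", text.lower())

def build_word_index(texts):
    # Rank every word by frequency, starting at 1 (index 0 stays unused).
    # Ties keep first-seen order. num_words plays no role here.
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return {word: rank
            for rank, (word, _) in enumerate(counts.most_common(), start=1)}

def count_matrix(texts, word_index, num_words):
    # num_words only matters at this stage: each row has num_words
    # columns, and words with index >= num_words are skipped.
    rows = []
    for text in texts:
        row = [0] * num_words
        for word in tokenize(text):
            idx = word_index[word]
            if idx < num_words:
                row[idx] += 1
        rows.append(row)
    return rows

texts = ["Hello, World! This is so&#$ fantastic!",
         "There is no other world like this one"]
word_index = build_word_index(texts)
matrix = count_matrix(texts, word_index, num_words=3)

print(len(word_index))  # all 11 words are still indexed
print(matrix)           # only indices 1 and 2 are counted
```

The full vocabulary is always built first, and the num_words cutoff is applied only when texts are converted, which matches the behaviour described above.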
    
