Gensim: TypeError: doc2bow expects an array of unicode tokens on input, not a single string

庸人自扰 2020-12-06 06:05

I am getting started with a Python task and have run into a problem using gensim. I am trying to load files from disk and process them (split them into tokens and lowercase them), but gensim raises the TypeError shown in the title.

3 answers
  •  轻奢々 (original poster)
    2020-12-06 06:41

    Dictionary expects tokenized input: a list of documents, where each document is a list of string tokens rather than a single raw string:

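    from gensim.corpora import Dictionary

    dataset = ['driving car ',
               'drive car carefully',
               'student and university']

    # split each sentence into a list of tokens before passing it to Dictionary
    dataset = [d.split() for d in dataset]

    # build the vocabulary from the tokenized documents
    vocab = Dictionary(dataset)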
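
The question also mentions loading files from disk and lowercasing them, so the preprocessing step could be sketched roughly as below. This is only a minimal sketch in plain Python: the helper names `tokenize`, `load_documents`, and `check_tokenized` are hypothetical and not part of gensim, and whitespace splitting with UTF-8 files is an assumption about the data.

```python
from pathlib import Path


def tokenize(text):
    """Lowercase a raw string and split it on whitespace into a list of tokens."""
    return text.lower().split()


def load_documents(paths):
    """Read each file (assumed UTF-8) and return one token list per file."""
    return [tokenize(Path(p).read_text(encoding="utf-8")) for p in paths]


def check_tokenized(docs):
    """Raise TypeError if any document is still a raw string, not a token list."""
    for i, doc in enumerate(docs):
        if isinstance(doc, str):
            raise TypeError(f"document {i} is a single string; tokenize it first")
    return docs


# In-memory strings stand in for file contents here.
raw = ["Driving car ", "Drive car carefully", "Student and university"]
docs = check_tokenized([tokenize(d) for d in raw])
print(docs)
# [['driving', 'car'], ['drive', 'car', 'carefully'], ['student', 'and', 'university']]
```

With gensim installed, `docs` is in the shape that `Dictionary(docs)` accepts, and each inner list is what `vocab.doc2bow(doc)` expects. Passing a raw string such as `vocab.doc2bow("drive car")` is what triggers the TypeError in the title.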
    
