I am just getting started with a Python task and have run into a problem while using gensim. I am trying to load files from my disk and process them (split them and lowercase them).
Dictionary needs tokenized strings as its input, i.e. each document must be a list of tokens rather than a single string:
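from gensim.corpora import Dictionary

dataset = ['driving car ',
           'drive car carefully',
           'student and university']
# be sure to split each sentence into tokens before feeding it into Dictionary
dataset = [d.split() for d in dataset]
# each document is now a list of tokens, which is what Dictionary expects
vocab = Dictionary(dataset)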
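Since the question is about files on disk, the same idea can be sketched for that case. This is only a rough sketch under some assumptions: the helper names `load_tokenized_docs` and `build_vocab`, the folder name `my_corpus`, and the `*.txt` pattern are all made up for illustration, and the files are assumed to be UTF-8 plain text with one document per file.

```python
from pathlib import Path


def load_tokenized_docs(folder, pattern="*.txt"):
    """Return one list of lowercase tokens per matching file (hypothetical helper)."""
    docs = []
    # sorted() keeps the document order stable between runs
    for path in sorted(Path(folder).glob(pattern)):
        text = path.read_text(encoding="utf-8")
        # lowercase first, then split on whitespace
        docs.append(text.lower().split())
    return docs


def build_vocab(docs):
    """Build a gensim Dictionary from already tokenized documents."""
    # imported here so the loading helper can be used without gensim installed
    from gensim.corpora import Dictionary
    return Dictionary(docs)


# Example usage ("my_corpus" is an assumed folder name):
# docs = load_tokenized_docs("my_corpus")
# vocab = build_vocab(docs)
```

Note that the string method is `lower()`, not `lowercase()`, and that it has to be called on the string before (or on each token after) splitting, not on the list itself.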