Generating n-grams (unigrams, bigrams, etc.) and their frequencies from a large corpus of .txt files

孤街浪徒 2020-12-12 23:12

I need to write a program in NLTK that breaks a corpus (a large collection of txt files) into unigrams, bigrams, trigrams, fourgrams and fivegrams. I have already written co

6 Answers
  •  一整个雨季
    2020-12-12 23:40

    Maybe this helps; see the link.

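    import spacy
    nlp_en = spacy.load("en_core_web_sm")  # requires the en_core_web_sm model to be installed
    doc = nlp_en("This is a sentence.")  # placeholder text; the original snippet never defined doc
    tokens = [x.text for x in doc]  # one string per token, i.e. the unigrams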
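
    The snippet above only tokenizes. As a rough sketch of the rest of what the question asks for (1- to 5-grams and their frequencies), something like the following might work. It uses only the standard library; the helper names (ngrams, count_ngrams, read_corpus), the folder layout, and the lowercase whitespace tokenization are all assumptions for illustration, not anything from the original post.

```python
from collections import Counter
from pathlib import Path


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    # zip stops at the shortest slice, so sequences shorter than n yield [].
    return list(zip(*(tokens[i:] for i in range(n))))


def count_ngrams(texts, max_n=5):
    """Count 1- to max_n-grams over an iterable of strings.

    Tokenization is a plain lowercase whitespace split (an assumption);
    swap in a real tokenizer such as the spaCy one above if needed.
    """
    counts = {n: Counter() for n in range(1, max_n + 1)}
    for text in texts:
        tokens = text.lower().split()
        for n in counts:
            counts[n].update(ngrams(tokens, n))
    return counts


def read_corpus(folder):
    """Yield the contents of every .txt file in a folder (hypothetical layout)."""
    for path in sorted(Path(folder).glob("*.txt")):
        yield path.read_text(encoding="utf-8")


if __name__ == "__main__":
    sample = ["the cat sat on the mat", "the cat ran"]
    counts = count_ngrams(sample)
    print(counts[1].most_common(2))    # two most frequent unigrams
    print(counts[2][("the", "cat")])   # frequency of one bigram
```

    For a real corpus you would call count_ngrams(read_corpus("path/to/corpus")), so files are read one at a time instead of all being loaded into memory. N-grams are counted within each file only and never span file boundaries.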
    
