Computing N-grams using Python

情歌与酒 2020-11-28 06:02

I need to compute the unigrams, bigrams and trigrams for a text file containing text like:

"Cystic fibrosis affects 30,000 children and young adults in the US a
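Below is a minimal standard-library sketch of the task described above: counting unigrams, bigrams and trigrams. It assumes the input is UTF-8 text and that lowercasing plus a simple \w+ regex is acceptable tokenization. The names tokenize, ngrams and count_ngrams are hypothetical helpers, not part of any library.

```python
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase, then keep runs of letters, digits or "_".
    # With this pattern "30,000" becomes the two tokens "30" and "000".
    return re.findall(r"\w+", text.lower())


def ngrams(tokens, n):
    # Slide a window of width n over the tokens and join each window with spaces.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(text, max_n=3):
    # Map n -> Counter of n-grams, for n = 1 (unigrams) up to max_n (trigrams by default).
    tokens = tokenize(text)
    return {n: Counter(ngrams(tokens, n)) for n in range(1, max_n + 1)}


if __name__ == "__main__":
    # Sample taken from the (truncated) text quoted in the question.
    sample = "Cystic fibrosis affects 30,000 children and young adults in the US"
    counts = count_ngrams(sample)
    for n, counter in counts.items():
        print(n, len(counter), counter.most_common(2))
```

To process a whole file, read it first. For example: with open("input.txt", encoding="utf-8") as f: counts = count_ngrams(f.read()). Here "input.txt" is a placeholder path. If numbers with thousands separators should stay whole, adjust the regex in tokenize.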

  •  伪装坚强ぢ
    2020-11-28 06:23

    Another option is the scikit-learn library (imported as sklearn). The code below returns all the n-grams within a given range of n:

    # CountVectorizer is part of scikit-learn (pip install scikit-learn)
    from sklearn.feature_extraction.text import CountVectorizer

    text = "this is a foo bar sentences and i want to ngramize it"
    # ngram_range=(1, 6): produce every n-gram from unigrams up to 6-grams
    vectorizer = CountVectorizer(ngram_range=(1, 6))
    # build_analyzer() returns the callable that lowercases, tokenizes and builds n-grams
    analyzer = vectorizer.build_analyzer()
    print(analyzer(text))

    Output:

    ['this', 'is', 'foo', 'bar', 'sentences', 'and', 'want', 'to', 'ngramize', 'it', 'this is', 'is foo', 'foo bar', 'bar sentences', 'sentences and', 'and want', 'want to', 'to ngramize', 'ngramize it', 'this is foo', 'is foo bar', 'foo bar sentences', 'bar sentences and', 'sentences and want', 'and want to', 'want to ngramize', 'to ngramize it', 'this is foo bar', 'is foo bar sentences', 'foo bar sentences and', 'bar sentences and want', 'sentences and want to', 'and want to ngramize', 'want to ngramize it', 'this is foo bar sentences', 'is foo bar sentences and', 'foo bar sentences and want', 'bar sentences and want to', 'sentences and want to ngramize', 'and want to ngramize it', 'this is foo bar sentences and', 'is foo bar sentences and want', 'foo bar sentences and want to', 'bar sentences and want to ngramize', 'sentences and want to ngramize it']
    

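    This returns every n-gram for n from 1 to 6. It uses scikit-learn's CountVectorizer class; here is the link for that. Note that "a" and "i" are missing from the output. The default token_pattern, r"(?u)\b\w\w+\b", keeps only tokens of two or more word characters, and the text is lowercased by default.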
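The following sketch uses only the standard library to show how ngram_range orders its results and why single-letter words disappear. It assumes CountVectorizer's documented defaults: lowercase=True, token_pattern r"(?u)\b\w\w+\b", and no stop words. It is an illustration, not scikit-learn's actual code, and word_ngrams is a hypothetical name.

```python
import re

# Default token_pattern documented for scikit-learn's CountVectorizer:
# only tokens of two or more word characters, so "a" and "i" are dropped.
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")


def word_ngrams(text, ngram_range=(1, 1)):
    """Hypothetical stand-in for CountVectorizer(ngram_range=...).build_analyzer()."""
    min_n, max_n = ngram_range
    tokens = TOKEN_PATTERN.findall(text.lower())  # lowercase=True by default
    result = []
    # Emit all n-grams of one length before moving to the next length,
    # each length in left-to-right order, matching the order of the output above.
    for n in range(min_n, min(max_n, len(tokens)) + 1):
        for i in range(len(tokens) - n + 1):
            result.append(" ".join(tokens[i:i + n]))
    return result


if __name__ == "__main__":
    text = "this is a foo bar sentences and i want to ngramize it"
    grams = word_ngrams(text, ngram_range=(1, 6))
    print(len(grams))  # 45: 10 unigrams + 9 + 8 + 7 + 6 + 5
    print(grams[:3])
```

A different regex changes which tokens survive. In scikit-learn itself, that is controlled through the token_pattern parameter of CountVectorizer.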
