Forming Bigrams of words in list of sentences with Python

遇见更好的自我 2020-12-24 02:16

I have a list of sentences:

text = ['cant railway station','citadel hotel',' police stn'].

I need to form bigram pairs and store the

10 Answers
  •  -上瘾入骨i
    2020-12-24 03:04

    Just fixing Dan's code:

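    from nltk.collocations import BigramCollocationFinder
    from nltk.corpus import stopwords  # needs the NLTK 'stopwords' corpus to be downloaded
    from nltk.metrics import BigramAssocMeasures
    from nltk.stem import PorterStemmer
    from nltk.tokenize import WordPunctTokenizer
    
    def get_bigrams(myString):
        tokenizer = WordPunctTokenizer()
        tokens = tokenizer.tokenize(myString)
        stemmer = PorterStemmer()
        bigram_finder = BigramCollocationFinder.from_words(tokens)
        # Keep up to 500 bigrams, ranked by chi-squared association score
        bigrams = bigram_finder.nbest(BigramAssocMeasures.chi_sq, 500)
    
        # Append each bigram to the token list as a single "word1 word2" string
        for bigram_tuple in bigrams:
            x = "%s %s" % bigram_tuple
            tokens.append(x)
    
        # Load the stopword list once instead of once per token
        stop_words = set(stopwords.words('english'))
        # Keep entries longer than 8 characters that are not stopwords; stem and lowercase each word
        result = [' '.join([stemmer.stem(w).lower() for w in x.split()]) for x in tokens if x.lower() not in stop_words and len(x) > 8]
        return result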
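
If collocation scoring and stemming are not needed, the list from the question can probably be handled with plain Python. This is only a minimal sketch: the name `sentence_bigrams` is made up for illustration, and it assumes whitespace tokenization and that a bigram should never span two sentences.

```python
def sentence_bigrams(sentences):
    """Return (word, next_word) pairs, built separately for each sentence."""
    pairs = []
    for sentence in sentences:
        # split() with no arguments also discards the leading space in ' police stn'
        words = sentence.split()
        # Pair every word with its successor; one-word sentences yield nothing
        pairs.extend(zip(words, words[1:]))
    return pairs


text = ['cant railway station', 'citadel hotel', ' police stn']
bigrams = sentence_bigrams(text)
print(bigrams)
# [('cant', 'railway'), ('railway', 'station'), ('citadel', 'hotel'), ('police', 'stn')]
```

Building the pairs per sentence avoids spurious bigrams such as ('station', 'citadel') that would appear if the sentences were joined into one string first.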
    
