How to find the count of a word in a string?

北恋 2020-12-01 13:07

I have a string "Hello I am going to I with hello am". I want to find how many times a word occurs in the string. For example, hello occurs 2 times. I tried this app

9 answers
  •  夕颜 (original poster)
    2020-12-01 13:28

    The vector of word occurrence counts for a text is called a bag-of-words.
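    To make the definition concrete, here is a minimal standard-library sketch of the same idea. The helper name bag_of_words is hypothetical, and splitting on runs of word characters after lowercasing is an assumption about what counts as a "word"; adjust it to your data.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Return a Counter mapping each lowercased word to its occurrence count.

    Hypothetical helper for illustration; not part of any library.
    """
    # \w+ keeps every run of word characters, including one-letter words like "I".
    tokens = re.findall(r"\w+", text.lower())
    return Counter(tokens)


counts = bag_of_words("Hello I am going to I with hello am")
print(counts["hello"])  # 2
```

    A Counter returns 0 for words that never occur, so a lookup such as counts["missing"] is safe without an extra check.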

    Scikit-learn provides a nice module to compute it, sklearn.feature_extraction.text.CountVectorizer. Example:

    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer
    
    # Word-level bag-of-words. By default the text is lowercased and
    # one-character tokens such as "I" are dropped.
    vectorizer = CountVectorizer(analyzer="word",
                                 tokenizer=None,
                                 preprocessor=None,
                                 stop_words=None,
                                 min_df=1,  # an integer min_df must be at least 1 in recent releases
                                 max_features=50)
    
    text = ["Hello I am going to I with hello am"]
    
    # Count
    train_data_features = vectorizer.fit_transform(text)
    # get_feature_names() was removed in scikit-learn 1.2
    vocab = vectorizer.get_feature_names_out()
    
    # Sum up the counts of each vocabulary word
    dist = np.sum(train_data_features.toarray(), axis=0)
    
    # For each, print the vocabulary word and the number of times it
    # appears in the training set
    for tag, count in zip(vocab, dist):
        print(count, tag)
    

    Output:

    2 am
    1 going
    2 hello
    1 to
    1 with
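
    Note that "I" is missing from the output: CountVectorizer lowercases the text and its default token_pattern only keeps tokens of two or more word characters. The sketch below applies that default pattern with plain re to show the effect, next to a relaxed pattern that also keeps one-letter words. The relaxed pattern is an assumption about what you may want; if it fits, it can be passed to CountVectorizer through the token_pattern parameter.

```python
import re

# CountVectorizer lowercases its input by default, so do the same here.
text = "Hello I am going to I with hello am".lower()

# Default CountVectorizer token_pattern: at least two word characters per token.
default_pattern = r"(?u)\b\w\w+\b"
# Relaxed pattern (assumed alternative): a single word character is enough.
relaxed_pattern = r"(?u)\b\w+\b"

default_tokens = re.findall(default_pattern, text)
relaxed_tokens = re.findall(relaxed_pattern, text)

print("i" in default_tokens)      # False
print(relaxed_tokens.count("i"))  # 2
```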
    

    Part of the code was taken from this Kaggle tutorial on bag-of-words.

    FYI: How to use sklearn's CountVectorizer() to get ngrams that include any punctuation as separate tokens?
