How to create a wordcloud according to frequencies in a pandas dataframe

天命终不由人 2020-12-10 19:06

I have to plot a wordcloud. The data from 'tweets.csv' is in a Pandas dataframe which has a column named 'text'. The plotted graph isn't based on the most common words, though. How

1 Answer
  • 2020-12-10 19:49

    Set up a sample DataFrame:

    • Also see DataCamp: Generating WordClouds in Python
    import pandas as pd
    
    df = pd.DataFrame({'word': ['how', 'are', 'you', 'doing', 'this', 'afternoon'],
                       'count': [7, 10, 4, 1, 20, 100]}) 
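
If you start from raw text, as with the 'text' column in the question, a word/count table like the sample above has to be built first. The following is only a minimal sketch: the `tweets` frame and its contents are made up for illustration, the tokenizer is a simple regular expression, and `df_counts` is a hypothetical name chosen so it does not overwrite `df`.

```python
import re
from collections import Counter

import pandas as pd

# Hypothetical stand-in for the question's data; in practice something like
# pd.read_csv('tweets.csv') would supply the 'text' column.
tweets = pd.DataFrame({'text': ['How are you doing',
                                'you are doing fine',
                                'are you there']})

# Lowercase each text, split it into word tokens, and count across all rows.
counts = Counter()
for text in tweets['text'].dropna():
    counts.update(re.findall(r"[a-z']+", text.lower()))

# Arrange the counts in the same word/count layout as the sample DataFrame.
df_counts = pd.DataFrame(counts.most_common(), columns=['word', 'count'])
print(df_counts)
```

A `Counter` is already a dict subclass, so under these assumptions `counts` could also be passed straight to `generate_from_frequencies` without going through a DataFrame.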
    

    Convert the word & count columns to a dict

    • WordCloud().generate_from_frequencies() requires a dict
    data = dict(zip(df['word'].tolist(), df['count'].tolist()))
    
    print(data)
    
    # {'how': 7, 'are': 10, 'you': 4, 'doing': 1, 'this': 20, 'afternoon': 100}
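
One thing to watch with `dict(zip(...))`: if a word appears in more than one row, only the last count survives. A small sketch of the difference, using a made-up frame (`df_dup` is a hypothetical name) and summing per word before converting:

```python
import pandas as pd

# Hypothetical frame in which 'you' appears twice.
df_dup = pd.DataFrame({'word': ['how', 'you', 'you'], 'count': [7, 4, 3]})

# dict(zip(...)) keeps only the last value seen for a repeated key.
last_wins = dict(zip(df_dup['word'].tolist(), df_dup['count'].tolist()))

# Summing per word first keeps the full total for each word.
summed = df_dup.groupby('word')['count'].sum().to_dict()

print(last_wins)
print(summed)
```

If the words are already unique, as in the sample DataFrame, both approaches give the same dict.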
    

    Wordcloud:

    • use .generate_from_frequencies
    • generate_from_frequencies(frequencies, max_font_size=None)
    from wordcloud import WordCloud
    
    wc = WordCloud(width=800, height=400, max_words=200).generate_from_frequencies(data)
    

    Plot

    import matplotlib.pyplot as plt
    
    plt.figure(figsize=(10, 10))
    plt.imshow(wc, interpolation='bilinear')
    plt.axis('off')
    plt.show()
    

    Using an image mask:

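    import numpy as np
    from PIL import Image
    
    # Load the mask image; 'twitter.png' must exist in the working directory.
    twitter_mask = np.array(Image.open('twitter.png'))
    # With a mask, the cloud takes the mask's shape, so width/height are omitted.
    wc = WordCloud(background_color='white', max_words=200, mask=twitter_mask).generate_from_frequencies(data)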
    
    plt.figure(figsize=(10, 10))
    plt.imshow(wc, interpolation='bilinear')
    plt.axis("off")
    plt.figure()
    plt.imshow(twitter_mask, cmap=plt.cm.gray, interpolation='bilinear')
    plt.axis("off")
    plt.show()
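
If no mask image is at hand, a mask can also be built directly as an array. This is a rough sketch assuming a 300x300 circular shape; `circle_mask` and the sizes are illustrative choices, and it relies on WordCloud treating pure white (255) mask pixels as areas where no words are drawn.

```python
import numpy as np

# Build a 300x300 mask: 255 (white) outside a centred circle, 0 inside.
size, radius = 300, 130
y, x = np.ogrid[:size, :size]
outside = (x - size // 2) ** 2 + (y - size // 2) ** 2 > radius ** 2
circle_mask = (255 * outside).astype(np.uint8)

# Assumed usage, mirroring the mask example above:
# wc = WordCloud(background_color='white', mask=circle_mask).generate_from_frequencies(data)
```

The resulting cloud would then be plotted with `plt.imshow(wc, interpolation='bilinear')` in the same way as before.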
    
