python-wordcloud: word cloud practice

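Main contents:

1. Setting up the environment (Windows 7 (64-bit) + PyCharm + Anaconda (Python 3.5))

2. How do you generate a word cloud?

A note up front: the materials used below have been packaged and can be downloaded from https://pan.baidu.com/s/1WzrE7oNsyVTlv2LX93A8lQ

Details: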

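I. Setting up the environment

1. Install wordcloud

Download the wheel that matches your Python version from:

http://www.lfd.uci.edu/~gohlke/pythonlibs/#wordcloud

Then run pip install wordcloud-1.3.1-cp35-cp35m-win_amd64.whl to start the installation. It finishes quickly. The result looks like this:

2. Check that the installation succeeded

If it succeeded, move on to the next step.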
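One way to run this check from Python is sketched below. It is only an illustration: the helper name is_installed is made up for this post, and the list of packages is an assumption based on the imports used in the later examples.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Packages imported by the examples later in this post.
    for name in ("wordcloud", "numpy", "matplotlib", "PIL"):
        status = "OK" if is_installed(name) else "MISSING"
        print("{:<12} {}".format(name, status))
```

If wordcloud is reported as MISSING, the wheel was most likely installed into a different Python environment than the one PyCharm is using.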

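II. Word cloud generation practice

Word cloud code download

Basic usage of wordcloud. The demo example code can be downloaded from https://amueller.github.io/word_cloud/. Below is an introduction to the WordCloud class in wordcloud; for details, see the wordcloud.py file. Download link: "Word cloud practice".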

1. The WordCloud class in wordcloud

class WordCloud(object):
    """Word cloud object for generating and drawing.

    Parameters
    ----------
    font_path : string
        Font path to the font that will be used (OTF or TTF).
        Defaults to DroidSansMono path on a Linux machine. If you are on
        another OS or don't have this font, you need to adjust this path.

    width : int (default=400)
        Width of the canvas.

    height : int (default=200)
        Height of the canvas.

    prefer_horizontal : float (default=0.90)
        The ratio of times to try horizontal fitting as opposed to vertical.
        If prefer_horizontal < 1, the algorithm will try rotating the word
        if it doesn't fit. (There is currently no built-in way to get only
        vertical words.)

    mask : nd-array or None (default=None)
        If not None, gives a binary mask on where to draw words. If mask is not
        None, width and height will be ignored and the shape of mask will be
        used instead. All white (#FF or #FFFFFF) entries will be considerd
        "masked out" while other entries will be free to draw on. [This
        changed in the most recent version!]

    scale : float (default=1)
        Scaling between computation and drawing. For large word-cloud images,
        using scale instead of larger canvas size is significantly faster, but
        might lead to a coarser fit for the words.

    min_font_size : int (default=4)
        Smallest font size to use. Will stop when there is no more room in this
        size.

    font_step : int (default=1)
        Step size for the font. font_step > 1 might speed up computation but
        give a worse fit.

    max_words : number (default=200)
        The maximum number of words.

    stopwords : set of strings or None
        The words that will be eliminated. If None, the build-in STOPWORDS
        list will be used.

    background_color : color value (default="black")
        Background color for the word cloud image.

    max_font_size : int or None (default=None)
        Maximum font size for the largest word. If None, height of the image is
        used.

    mode : string (default="RGB")
        Transparent background will be generated when mode is "RGBA" and
        background_color is None.

    relative_scaling : float (default=.5)
        Importance of relative word frequencies for font-size.  With
        relative_scaling=0, only word-ranks are considered.  With
        relative_scaling=1, a word that is twice as frequent will have twice
        the size.  If you want to consider the word frequencies and not only
        their rank, relative_scaling around .5 often looks good.

        .. versionchanged: 2.0
            Default is now 0.5.

    color_func : callable, default=None
        Callable with parameters word, font_size, position, orientation,
        font_path, random_state that returns a PIL color for each word.
        Overwrites "colormap".
        See colormap for specifying a matplotlib colormap instead.

    regexp : string or None (optional)
        Regular expression to split the input text into tokens in process_text.
        If None is specified, ``r"\w[\w']+"`` is used.

    collocations : bool, default=True
        Whether to include collocations (bigrams) of two words.

        .. versionadded: 2.0

    colormap : string or matplotlib colormap, default="viridis"
        Matplotlib colormap to randomly draw colors from for each word.
        Ignored if "color_func" is specified.

        .. versionadded: 2.0

    normalize_plurals : bool, default=True
        Whether to remove trailing 's' from words. If True and a word
        appears with and without a trailing 's', the one with trailing 's'
        is removed and its counts are added to the version without
        trailing 's' -- unless the word ends with 'ss'.

    Attributes
    ----------
    ``words_`` : dict of string to float
        Word tokens with associated frequency.

        .. versionchanged: 2.0
            ``words_`` is now a dictionary

    ``layout_`` : list of tuples (string, int, (int, int), int, color))
        Encodes the fitted word cloud. Encodes for each word the string, font
        size, position, orientation and color.

    Notes
    -----
    Larger canvases with make the code significantly slower. If you need a
    large word cloud, try a lower canvas size, and set the scale parameter.

    The algorithm might give more weight to the ranking of the words
    than their actual frequencies, depending on the ``max_font_size`` and the
    scaling heuristic.
    """

    def __init__(self, font_path=None, width=400, height=200, margin=2,
                 ranks_only=None, prefer_horizontal=.9, mask=None, scale=1,
                 color_func=None, max_words=200, min_font_size=4,
                 stopwords=None, random_state=None, background_color='black',
                 max_font_size=None, font_step=1, mode="RGB",
                 relative_scaling=.5, regexp=None, collocations=True,
                 colormap=None, normalize_plurals=True):
        if font_path is None:
            font_path = FONT_PATH
        if color_func is None and colormap is None:
            # we need a color map
            import matplotlib
            version = matplotlib.__version__
            if version[0] < "2" and version[2] < "5":
                colormap = "hsv"
            else:
                colormap = "viridis"
        self.colormap = colormap
        self.collocations = collocations
        self.font_path = font_path
        self.width = width
        self.height = height
        self.margin = margin
        self.prefer_horizontal = prefer_horizontal
        self.mask = mask
        self.scale = scale
        self.color_func = color_func or colormap_color_func(colormap)
        self.max_words = max_words
        self.stopwords = stopwords if stopwords is not None else STOPWORDS
        self.min_font_size = min_font_size
        self.font_step = font_step
        self.regexp = regexp
        if isinstance(random_state, int):
            random_state = Random(random_state)
        self.random_state = random_state
        self.background_color = background_color
        self.max_font_size = max_font_size
        self.mode = mode
        if relative_scaling < 0 or relative_scaling > 1:
            raise ValueError("relative_scaling needs to be "
                             "between 0 and 1, got %f." % relative_scaling)
        self.relative_scaling = relative_scaling
        if ranks_only is not None:
            warnings.warn("ranks_only is deprecated and will be removed as"
                          " it had no effect. Look into relative_scaling.",
                          DeprecationWarning)
        self.normalize_plurals = normalize_plurals
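The relative_scaling description is easier to follow with numbers. The sketch below is a simplified reading of the docstring, not the library's own code: the function name next_font_size is hypothetical, and it assumes the size of each word is derived from the previous, more frequent word by blending the frequency ratio with 1. The real layout also shrinks the font whenever a word does not fit, which is ignored here.

```python
def next_font_size(prev_size, prev_freq, freq, relative_scaling=0.5):
    """Font size for the next (less frequent) word.

    Simplified illustration of the relative_scaling docstring; the
    blending formula is an assumption, not copied from wordcloud.
    """
    if relative_scaling < 0 or relative_scaling > 1:
        raise ValueError("relative_scaling needs to be "
                         "between 0 and 1, got %f." % relative_scaling)
    ratio = freq / float(prev_freq)
    # relative_scaling=1 -> size follows the frequency ratio exactly
    # relative_scaling=0 -> frequency is ignored, only the rank order matters
    blend = relative_scaling * ratio + (1 - relative_scaling)
    return int(round(blend * prev_size))


if __name__ == "__main__":
    # A word that is half as frequent as the previous one, previous size 40.
    for rs in (0.0, 0.5, 1.0):
        print(rs, next_font_size(40, prev_freq=10, freq=5, relative_scaling=rs))
```

With relative_scaling=1 the half-as-frequent word gets half the size, which matches the docstring's statement that a word twice as frequent is twice as large.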

2. Demo example 1

#!/usr/bin/env python
"""
Colored by Group Example
========================

Generating a word cloud that assigns colors to words based on
a predefined mapping from colors to words
"""

from wordcloud import (WordCloud, get_single_color_func)
import matplotlib.pyplot as plt


class SimpleGroupedColorFunc(object):
    """Create a color function object which assigns EXACT colors
       to certain words based on the color to words mapping

       Parameters
       ----------
       color_to_words : dict(str -> list(str))
         A dictionary that maps a color to the list of words.

       default_color : str
         Color that will be assigned to a word that's not a member
         of any value from color_to_words.
    """

    def __init__(self, color_to_words, default_color):
        self.word_to_color = {word: color
                              for (color, words) in color_to_words.items()
                              for word in words}

        self.default_color = default_color

    def __call__(self, word, **kwargs):
        return self.word_to_color.get(word, self.default_color)


class GroupedColorFunc(object):
    """Create a color function object which assigns DIFFERENT SHADES of
       specified colors to certain words based on the color to words mapping.

       Uses wordcloud.get_single_color_func

       Parameters
       ----------
       color_to_words : dict(str -> list(str))
         A dictionary that maps a color to the list of words.

       default_color : str
         Color that will be assigned to a word that's not a member
         of any value from color_to_words.
    """

    def __init__(self, color_to_words, default_color):
        self.color_func_to_words = [
            (get_single_color_func(color), set(words))
            for (color, words) in color_to_words.items()]

        self.default_color_func = get_single_color_func(default_color)

    def get_color_func(self, word):
        """Returns a single_color_func associated with the word"""
        try:
            color_func = next(
                color_func for (color_func, words) in self.color_func_to_words
                if word in words)
        except StopIteration:
            color_func = self.default_color_func

        return color_func

    def __call__(self, word, **kwargs):
        return self.get_color_func(word)(word, **kwargs)


text = """The Zen of Python, by Tim Peters
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!"""

# Since the text is small collocations are turned off and text is lower-cased
wc = WordCloud(collocations=False).generate(text.lower())

color_to_words = {
    # words below will be colored with a green single color function
    '#00ff00': ['beautiful', 'explicit', 'simple', 'sparse',
                'readability', 'rules', 'practicality',
                'explicitly', 'one', 'now', 'easy', 'obvious', 'better'],
    # will be colored with a red single color function
    'red': ['ugly', 'implicit', 'complex', 'complicated', 'nested',
            'dense', 'special', 'errors', 'silently', 'ambiguity',
            'guess', 'hard']
}

# Words that are not in any of the color_to_words values
# will be colored with a grey single color function
default_color = 'grey'

# Create a color function with single tone
# grouped_color_func = SimpleGroupedColorFunc(color_to_words, default_color)

# Create a color function with multiple tones
grouped_color_func = GroupedColorFunc(color_to_words, default_color)

# Apply our color function
wc.recolor(color_func=grouped_color_func)

# Plot
plt.figure()
plt.imshow(wc, interpolation="bilinear")
plt.axis("off")
plt.show()

Output:
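The lookup behind SimpleGroupedColorFunc can be tried without wordcloud or matplotlib. The sketch below reuses a shortened version of the color_to_words mapping from the example; the function name color_for is hypothetical and only stands in for the class's __call__ method.

```python
# Shortened version of the mapping used in the example above.
color_to_words = {
    "#00ff00": ["beautiful", "explicit", "simple"],
    "red": ["ugly", "implicit", "complex"],
}
default_color = "grey"

# Invert color -> [words] into word -> color so each lookup is a single dict access.
word_to_color = {
    word: color
    for color, words in color_to_words.items()
    for word in words
}


def color_for(word, **kwargs):
    """Return the color for a word; extra keyword arguments are ignored.

    A color function is called with additional keyword arguments such as
    font_size and position, which is why **kwargs is accepted here.
    """
    return word_to_color.get(word, default_color)


if __name__ == "__main__":
    for w in ("simple", "ugly", "python"):
        print(w, "->", color_for(w, font_size=12))
```

Words that appear in neither list fall back to default_color, which is the behavior the example relies on for every word outside the two groups.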

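3. Generating a word cloud from a background image, with stopwords

Input: alice_color.png, alice.txt

Output: word cloud image

(1) Input: alice_color.png

(2) Input: alice.txt (too long to show here)

(3) Output: word cloud image

Code: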

from os import path
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt

from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator

d = path.dirname(__file__)

# Read the whole text.
text = open(path.join(d, 'alice.txt')).read()

# read the mask / color image taken from
# http://jirkavinse.deviantart.com/art/quot-Real-Life-quot-Alice-282261010
alice_coloring = np.array(Image.open(path.join(d, "alice_color.png")))
stopwords = set(STOPWORDS)
stopwords.add("said")

wc = WordCloud(background_color="white", max_words=2000, mask=alice_coloring,
               stopwords=stopwords, max_font_size=40, random_state=42)
# generate word cloud
wc.generate(text)

# create coloring from image
image_colors = ImageColorGenerator(alice_coloring)

# show
plt.imshow(wc, interpolation="bilinear")
plt.axis("off")
plt.figure()
wc.to_file(path.join(d, 'res.png'))
# recolor wordcloud and show
# we could also give color_func=image_colors directly in the constructor
plt.imshow(wc.recolor(color_func=image_colors), interpolation="bilinear")
wc.to_file(path.join(d, 'res1.png'))
plt.axis("off")
plt.figure()
plt.imshow(alice_coloring, cmap=plt.cm.gray, interpolation="bilinear")
plt.axis("off")
wc.to_file(path.join(d, 'res2.png'))
plt.show()
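To see what the stopword set does to the word counts, the idea can be sketched with the standard library alone. This is a simplification, not wordcloud's process_text: the function name count_words is hypothetical, the three-word stopword set is a stand-in for wordcloud.STOPWORDS, and lower-casing everything is an assumption made to keep the example short. The token pattern is the default one quoted in the WordCloud docstring.

```python
import re
from collections import Counter

# Default token pattern quoted in the WordCloud docstring.
TOKEN_PATTERN = r"\w[\w']+"


def count_words(text, stopwords, max_words=200):
    """Tokenize, drop stopwords, and return the most common words."""
    tokens = re.findall(TOKEN_PATTERN, text.lower())
    kept = [t for t in tokens if t not in stopwords]
    return Counter(kept).most_common(max_words)


# Tiny stand-in for wordcloud.STOPWORDS, extended the same way as in the example.
stopwords = {"the", "and", "a"}
stopwords.add("said")

sample = "Alice said the Rabbit said hello, and the Rabbit ran."

if __name__ == "__main__":
    print(count_words(sample, stopwords))
```

Without stopwords.add("said"), "said" would tie with "rabbit" as the most frequent word in this sample, which is the same reason the Alice example removes it.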

4. Using a mask to generate a word cloud of arbitrary shape

from os import path
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt

from wordcloud import WordCloud, STOPWORDS

d = path.dirname(__file__)

# Read the whole text.
text = open(path.join(d, 'alice.txt')).read()

alice_mask = np.array(Image.open(path.join(d, "alice_mask.png")))

stopwords = set(STOPWORDS)
stopwords.add("said")

wc = WordCloud(background_color="white", max_words=2000, mask=alice_mask,
               stopwords=stopwords)
# generate word cloud
wc.generate(text)

# store to file
wc.to_file(path.join(d, "alice.png"))

# show
plt.imshow(wc, interpolation='bilinear')
plt.axis("off")
plt.figure()
plt.imshow(alice_mask, cmap=plt.cm.gray, interpolation='bilinear')
plt.axis("off")
plt.show()

Output:
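The mask convention from the WordCloud docstring (white entries are masked out, everything else is free to draw on) can be illustrated without an image file. The sketch below builds a small circular mask as nested lists; the function name circle_mask is hypothetical, and the grid would still need to be converted, for example with np.array(mask), before being passed as mask= to WordCloud.

```python
def circle_mask(size, radius):
    """Return a size x size grid of pixel values.

    0   -> inside the circle, free to draw on
    255 -> outside the circle, white, i.e. masked out
    """
    center = (size - 1) / 2.0
    return [
        [0 if (x - center) ** 2 + (y - center) ** 2 <= radius ** 2 else 255
         for x in range(size)]
        for y in range(size)
    ]


mask = circle_mask(size=9, radius=3)

if __name__ == "__main__":
    # '#' marks cells where words may be placed, '.' marks masked-out cells.
    for row in mask:
        print("".join("#" if value == 0 else "." for value in row))
```

Any image whose background is pure white, such as alice_mask.png, follows the same convention.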

III. Hands-on practice

Deep learning has been applied to saliency detection in recent years. The superior performance has proved that deep networks can model the semantic properties of salient objects. Yet it is difficult for a deep network to discriminate pixels belonging to similar receptive fields around the object boundaries, thus deep networks may output maps with blurred saliency and inaccurate boundaries. To tackle such an issue, in this work, we propose a deep Level Set network to produce compact and uniform saliency maps. Our method drives the network to learn a Level Set function for salient objects so it can output more accurate boundaries and compact saliency. Besides, to propagate saliency information among pixels and recover full resolution saliency map, we extend a superpixel-based guided filter to be a layer in the network. The proposed network has a simple structure and is trained end-to-end. During testing, the network can produce saliency maps by efficiently feedforwarding testing images at a speed over 12FPS on GPUs. Evaluations on benchmark datasets show that the proposed method achieves state-of-the-art performance

test.png is:

Code:

from os import path
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt

from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator

d = path.dirname(__file__)

text = open(path.join(d, 'test.txt')).read()

alice_coloring = np.array(Image.open(path.join(d, "test.png")))
stopwords = set(STOPWORDS)
stopwords.add("said")

wc = WordCloud(background_color="white", max_words=2000, mask=alice_coloring,
               stopwords=stopwords, max_font_size=40, random_state=42)
# generate word cloud
wc.generate(text)

# create coloring from image
image_colors = ImageColorGenerator(alice_coloring)

# show
plt.imshow(wc, interpolation="bilinear")
plt.axis("off")
plt.figure()
wc.to_file(path.join(d, 'resTest.png'))
# recolor wordcloud and show
# we could also give color_func=image_colors directly in the constructor
plt.imshow(wc.recolor(color_func=image_colors), interpolation="bilinear")
wc.to_file(path.join(d, 'resTest2.png'))
plt.axis("off")
# display both figures (as in the Alice example above)
plt.show()

Output:

2. Chinese word clouds (for details, see: https://blog.csdn.net/cskywit/article/details/79285988)

For a concrete example, see: https://blog.csdn.net/xiaogejun/article/details/73997633
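Why Chinese text needs extra care can be seen from the default token pattern quoted in the WordCloud docstring. The sketch below uses only the standard library; the sample sentence and its hand-made segmentation are assumptions for illustration. In practice a word segmentation tool would produce the spaced version (jieba is a common choice, but that is an assumption here, since this post only links out), and font_path would have to point at a font that contains Chinese glyphs.

```python
import re

# Default token pattern quoted in the WordCloud docstring.
TOKEN_PATTERN = r"\w[\w']+"


def tokenize(text):
    """Split text into tokens with the default pattern."""
    return re.findall(TOKEN_PATTERN, text)


# Chinese is written without spaces, so the whole sentence becomes one token.
unsegmented = "我爱自然语言处理"

# The same sentence after word segmentation (done by hand for this example).
segmented = "我 爱 自然语言 处理"

if __name__ == "__main__":
    print(tokenize(unsegmented))
    print(tokenize(segmented))
```

Note that the pattern requires at least two characters per token, so single-character words are dropped even after segmentation.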
