Algorithm to implement a word cloud like Wordle

Submitted by 五迷三道 on 2019-12-17 04:09:25

Question


Context

  • Take a look at Wordle: http://www.wordle.net/
  • It's much better looking than any other word cloud generator I've seen
  • Note: the source is not available - read the FAQ: http://www.wordle.net/faq#code

My Questions

  • Is there an algorithm available that does what Wordle does?
  • If not, what are some alternatives that produce similar kinds of output?

Why I'm asking

  • just curious
  • want to learn

Answer 1:


I'm the creator of Wordle. Here's how Wordle actually works:

Count the words, throw away boring words, and sort by the count, descending. Keep the top N words for some N. Assign each word a font size proportional to its count. Generate a Java2D Shape for each word, using the Java2D API.
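A minimal Python sketch of the counting and sizing step described above. It is not Wordle's actual code: the tiny stop-word list, the function name `top_words`, and the `max_font_size` parameter are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "that"}

def top_words(text, n=50, max_font_size=96.0):
    """Return [(word, count, font_size)] for the n most frequent non-stop words.

    The font size is proportional to the count; the most frequent word
    gets max_font_size (an assumed scaling choice).
    """
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    ranked = counts.most_common(n)  # already sorted by count, descending
    if not ranked:
        return []
    top_count = ranked[0][1]
    return [(w, c, max_font_size * c / top_count) for w, c in ranked]
```

For example, `top_words("the cat and the cat and a dog", n=2)` keeps "cat" and "dog" and gives "cat" twice the font size of "dog".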

Each word "wants" to be somewhere, such as "at some random x position in the vertical center". In decreasing order of frequency, do this for each word:

place the word where it wants to be
while it intersects any of the previously placed words
    move it one step along an ever-increasing spiral

That's it. The hard part is in doing the intersection-testing efficiently, for which I use last-hit caching, hierarchical bounding boxes, and a quadtree spatial index (all of which are things you can learn more about with some diligent googling).
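The placement loop above can be sketched in Python roughly as follows. This is a simplified illustration under stated assumptions, not Wordle's implementation: each word is approximated by an axis-aligned rectangle instead of a Java2D Shape, the spiral is an Archimedean one with made-up `step` and `spacing` parameters, and only last-hit caching is shown (no hierarchical bounding boxes, no quadtree).

```python
import math
import random

def intersects(a, b):
    """Rectangles are (x, y, w, h); True if their interiors overlap."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def first_hit(rect, placed, last_hit):
    """Return a placed rectangle that rect collides with, or None.

    Last-hit caching: the rectangle we collided with on the previous
    step is the most likely culprit, so it is tested first.
    """
    if last_hit is not None and intersects(rect, last_hit):
        return last_hit
    for other in placed:
        if intersects(rect, other):
            return other
    return None

def place_words(sizes, canvas_width=800.0, step=0.1, spacing=1.0, seed=0):
    """sizes: (width, height) per word, in decreasing order of frequency.

    Returns one placed rectangle (x, y, w, h) per word.
    """
    rng = random.Random(seed)
    placed = []
    for w, h in sizes:
        # Where the word "wants" to be: random x, vertical center (y = 0 here).
        cx, cy = rng.uniform(0.0, canvas_width), 0.0
        theta = 0.0
        last_hit = None
        while True:
            r = spacing * theta  # radius grows with the angle
            rect = (cx + r * math.cos(theta) - w / 2,
                    cy + r * math.sin(theta) - h / 2, w, h)
            last_hit = first_hit(rect, placed, last_hit)
            if last_hit is None:
                break
            theta += step  # move one step along the spiral
        placed.append(rect)
    return placed
```

Because the radius keeps growing, every word eventually lands somewhere free. With plain rectangles, small words cannot nest inside the gaps of large letters, which is one reason finer-grained intersection tests matter.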

Edit: As Reto Aebersold pointed out, there's now a book chapter, freely available, that covers this same territory: Beautiful Visualization, Chapter 3: Wordle




Answer 2:


I've implemented the algorithm described by Jonathan Feinberg in Python to create a tag cloud. It is far from the beautiful clouds of wordle.net, but it gives you an idea of how it could be done.

You can find the project here.




Answer 3:


Here's a really nice JavaScript one from Jason Davies that uses D3. You can even use web fonts with it.

Demo: http://www.jasondavies.com/wordcloud/

Github: https://github.com/jasondavies/d3-cloud




Answer 4:


I've created a Silverlight component that uses the algorithm Jonathan suggests here. The source code and example projects are all available on my blog:

http://whydoidoit.com

My cloud lets you color and size words based on different weightings and it supports word selection (from a coordinate) and selected word highlighting. The source is yours to use as you see fit.




Answer 5:


I'm working on WordCram, a Processing library for making word clouds. It's pretty heavily influenced by Wordle, and is informed by the same PDF aeby linked to above. It handles the collision detection for you, and lets you focus on how you want your words laid out, colored, rotated, etc.




Answer 6:


http://code.google.com/apis/visualization/documentation/gallery.html

Check out the word cloud visualization. Not as fancy as wordle.net but real easy to add to your site.




Answer 7:


I was looking for a Wordle-like visualization that would let me assign the color, initial position, and size of a string based on other data, such as its relevance within a text. I didn't find anything, but thanks to the information I found here (especially Jonathan's explanation and aeby's link), I could finally implement 'Cloudio', which comes relatively close to Wordle (at least I think so...) and offers the features I was looking for.

It is implemented with SWT and JFace, and I tried to integrate it into the MVC model of JFace, so that you can set content and label providers to modify the layout of a cloud and add it to other Eclipse plugins or RCP apps. You can also change how the initial position of a string is calculated, so it is not difficult to use it for cluster visualization or similar purposes. It is still poorly documented and limited in some ways (and I did the initial upload a few hours ago, so it might still be a bit buggy), but if you're interested, here's the link:

And here's a link to some created clouds, in case you want a quick impression: https://github.com/sschwieb/Cloudio/wiki/Example-Clouds

Cheers, Stephan




Answer 8:


Here is my implementation of a Wordle-like cloud. It uses the same spiral algorithm and the QuadTree data structure.

http://sourcecodecloud.codeplex.com

or

http://www.codeproject.com/Articles/224231/Word-Cloud-Tag-Cloud-Generator-Control-for-NET-Win
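For readers unfamiliar with the QuadTree data structure mentioned above, here is a minimal Python sketch of one used as a spatial index for collision queries. It is not taken from the linked project: the class layout, the `capacity` and `max_depth` parameters, and the rectangle format `(x, y, w, h)` are assumptions for illustration, and every inserted rectangle is assumed to lie inside the root bounds.

```python
def overlaps(a, b):
    """Rectangles are (x, y, w, h); True if their interiors overlap."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def contains(outer, inner):
    """True if inner lies entirely within outer."""
    ox, oy, ow, oh = outer
    ix, iy, iw, ih = inner
    return ox <= ix and oy <= iy and ix + iw <= ox + ow and iy + ih <= oy + oh

class QuadTree:
    def __init__(self, bounds, capacity=4, depth=0, max_depth=8):
        self.bounds = bounds
        self.capacity = capacity
        self.depth = depth
        self.max_depth = max_depth
        self.items = []       # rectangles stored at this node
        self.children = None  # four sub-quadrants once the node has split

    def insert(self, rect):
        if self.children is not None:
            # Push the rectangle down if one quadrant fully contains it.
            for child in self.children:
                if contains(child.bounds, rect):
                    child.insert(rect)
                    return
        self.items.append(rect)
        if (self.children is None and len(self.items) > self.capacity
                and self.depth < self.max_depth):
            self._split()

    def _split(self):
        x, y, w, h = self.bounds
        hw, hh = w / 2, h / 2
        self.children = [
            QuadTree((qx, qy, hw, hh), self.capacity, self.depth + 1, self.max_depth)
            for qx in (x, x + hw) for qy in (y, y + hh)
        ]
        old, self.items = self.items, []
        for rect in old:
            self.insert(rect)  # rectangles straddling a boundary stay here

    def any_hit(self, rect):
        """True if rect overlaps any stored rectangle."""
        if not overlaps(self.bounds, rect):
            return False  # prune this whole subtree
        if any(overlaps(rect, item) for item in self.items):
            return True
        return self.children is not None and any(
            child.any_hit(rect) for child in self.children)
```

The benefit is that a query only visits the quadrants it touches, rather than testing a candidate position against every word placed so far.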




Answer 9:


Lion and Lamb is an open-source iOS app that creates word clouds using the most frequent words from a chosen book of the Bible.

It's based on the algorithm as described by Jonathan Feinberg. Hit testing does utilize a quad tree, but the bounding boxes are based on the glyph's bounding rectangle. I want to break the glyph down into many smaller bounding rects to enable word placement within a glyph's bounding box.

GitHub: https://github.com/PetahChristian/LionAndLamb
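The idea of breaking a glyph down into many smaller bounding rects can be sketched in Python as below. This is an illustration only, not code from Lion and Lamb: it assumes the glyph has already been rasterized into a grid of booleans, and the names `to_mask`, `build_boxes`, `trees_intersect`, and the example `L_MASK` are hypothetical.

```python
def to_mask(rows):
    """Turn rows of text ('#' = ink) into a grid of booleans."""
    return [[ch == "#" for ch in row] for row in rows]

def build_boxes(mask, x0, y0, x1, y1, min_size=1):
    """Recursively cover the inked cells in [x0, x1) x [y0, y1).

    Returns (box, children) or None if the region has no ink. A leaf is
    a fully inked box, or a box that has reached min_size.
    """
    cells = [mask[y][x] for y in range(y0, y1) for x in range(x0, x1)]
    if not any(cells):
        return None
    box = (x0, y0, x1, y1)
    if all(cells) or (x1 - x0 <= min_size and y1 - y0 <= min_size):
        return (box, [])
    xm, ym = (x0 + x1) // 2, (y0 + y1) // 2
    xs = [(x0, xm), (xm, x1)] if x1 - x0 > min_size else [(x0, x1)]
    ys = [(y0, ym), (ym, y1)] if y1 - y0 > min_size else [(y0, y1)]
    kids = [build_boxes(mask, xa, ya, xb, yb, min_size)
            for xa, xb in xs for ya, yb in ys]
    return (box, [k for k in kids if k is not None])

def trees_intersect(a, a_off, b, b_off):
    """True if two box trees, shifted by (dx, dy) offsets, overlap."""
    if a is None or b is None:
        return False
    (ax0, ay0, ax1, ay1), a_kids = a
    (bx0, by0, bx1, by1), b_kids = b
    (adx, ady), (bdx, bdy) = a_off, b_off
    if not (ax0 + adx < bx1 + bdx and bx0 + bdx < ax1 + adx and
            ay0 + ady < by1 + bdy and by0 + bdy < ay1 + ady):
        return False  # the enclosing boxes miss, so nothing inside can hit
    if not a_kids and not b_kids:
        return True   # two leaves overlap
    if a_kids:
        return any(trees_intersect(k, a_off, b, b_off) for k in a_kids)
    return any(trees_intersect(a, a_off, k, b_off) for k in b_kids)

# A 4x4 letter "L" as a tiny example glyph.
L_MASK = to_mask(["#...",
                  "#...",
                  "#...",
                  "####"])
```

With this, a second "L" shifted by `(1, -1)` does not collide with the first even though their outer bounding rectangles overlap, so it can tuck into the empty corner. With `min_size` greater than 1 the test becomes conservative, since leaves may then be only partly inked.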




Answer 10:


I have a Tag Cloud generator here, which I call Disorganizer :)

The sources are TagCloudService, the Razor markup control, and a WinForm for testing purposes; with a little wrapper around it, you can put it in your blog, profile, etc. It relies heavily on C# 4.0 and the System.Drawing namespace.

I created it because the other cloud generators don't let you click on tags to navigate or create hover animations that show the tags are clickable. Since hover animation in HTML is necessary for me (I'm doing this with overlaid, absolutely positioned <a> tags), I haven't developed any-angle word display: words are either vertical or horizontal.

Warning: The above links may become invalid in a few months; I plan to slowly separate it from the surrounding project into its own project.

You can see a working demo on this sample blog post, but it is incomplete, and in an incomplete site. Contact me if anyone wants to contribute, I will get on with separating it out asap.




Answer 11:


Here is yet another end-to-end implementation of Wordle in Python 3, largely based on the initial outline by Jonathan Feinberg (quadtrees, spirals, etc.).

The code (commented, with a detailed README file) is freely available at this GitHub repository, and this is a sample wordle created with the code.



Source: https://stackoverflow.com/questions/342687/algorithm-to-implement-a-word-cloud-like-wordle
