Visualize data and clustering [closed]

Question

I am currently writing a Python script to find the similarity between documents. I have already calculated the similarity score for each pair of documents and stored the scores in a dictionary. It looks something like this:

{(8328, 8327): 1.0, (8313, 8306): 0.12405229825691289, (8329, 8328): 1.0, (8322, 8321): 0.99999999999999989, (8328, 8329): 1.0, (8306, 8316): 0.12405229825691289, (8320, 8319): 0.67999999999999989, (8337, 8336): 1.0000000000000002, (8319, 8320): 0.67999999999999989, (8313, 8316): 0.99999999999999989, (8321, 8322): 0.99999999999999989, (8330, 8328): 1.0}

My final goal is to cluster similar documents together. The data above can be viewed another way. Take the document pair (8313, 8306): its similarity score is 0.12405. I can define the inverse of the score as the distance between documents 8313 and 8306. Similar documents will then sit close together, while less similar documents will be further apart.
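To make the conversion concrete, here is a minimal sketch of turning the score dictionary into distances. The helper name `to_distances` and the `eps` guard against division by zero are illustrative assumptions, not part of the original script.

```python
def to_distances(scores, eps=1e-9):
    """Map {(doc_a, doc_b): similarity} to {(doc_a, doc_b): 1 / similarity}."""
    # eps is an assumed floor so that a score of 0 does not divide by zero
    return {pair: 1.0 / max(score, eps) for pair, score in scores.items()}


# a few pairs taken from the dictionary above
scores = {
    (8313, 8306): 0.12405229825691289,
    (8320, 8319): 0.67999999999999989,
    (8328, 8327): 1.0,
}
distances = to_distances(scores)

# list the pairs from closest to farthest
for pair, distance in sorted(distances.items(), key=lambda item: item[1]):
    print(pair, round(distance, 3))
```

Note that with 1 / score, identical documents end up at distance 1 rather than 0. Using 1 - score instead is a common alternative when the scores lie in [0, 1].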

My question is: is there any open-source visualization tool that can help me achieve this?

Answer 1:

I'm not sure what the term for that type of graph would be (minimum weight spanning tree?), but check out Graphviz. There are some Python bindings for it as well, but failing that you could simply generate an input file for it, or pipe data directly in.
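As a rough sketch of the "generate an input file" route, the snippet below builds Graphviz DOT text from the score dictionary using only the standard library. The function name `to_dot`, the graph name, and the choice of 1 / score as the edge length are assumptions for illustration. `len` is the edge attribute that Graphviz's neato layout reads as the preferred edge length.

```python
def to_dot(scores):
    """Return DOT source for an undirected graph with one edge per document pair."""
    lines = ["graph similarity {"]
    seen = set()
    for (a, b), score in sorted(scores.items()):
        key = frozenset((a, b))
        if key in seen:  # (a, b) and (b, a) describe the same edge
            continue
        seen.add(key)
        length = 1.0 / max(score, 1e-9)  # assumed distance: inverse of the score
        lines.append(f'    {a} -- {b} [len={length:.3f}, label="{score:.2f}"];')
    lines.append("}")
    return "\n".join(lines)


scores = {
    (8320, 8319): 0.67999999999999989,
    (8319, 8320): 0.67999999999999989,
    (8313, 8306): 0.12405229825691289,
}
dot_text = to_dot(scores)
print(dot_text)
```

If the output is saved to a file, say similarity.dot, something like `neato -Tpng similarity.dot -o similarity.png` should render it.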

Answer 2:

I think you need multidimensional scaling (MDS):

http://en.wikipedia.org/wiki/Multidimensional_scaling
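For illustration, here is a minimal sketch of classical (Torgerson) MDS, one variant of the technique, written with NumPy. Several things here are assumptions rather than anything from the question: the use of NumPy, the distance 1 - score (instead of the inverse), and treating pairs with no recorded score as maximally distant (distance 1).

```python
import numpy as np


def classical_mds(dist, dims=2):
    """Embed n points in `dims` dimensions from an n x n distance matrix."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # double-centre the squared distances to obtain a Gram matrix
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:dims]  # largest eigenvalues first
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


# a few pairs taken from the question's dictionary
scores = {
    (8313, 8306): 0.12405229825691289,
    (8306, 8316): 0.12405229825691289,
    (8313, 8316): 0.99999999999999989,
    (8320, 8319): 0.67999999999999989,
}
docs = sorted({doc for pair in scores for doc in pair})
index = {doc: i for i, doc in enumerate(docs)}

dist = np.ones((len(docs), len(docs)))  # assumption: unknown pairs are far apart
np.fill_diagonal(dist, 0.0)
for (a, b), score in scores.items():
    d = 1.0 - min(score, 1.0)  # clamp scores such as 1.0000000000000002
    dist[index[a], index[b]] = dist[index[b], index[a]] = d

coords = classical_mds(dist)
for doc in docs:
    print(doc, coords[index[doc]])
```

The resulting 2-D coordinates can be fed to any scatter plot. Documents with near-identical scores, such as 8313 and 8316 here, should land almost on top of each other.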

Answer 3:

I think Weka can do this. You might have to massage the input file to a different format first. Weka also has an API, though it's in Java, not Python.

Answer 4:

There are lots of tools you can use to do this.

Other tools have already been mentioned, but you could fairly easily do something like this in Tkinter, PyGTK, PyQt, matplotlib, or really any graphics library.

However, a polar plot in matplotlib would be fairly simple:

(untested):

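import math
from matplotlib.pyplot import figure, show

# assign your data here, as {(doc_a, doc_b): similarity}
data = {(8313, 8306): 0.12405229825691289,
        (8320, 8319): 0.67999999999999989,
        (8328, 8327): 1.0}

fig = figure()
ax = fig.add_subplot(111, polar=True)

# one marker per pair: angle 0, radius = similarity score
for pair in data:
    ax.plot(0, data[pair], 'o')
show()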

That should give you a rudimentary visualization. You could also change the plot call to

ax.plot(data[pair] * math.pi, 1, 'o')

for a different style of visualization: each pair then sits on the unit circle at an angle proportional to its score.

The matplotlib docs are very good and they have plenty of examples.

Answer 5:

NetworkX may help. This example could be a good starting point:

http://networkx.lanl.gov/examples/drawing/knuth_miles.html
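As a hedged sketch of how the score dictionary could map onto NetworkX, the snippet below builds a weighted graph and then reads clusters off as connected components after dropping weak edges. The threshold of 0.5 and the component-based grouping are illustrative assumptions, not something the linked example does.

```python
import networkx as nx

# a few pairs taken from the question's dictionary
scores = {
    (8313, 8306): 0.12405229825691289,
    (8306, 8316): 0.12405229825691289,
    (8313, 8316): 0.99999999999999989,
    (8320, 8319): 0.67999999999999989,
}

# full graph: one node per document, edge weight = similarity
graph = nx.Graph()
for (a, b), score in scores.items():
    graph.add_edge(a, b, weight=score)

threshold = 0.5  # hypothetical cut-off for "similar enough"
strong = nx.Graph()
strong.add_nodes_from(graph.nodes)
strong.add_edges_from(
    (a, b) for a, b, weight in graph.edges(data="weight") if weight >= threshold
)

# each connected component of the thresholded graph is one cluster
clusters = sorted(sorted(component) for component in nx.connected_components(strong))
print(clusters)
```

With matplotlib installed, `nx.draw(graph, nx.spring_layout(graph), with_labels=True)` should give a quick picture in which strongly connected documents tend to be pulled together.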

Source: https://stackoverflow.com/questions/3240658/visualize-data-and-clustering

Tags

python

cluster-analysis

visualization