It should be possible to use simple UNIX command-line tools to extract the text contents of those documents into plain text files, and then use a pure Python solution for the actual clustering.
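To make the second step more concrete, here is a minimal pure-Python sketch (standard library only) of clustering the extracted files. It assumes every document has already been converted to a `.txt` file in one directory. It represents each document as a bag-of-words term-count vector and compares documents by cosine similarity. It then performs single-linkage clustering cut at a similarity threshold: two documents share a cluster if a chain of pairs with similarity at or above the threshold connects them. The function names and the threshold of 0.3 are my own illustrative choices, not taken from any of the packages below. Tune the threshold for your data.

```python
import math
import re
import sys
from collections import Counter
from pathlib import Path


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def term_vector(text):
    """Bag-of-words term-frequency vector, stored sparsely as a Counter."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse Counter vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def cluster_texts(texts, threshold=0.5):
    """Single-linkage clustering cut at `threshold`, implemented with union-find.

    `texts` maps a document name to its text; returns a sorted list of clusters,
    each a sorted list of document names.
    """
    names = sorted(texts)
    vectors = {name: term_vector(texts[name]) for name in names}
    parent = {name: name for name in names}

    def find(x):
        # Follow parent links to the cluster root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the clusters of every pair of documents that are similar enough.
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine(vectors[a], vectors[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(groups.values())


if __name__ == "__main__":
    # Usage: python cluster_docs.py DIR   (DIR holds the extracted *.txt files)
    directory = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    docs = {p.name: p.read_text(errors="ignore") for p in directory.glob("*.txt")}
    for group in cluster_texts(docs, threshold=0.3):
        print(group)
```

The pairwise comparison is O(n²) in the number of documents, which is fine for a few thousand files. For larger collections, or if you want k-means or other linkage methods, one of the packages below is probably a better fit.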
I found a code snippet for clustering data in general:
http://www.daniweb.com/code/snippet216641.html
A Python package for clustering:
http://python-cluster.sourceforge.net/
Another Python package (used mainly for bioinformatics):
http://bonsai.ims.u-tokyo.ac.jp/~mdehoon/software/cluster/software.htm#pycluster