I've run the Brown clustering algorithm from https://github.com/percyliang/brown-cluster and also a Python implementation, https://github.com/mheilman/tan-clustering. And th
The integers are counts of how many times each word is seen in the input document. (I have verified this with the Python implementation.)
From the comments at the top of the Python implementation:
Instead of using a window (e.g., as in Brown et al., sec. 4), this code computed PMI using the probability that two randomly selected clusters from the same document will be c1 and c2. Also, since the total numbers of cluster tokens and pairs are constant across pairs, this code use counts instead of probabilities.
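
To make the quoted comment concrete, here is a minimal sketch of document-level PMI. It is not the repository's code: the function name `document_pmi` and the input format (each document as a list of cluster labels) are assumptions made purely for illustration.

```python
import math
from collections import Counter
from itertools import permutations


def document_pmi(docs):
    """Illustrative only: PMI of cluster pairs drawn from the same document.

    docs is a hypothetical input format: a list of documents, each a list
    of cluster labels.
    """
    token_counts = Counter()
    pair_counts = Counter()
    for doc in docs:
        token_counts.update(doc)
        # Count every ordered pair of distinct token positions in this document.
        for a, b in permutations(doc, 2):
            pair_counts[(a, b)] += 1

    total_tokens = sum(token_counts.values())
    total_pairs = sum(pair_counts.values())

    pmi = {}
    for (a, b), n_ab in pair_counts.items():
        p_ab = n_ab / total_pairs
        p_a = token_counts[a] / total_tokens
        p_b = token_counts[b] / total_tokens
        pmi[(a, b)] = math.log(p_ab / (p_a * p_b))
    return pmi


# Made-up toy data: clusters A and B always share a document, as do C and D.
toy_docs = [["A", "B"], ["A", "B"], ["C", "D"], ["C", "D"]]
toy_pmi = document_pmi(toy_docs)
```

Because `total_tokens` and `total_pairs` are the same for every pair, ranking pairs by the raw ratio `n_ab / (n_a * n_b)` gives the same order as ranking by PMI, which is presumably why the comment says counts can stand in for probabilities.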
The code in the Python implementation shows that, for each word, it writes the word, its bit string, and its count:
    def save_clusters(self, output_path):
        with open(output_path, 'w') as f:
            for w in self.words:
                f.write("{}\t{}\t{}\n".format(w, self.get_bitstring(w),
                                              self.word_counts[w]))
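
If you want to read that output back in, something along these lines should work. This is only a sketch: it assumes the tab-separated column order written by `save_clusters` above (word, bit string, count), and the helper name `load_clusters` and the sample rows are made up for illustration.

```python
import io


def load_clusters(lines):
    """Parse 'word<TAB>bitstring<TAB>count' lines into {word: (bitstring, count)}."""
    clusters = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        word, bitstring, count = line.split("\t")
        clusters[word] = (bitstring, int(count))
    return clusters


# Made-up sample in the same layout; in practice pass an open file handle instead.
sample = io.StringIO("the\t0\t120\ncat\t10\t7\ndog\t11\t5\n")
clusters = load_clusters(sample)
```

The third field parses as an integer because it is the word count, not part of the cluster ID; the bit string is kept as a string so leading zeros are preserved.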