I'm trying to perform clustering in Python using Random Forests. In the R implementation of Random Forests, there is a flag you can set to get the proximity matrix. I can'
We don't implement the proximity matrix in Scikit-Learn (yet).
However, this could be done by relying on the `apply` function provided in our implementation of decision trees. That is, for all pairs of samples in your dataset, iterate over the decision trees in the forest (through `forest.estimators_`) and count the number of times they fall in the same leaf, i.e., the number of times `apply` gives the same node id for both samples in the pair.
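A rough sketch of that counting procedure might look like the following. The helper name `proximity_matrix`, the toy dataset, and the choice to divide the counts by the number of trees (so that values lie between 0 and 1) are my own assumptions for illustration, not something built into the library.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def proximity_matrix(forest, X):
    """Hypothetical helper: fraction of trees in which two samples share a leaf."""
    n_samples = X.shape[0]
    proximity = np.zeros((n_samples, n_samples))
    for tree in forest.estimators_:
        # Leaf node id reached by every sample in this tree.
        leaves = tree.apply(X)
        # Add 1 for every pair of samples that landed in the same leaf.
        proximity += leaves[:, None] == leaves[None, :]
    # Assumption: normalize the counts by the number of trees.
    return proximity / len(forest.estimators_)


# Toy data, only to show the call; replace with your own X and y.
X, y = make_classification(n_samples=50, n_features=5, random_state=0)
forest = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)
prox = proximity_matrix(forest, X)
```

Note that this builds a full n_samples by n_samples array, so it is only practical for moderately sized datasets. For clustering, `1 - prox` can be used as a dissimilarity matrix.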
Hope this helps.