Proximity Matrix in sklearn.ensemble.RandomForestClassifier

盖世英雄少女心 2020-12-28 20:01

I'm trying to perform clustering in Python using Random Forests. In the R implementation of Random Forests, there is a flag you can set to get the proximity matrix. I can'

3 answers
  •  醉话见心
    2020-12-28 20:54

    Based on Gilles Louppe's answer, I have written a function. I don't know whether it is efficient, but it works. Best regards.

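    import numpy as np

    def proximityMatrix(model, X, normalize=True):
        # Leaf index of every sample in every tree: shape (n_samples, n_trees)
        terminals = model.apply(X)
        nTrees = terminals.shape[1]

        # Count, for each pair of samples, the trees in which they share a leaf
        a = terminals[:, 0]
        proxMat = 1 * np.equal.outer(a, a)

        for i in range(1, nTrees):
            a = terminals[:, i]
            proxMat += 1 * np.equal.outer(a, a)

        # Optionally convert counts to the fraction of trees with a shared leaf
        if normalize:
            proxMat = proxMat / nTrees

        return proxMat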
    
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.datasets import load_breast_cancer
    train = load_breast_cancer()
    
    model = RandomForestClassifier(n_estimators=500, max_features=2, min_samples_leaf=40)
    model.fit(train.data, train.target)
    proximityMatrix(model, train.data, normalize=True)
    ## array([[ 1.   ,  0.414,  0.77 , ...,  0.146,  0.79 ,  0.002],
    ##        [ 0.414,  1.   ,  0.362, ...,  0.334,  0.296,  0.008],
    ##        [ 0.77 ,  0.362,  1.   , ...,  0.218,  0.856,  0.   ],
    ##        ..., 
    ##        [ 0.146,  0.334,  0.218, ...,  1.   ,  0.21 ,  0.028],
    ##        [ 0.79 ,  0.296,  0.856, ...,  0.21 ,  1.   ,  0.   ],
    ##        [ 0.002,  0.008,  0.   , ...,  0.028,  0.   ,  1.   ]])
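
Since the original goal was clustering, here is a rough sketch of one way to go from a fitted forest to cluster labels. It is not taken from the answers in this thread. It assumes that `1 - proximity` is an acceptable dissimilarity (some people prefer `sqrt(1 - proximity)`), and it picks average-linkage hierarchical clustering from SciPy with two clusters purely for illustration. The helper name `proximity_matrix_sparse` is made up here; it computes the same normalized proximities as the loop above, but through a one-hot encoding of the leaf indices and a sparse matrix product.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import OneHotEncoder


def proximity_matrix_sparse(model, X):
    """Fraction of trees in which each pair of samples lands in the same leaf.

    Hypothetical helper: one-hot encodes the (tree, leaf) assignments so that
    a single sparse product counts the shared leaves for every pair.
    """
    leaves = model.apply(X)                          # (n_samples, n_trees)
    onehot = OneHotEncoder().fit_transform(leaves)   # sparse, one column per (tree, leaf)
    shared = (onehot @ onehot.T).toarray()           # shared-leaf counts per pair
    return shared / leaves.shape[1]


data = load_breast_cancer()
# random_state is fixed only so that the sketch is reproducible
model = RandomForestClassifier(n_estimators=100, min_samples_leaf=40, random_state=0)
model.fit(data.data, data.target)

prox = proximity_matrix_sparse(model, data.data)

# Assumption: treat 1 - proximity as a distance between samples
dist = 1.0 - prox
np.fill_diagonal(dist, 0.0)

# SciPy's linkage expects the condensed (upper-triangle) distance vector
condensed = squareform(dist, checks=False)
Z = linkage(condensed, method="average")

# Cut the dendrogram into at most two clusters (an arbitrary choice here)
labels = fcluster(Z, t=2, criterion="maxclust")
```

The dense `n_samples x n_samples` result still has to fit in memory, so this only changes how the counts are computed, not how large the output is. The forest here is supervised, so the resulting clusters reflect the class labels used for training.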
    
