Python equivalent of daisy() in the cluster package of R

放肆的年华, submitted on 2019-12-02 19:30:53
ely

I believe you are looking for scipy.spatial.distance.pdist.

If you implement a function that computes the Gower distance on a single pair of observations, you can pass that function to pdist, which will apply it to every pair of rows and return the pairwise distances (as a condensed vector; scipy.spatial.distance.squareform converts it to a square matrix). The Gower distance does not appear to be one of the built-in metrics.
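As a rough illustration of passing a custom function to pdist, here is a minimal sketch for purely numeric data. It assumes the usual range-normalised form of the Gower distance (mean of |x - y| / range over the columns); the helper name make_gower_numeric and the toy array are made up for this example.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform


def make_gower_numeric(X):
    """Build a pairwise function that closes over the per-column ranges of X."""
    ranges = X.max(axis=0) - X.min(axis=0)
    ranges[ranges == 0] = 1.0  # constant columns contribute 0, avoid dividing by zero

    def gower(u, v):
        # Mean of range-normalised absolute differences (numeric columns only).
        return np.mean(np.abs(u - v) / ranges)

    return gower


# Hypothetical toy data: three observations, two numeric attributes.
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 40.0]])

condensed = pdist(X, metric=make_gower_numeric(X))  # condensed vector of length n*(n-1)/2
D = squareform(condensed)                           # full symmetric n x n matrix
print(D)
```

The closure is used so that the column ranges are computed once from the whole data set instead of being recomputed for every pair.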

Likewise, if a single observation has mixed attributes, you can define your own function that, say, uses the Euclidean distance on the numerical attributes and a Gower distance on the categorical attributes, then adds the two; or use any other definition of what the distance between two isolated observations should mean for your application.
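A sketch of such a function for one pair of observations might look like the following. It assumes that, on categorical attributes, the Gower term reduces to the fraction of attributes that differ (simple matching); the function name, the index arguments and the sample tuples are hypothetical.

```python
import math


def mixed_distance(a, b, numeric_idx, categorical_idx):
    """Euclidean distance on numeric attributes plus mismatch fraction on categorical ones."""
    # Euclidean distance over the numeric positions.
    num = math.sqrt(sum((a[i] - b[i]) ** 2 for i in numeric_idx))
    # Fraction of categorical positions whose values differ.
    cat = sum(a[i] != b[i] for i in categorical_idx) / len(categorical_idx)
    return num + cat


# Hypothetical observations: two numeric and two categorical attributes.
a = (1.0, 2.0, "red", "small")
b = (4.0, 6.0, "red", "large")

d = mixed_distance(a, b, numeric_idx=[0, 1], categorical_idx=[2, 3])
print(d)
```

Note that the unscaled Euclidean part can dominate the sum, so in practice you would probably want to normalise the numeric attributes or weight the two parts.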

For clustering in Python, you usually want to work with scikit-learn (formerly scikits.learn). This question and answer page discusses exactly this problem of using a custom distance measure (in your case Gower) with scikit-learn, which does not appear to be possible.

You could use one of the metrics provided by pdist along with the implementation at that linked answer page, or you could implement a function for the Gower similarity and use that. But if you want the out-of-the-box clustering tools from scikit-learn, it does not appear to be directly possible.

Just implementing a Gower function to use with pdist won't be enough.

Internally, pdist performs several numerical conversions that fail if you pass it a matrix with mixed data.
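One possible way to sidestep this without touching pdist itself is to encode the categorical columns as numeric codes first, so that pdist only ever sees a float array, and have the custom metric treat those columns as categories. This is only a sketch under that assumption, not the patched pdist described in this answer; the toy records and the name gower_encoded are made up, and missing values and per-column weights are ignored.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Hypothetical toy data: two numeric attributes followed by one categorical attribute.
records = [
    (25.0, 50000.0, "red"),
    (35.0, 60000.0, "blue"),
    (45.0, 90000.0, "red"),
]

# Numeric block as floats; categorical column replaced by integer codes.
num = np.array([[r[0], r[1]] for r in records], dtype=float)
_, codes = np.unique([r[2] for r in records], return_inverse=True)
X = np.column_stack([num, codes.astype(float)])

n_num = num.shape[1]
ranges = num.max(axis=0) - num.min(axis=0)
ranges[ranges == 0] = 1.0  # guard against constant columns


def gower_encoded(u, v):
    # Numeric columns: absolute difference scaled by the column range.
    num_part = np.abs(u[:n_num] - v[:n_num]) / ranges
    # Encoded categorical columns: 0 if the codes match, 1 otherwise.
    cat_part = (u[n_num:] != v[n_num:]).astype(float)
    # Unweighted mean over all attributes.
    return np.concatenate([num_part, cat_part]).mean()


D = squareform(pdist(X, metric=gower_encoded))
print(D)
```

The integer codes carry no ordering here: the metric only checks them for equality, so the arbitrary code values never enter the distance.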

I implemented the Gower function according to the original paper, along with the necessary adaptations to the pdist module (I could not simply override the functions, because the defs in the pdist module are private).

The results I have obtained with this so far match those of R's daisy function.

The source code is available in this Jupyter notebook: https://sourceforge.net/projects/gower-distance-4python/files/
