“Correlation matrix” for strings. Similarity of nominal data

我在风中等你 2021-01-24 07:05

Here is my data frame, df:

  store_1      store_2         store_3         store_4     

0 banana      banana           plum            banana
1 orange      ta         


        
2 answers
  •  Happy的楠姐
    2021-01-24 07:51

    If you wish to estimate how similar the stores are with regard to their products, you could use:

    One hot encoding

    Each store can then be described by a vector of length n, where n is the number of distinct products across all stores, for example:

    banana orange apple pear plum tangerine raspberry tomato melon . . .

    Store_1 is then described as 1 1 1 1 1 0 0 0 0 0 ...
    Store_2 as 1 0 0 1 0 1 1 1 0 ...

    This leaves you with one numerical vector per store, on which you can compute a dissimilarity measure such as the Euclidean distance between each pair of stores.
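
A minimal sketch of the approach described above, in plain Python. The product lists are hypothetical, since the data frame in the question is cut off; `stores`, `products`, `vectors`, `euclidean`, and `distance` are illustrative names, not part of the original post. Each store becomes a 0/1 vector over all products, and the pairwise Euclidean distances form the "correlation-matrix-like" table the question asks for.

```python
import math

# Hypothetical product lists per store (the table in the question is
# truncated, so these values are made up for illustration only).
stores = {
    "store_1": ["banana", "orange", "apple"],
    "store_2": ["banana", "tangerine", "raspberry"],
    "store_3": ["plum", "orange", "apple"],
    "store_4": ["banana", "orange", "apple"],
}

# Vocabulary: every distinct product across all stores, in a fixed order.
products = sorted({p for items in stores.values() for p in items})

# One-hot (presence/absence) vector for each store.
vectors = {
    name: [1 if p in items else 0 for p in products]
    for name, items in stores.items()
}


def euclidean(u, v):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Pairwise distance table: 0 means identical assortments.
names = list(stores)
distance = {
    a: {b: euclidean(vectors[a], vectors[b]) for b in names}
    for a in names
}

for a in names:
    print(a, [round(distance[a][b], 2) for b in names])
```

For 0/1 vectors, the squared Euclidean distance is simply the number of products that one store carries and the other does not, so smaller values mean more similar assortments.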
