PCA for categorical features?

清酒与你 2020-12-13 06:15

In my understanding, I thought PCA can be performed only for continuous features. But while trying to understand the difference between onehot encoding and label encoding ca

6 answers
  •  暗喜 (original poster)
    2020-12-13 07:09

    PCA is a dimensionality reduction method that can be applied to any set of numeric features. Here is an example using one-hot encoded (i.e. categorical) data:

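    from sklearn.preprocessing import OneHotEncoder
    # One-hot encode three categorical columns (2, 3 and 4 distinct values -> 9 binary columns).
    enc = OneHotEncoder()
    # fit_transform returns a sparse matrix; .toarray() makes it dense for PCA.
    X = enc.fit_transform([[0, 0, 3], [1, 1, 0], [0, 2, 1], [1, 0, 2]]).toarray()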
    
    print(X)
    
    > array([[ 1.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  1.],
           [ 0.,  1.,  0.,  1.,  0.,  1.,  0.,  0.,  0.],
           [ 1.,  0.,  0.,  0.,  1.,  0.,  1.,  0.,  0.],
           [ 0.,  1.,  1.,  0.,  0.,  0.,  0.,  1.,  0.]])
    
    
    from sklearn.decomposition import PCA
    # Project the 9 one-hot columns onto the first 3 principal components.
    pca = PCA(n_components=3)
    X_pca = pca.fit_transform(X)
    
    print(X_pca)
    
    > array([[-0.70710678,  0.79056942,  0.70710678],
           [ 1.14412281, -0.79056942,  0.43701602],
           [-1.14412281, -0.79056942, -0.43701602],
           [ 0.70710678,  0.79056942, -0.70710678]])
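
To judge how much information the projection keeps, one option is to look at `explained_variance_ratio_` on the fitted PCA object. The sketch below reuses the same toy data as the answer; the variable names `raw` and `ratios` are only illustrative. Since there are just 4 samples, the centered data has rank at most 3, so 3 components should account for essentially all of the variance here. That will generally not hold on a real dataset.

```python
from sklearn.decomposition import PCA
from sklearn.preprocessing import OneHotEncoder

# Same toy data as in the answer: 4 samples, 3 categorical features.
raw = [[0, 0, 3], [1, 1, 0], [0, 2, 1], [1, 0, 2]]

# One-hot encode and convert the sparse result to a dense (4, 9) array.
X = OneHotEncoder().fit_transform(raw).toarray()

pca = PCA(n_components=3)
X_pca = pca.fit_transform(X)

# Fraction of the total variance captured by each retained component.
ratios = pca.explained_variance_ratio_
print(ratios)
print(ratios.sum())
```

On larger one-hot encoded data the sum will usually be below 1, and the individual ratios give a rough guide for choosing `n_components`.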
    
