LDA ignoring n_components?


Question:

When I try to use LDA from scikit-learn, it keeps giving me only one component, even though I ask for more:

>>> import numpy as np
>>> from sklearn.lda import LDA
>>> x = np.random.randn(5, 5)
>>> y = [True, False, True, False, True]
>>> for i in range(1, 6):
...     lda = LDA(n_components=i)
...     model = lda.fit(x, y)
...     model.transform(x)

Gives

/Users/orthogonal/virtualenvs/osxml/lib/python2.7/site-packages/sklearn/lda.py:161: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
array([[-0.12635305],
       [-1.09293574],
       [ 1.83978459],
       [-0.37521856],
       [-0.24527725]])

(the same single-column array is printed for each of the five values of n_components)

As you can see, it's only printing out one dimension each time. Why is this? Does it have anything to do with the variables being collinear?

By contrast, when I do this with scikit-learn's PCA, it gives me what I want.

>>> from sklearn.decomposition import PCA
>>> for i in range(1, 6):
...     pca = PCA(n_components=i)
...     model = pca.fit(x)
...     model.transform(x)
...
array([[ 0.83688322],
       [ 0.79565477],
       [-2.4373344 ],
       [ 0.72500848],
       [ 0.07978792]])
array([[ 0.83688322, -1.56459039],
       [ 0.79565477,  0.84710518],
       [-2.4373344 , -0.35548589],
       [ 0.72500848, -0.49079647],
       [ 0.07978792,  1.56376757]])
array([[ 0.83688322, -1.56459039, -0.3353066 ],
       [ 0.79565477,  0.84710518, -1.21454498],
       [-2.4373344 , -0.35548589, -0.16684946],
       [ 0.72500848, -0.49079647,  1.09006296],
       [ 0.07978792,  1.56376757,  0.62663807]])
array([[ 0.83688322, -1.56459039, -0.3353066 ,  0.22196922],
       [ 0.79565477,  0.84710518, -1.21454498, -0.15961993],
       [-2.4373344 , -0.35548589, -0.16684946, -0.04114339],
       [ 0.72500848, -0.49079647,  1.09006296, -0.2438673 ],
       [ 0.07978792,  1.56376757,  0.62663807,  0.2226614 ]])
array([[  8.36883220e-01,  -1.56459039e+00,  -3.35306597e-01,   2.21969223e-01,  -1.66533454e-16],
       [  7.95654771e-01,   8.47105182e-01,  -1.21454498e+00,  -1.59619933e-01,   3.33066907e-16],
       [ -2.43733440e+00,  -3.55485895e-01,  -1.66849458e-01,  -4.11433949e-02,   0.00000000e+00],
       [  7.25008484e-01,  -4.90796471e-01,   1.09006296e+00,  -2.43867297e-01,  -1.38777878e-16],
       [  7.97879229e-02,   1.56376757e+00,   6.26638070e-01,   2.22661402e-01,   2.22044605e-16]])

Answer 1:

The dimension-reducing step of LDA.transform projects the data onto the estimated scalings_. As described in the docstring, scalings_ has at most n_classes - 1 columns, so that is the maximum number of columns transform can ever return. In your case there are 2 classes (True, False), which yields at most 1 column, no matter what n_components you request. (The collinearity warning is unrelated to this limit.)
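Here is a minimal sketch illustrating the limit. It uses the modern import path sklearn.discriminant_analysis.LinearDiscriminantAnalysis, which replaced sklearn.lda.LDA in later scikit-learn releases; note that recent versions raise an error, rather than silently capping, if you request more components than the limit allows.

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.RandomState(0)
x = rng.randn(9, 5)

# 2 classes: scalings_ has n_classes - 1 = 1 column, so transform is one-dimensional.
y2 = [0, 1] * 4 + [0]
lda2 = LinearDiscriminantAnalysis(n_components=1).fit(x, y2)
print(lda2.transform(x).shape)   # (9, 1)

# 3 classes: up to n_classes - 1 = 2 columns are available.
y3 = [0, 1, 2] * 3
lda3 = LinearDiscriminantAnalysis(n_components=2).fit(x, y3)
print(lda3.transform(x).shape)   # (9, 2)

In other words, to get more than one discriminant component you need more than two classes; with binary labels, one axis is all LDA can produce.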


