Python (scikit learn) lda collapsing to single dimension

Posted by 痞子三分冷 on 2019-12-06 10:37:36

Question


I'm very new to scikit-learn and machine learning in general.

I am currently designing an SVM to predict whether a specific amino acid sequence will be cut by a protease. So far the SVM method seems to be working quite well.

I'd like to visualize the distance between the two categories (cut and uncut), so I'm trying to use linear discriminant analysis (LDA), which is similar to principal component analysis (PCA), using the following code:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# XOR_list (labels) and data_list (numerical features) are defined earlier, not shown
lda = LinearDiscriminantAnalysis(n_components=2)
targs = np.array([1 if _ else 0 for _ in XOR_list])
DATA = np.array(data_list)
X_r2 = lda.fit(DATA, targs).transform(DATA)
plt.figure()
for c, i, target_name in zip("rg", [1, 0], ["Cleaved", "Not Cleaved"]):
    plt.scatter(X_r2[targs == i], X_r2[targs == i], c=c, label=target_name)
plt.legend()
plt.title('LDA of cleavage_site dataset')

However, the LDA only gives a 1D result:

In: print X_r2[:5]
Out: [[ 6.74369996]
 [ 4.14254941]
 [ 5.19537896]
 [ 7.00884032]
 [ 3.54707676]]

PCA, on the other hand, gives 2 dimensions for the same input data:

from sklearn.decomposition import PCA
pca = PCA(n_components=2)
X_r = pca.fit(DATA).transform(DATA)
print X_r[:5]
Out: [[ 0.05474151  0.38401203]
 [ 0.39244191  0.74113729]
 [-0.56785236 -0.30109694]
 [-0.55633116 -0.30267444]
 [ 0.41311866 -0.25501662]]

Edit: here are links to two Google Docs with the input data. I am not using the sequence information, just the numerical information that follows it. The files are split into positive and negative control data. Input data: file1 file2


Answer 1:


LDA is first and foremost a classifier, not a general-purpose dimensionality reduction technique like PCA. The projection that people visualize is a by-product of its decision function, and, unfortunately for your use case, the decision function for a binary problem (2 classes) is 1-dimensional. There is nothing wrong with your code; this is what the decision function of every linear binary classifier looks like.
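To see why two classes leave only one axis, here is a minimal NumPy sketch of Fisher's two-class discriminant, the textbook construction behind binary LDA. It is not scikit-learn's internal implementation, and the data are synthetic Gaussian samples standing in for the cleavage dataset; the names X0, X1, S_w and w are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features = 5

# Synthetic stand-ins for the two classes (assumption: not the real data).
X0 = rng.normal(loc=0.0, size=(40, n_features))  # e.g. "Not Cleaved"
X1 = rng.normal(loc=1.5, size=(40, n_features))  # e.g. "Cleaved"

mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)

# Pooled within-class scatter matrix.
S_w = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)

# Fisher direction: the single vector w solving S_w w = mu1 - mu0.
w = np.linalg.solve(S_w, mu1 - mu0)

# Projecting onto w gives one scalar per sample, i.e. a 1D embedding.
z0, z1 = X0 @ w, X1 @ w
print(w.shape, z0.shape, z1.shape)
```

The two class means differ by a single vector, so there is only one direction along which the classes can be separated; every sample collapses to one number along it.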

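In general, for 2 classes you get at most a 1-dimensional projection, and for K > 2 classes the LDA projection (what `transform` returns in scikit-learn) has at most K-1 dimensions, or fewer if there are fewer features than that. With other decomposition schemes (like one-vs-one) you can go up to K(K-1)/2 decision values, but again, only for more than 2 classes.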
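The dimension limit can be checked directly. The sketch below fits scikit-learn's LinearDiscriminantAnalysis on synthetic Gaussian blobs (the helper make_blobs_data is hypothetical and only generates toy data) and prints the shape of the projection for 2 and for 3 classes. n_components is left at its default so the snippet does not depend on how a given scikit-learn version handles an oversized value.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)


def make_blobs_data(n_classes, n_per_class=50, n_features=8):
    """Toy data: one Gaussian blob per class (hypothetical helper)."""
    X_parts, y_parts = [], []
    for k in range(n_classes):
        center = rng.normal(scale=3.0, size=n_features)
        X_parts.append(center + rng.normal(size=(n_per_class, n_features)))
        y_parts.append(np.full(n_per_class, k))
    return np.vstack(X_parts), np.concatenate(y_parts)


# Two classes: the projection has a single column.
X2, y2 = make_blobs_data(2)
proj2 = LinearDiscriminantAnalysis().fit(X2, y2).transform(X2)

# Three classes: the projection has two columns (K - 1).
X3, y3 = make_blobs_data(3)
proj3 = LinearDiscriminantAnalysis().fit(X3, y3).transform(X3)

print(proj2.shape)  # (100, 1)
print(proj3.shape)  # (150, 2)
```

For the two-class cleavage data, a 1D projection is usually easier to read as one histogram per class (for example with plt.hist on X_r2[targs == i].ravel()) than as a scatter plot.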



Source: https://stackoverflow.com/questions/39083308/python-scikit-learn-lda-collapsing-to-single-dimension
