I am trying to use Linear Discriminant Analysis from the scikit-learn library in order to perform dimensionality reduction on my data, which has more than 200 features. But I could not find any way to map the reduced data back to the original feature space (an inverse transform).
There is no inverse transform because, in general, you cannot return from the lower-dimensional feature space to your original coordinate space.
Think of it like looking at your 2-dimensional shadow projected on a wall. You can't get back to your 3-dimensional geometry from a single shadow because information is lost during the projection.
To address your comment regarding PCA, consider a data set of 10 random 3-dimensional vectors:
In [1]: import numpy as np
In [2]: from sklearn.decomposition import PCA
In [3]: X = np.random.rand(30).reshape(10, 3)
Now, what happens if we apply the principal component transformation, reduce the dimensionality by keeping only the top 2 (out of 3) PCs, and then apply the inverse transform?
In [4]: pca = PCA(n_components=2)
In [5]: pca.fit(X)
Out[5]:
PCA(copy=True, iterated_power='auto', n_components=2, random_state=None,
svd_solver='auto', tol=0.0, whiten=False)
In [6]: Y = pca.transform(X)
In [7]: X.shape
Out[7]: (10, 3)
In [8]: Y.shape
Out[8]: (10, 2)
In [9]: XX = pca.inverse_transform(Y)
In [10]: X[0]
Out[10]: array([ 0.95780971, 0.23739785, 0.06678655])
In [11]: XX[0]
Out[11]: array([ 0.87931369, 0.34958407, -0.01145125])
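To quantify the mismatch, one can look at the per-sample reconstruction error (a quick sketch reusing the X, XX, and pca objects from above; the exact numbers will differ on each run because X is random):

# Euclidean distance between each original sample and its reconstruction;
# every entry is nonzero because the third PC was discarded.
print(np.linalg.norm(X - XX, axis=1))

# The total squared reconstruction error should match the variance along the
# dropped PC (times n_samples - 1), which PCA exposes as noise_variance_ when
# exactly one component is left out.
print(((X - XX) ** 2).sum())
print((X.shape[0] - 1) * pca.noise_variance_)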
Obviously, the inverse transform did not reconstruct the original data. The reason is that by dropping the lowest PC, we lost information. Next, let's see what happens if we retain all PCs (i.e., we do not apply any dimensionality reduction):
In [12]: pca2 = PCA(n_components=3)
In [13]: pca2.fit(X)
Out[13]:
PCA(copy=True, iterated_power='auto', n_components=3, random_state=None,
svd_solver='auto', tol=0.0, whiten=False)
In [14]: Y = pca2.transform(X)
In [15]: XX = pca2.inverse_transform(Y)
In [16]: X[0]
Out[16]: array([ 0.95780971, 0.23739785, 0.06678655])
In [17]: XX[0]
Out[17]: array([ 0.95780971, 0.23739785, 0.06678655])
In this case, we were able to reconstruct the original data because we didn't throw away any information (since we retained all the PCs).
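A quick way to confirm this, reusing the fitted objects from above (a sketch; the booleans in the comments are what I would expect, not output captured from the session):

# Keeping all 3 PCs gives an exact round trip (up to floating-point error) ...
print(np.allclose(X, pca2.inverse_transform(pca2.transform(X))))  # expect True
# ... while the 2-component PCA from earlier does not.
print(np.allclose(X, pca.inverse_transform(pca.transform(X))))    # expect False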
The situation with LDA is even worse, because the maximum number of components that can be retained is not 200 (the number of features in your input data); rather, the maximum number of components you can retain is n_classes - 1. So if, for example, you were working on a binary classification problem (2 classes), the LDA transform would go from 200 input dimensions down to just a single dimension.
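To illustrate, here is a minimal sketch with made-up data (the 300 samples, 200 features, and binary labels are placeholders standing in for your setup):

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X = np.random.rand(300, 200)           # 300 samples, 200 features
y = np.random.randint(0, 2, size=300)  # 2 classes

# n_classes - 1 = 1, so a single component is the most LDA can keep
# (recent scikit-learn versions raise an error if you ask for more).
lda = LinearDiscriminantAnalysis(n_components=1)
Z = lda.fit_transform(X, y)

print(Z.shape)                            # (300, 1): one discriminant axis
print(hasattr(lda, 'inverse_transform'))  # False: no built-in way back to 200-D

The last line is also the direct answer to your question: unlike PCA, scikit-learn's LinearDiscriminantAnalysis does not provide an inverse_transform method at all.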