dimensionality-reduction

Subset variables by significant P value

帅比萌擦擦* · submitted 2021-01-28 01:32:28

Question: I'm trying to subset variables by significant p-values, and I attempted it with the following code, but it selects all variables instead of selecting by condition. Could anyone help me correct the problem? myvars <- names(summary(backward_lm)$coefficients[,4] < 0.05) happiness_reduced <- happiness_nomis[myvars] Thanks! Answer 1: An alternative to Martin's great answer (in the comments section) using the broom package. Unfortunately, you haven't posted any data, so I'm using the mtcars
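The failure mode here is that `names()` is applied to the logical comparison instead of being used to index it, so every coefficient name comes back. The same pattern is easy to reproduce with pandas (the p-values below are hypothetical stand-ins for `summary(model)$coefficients[, 4]`; this is an illustration of the bug, not the original R data):

```python
import pandas as pd

# Hypothetical coefficient p-values, standing in for summary(model)$coefficients[, 4].
pvals = pd.Series({"(Intercept)": 0.001, "wt": 0.002, "cyl": 0.40, "hp": 0.03})

# Analogue of the buggy R code: taking the labels of the *comparison result*
# returns every name, because the boolean vector keeps all of its labels.
all_names = list((pvals < 0.05).index)

# Analogue of the fix names(...)[... < 0.05]: filter by the condition first,
# then take the names of what survives.
sig_names = list(pvals[pvals < 0.05].index)

print(all_names)  # ['(Intercept)', 'wt', 'cyl', 'hp']
print(sig_names)  # ['(Intercept)', 'wt', 'hp']
```

The corresponding one-line R fix would subset before taking names, e.g. `names(p)[p < 0.05]` where `p` holds the p-value column.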

t-SNE generates different results on different machines

馋奶兔 · submitted 2021-01-05 11:56:27

Question: I have around 3000 data points in 100D that I project to 2D with t-SNE. Each data point belongs to one of three classes. However, when I run the script on two separate computers I keep getting inconsistent results. Some inconsistency is expected, since I use a random seed, but one of the computers consistently gets better results (I use a MacBook Pro and a stationary machine on Ubuntu). I use the t-SNE implementation from scikit-learn. The script and data are identical; I've manually copied the
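Pinning `random_state` makes scikit-learn's t-SNE deterministic on a single machine, though bit-exact agreement across different hardware and BLAS builds is not guaranteed. A minimal sketch with synthetic data (the question's 3000×100 dataset is not available, so the sizes here are arbitrary):

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 10))  # small synthetic stand-in for the 3000x100 data

# Same seed on the same machine -> identical embeddings across runs.
emb1 = TSNE(n_components=2, perplexity=10, random_state=42).fit_transform(X)
emb2 = TSNE(n_components=2, perplexity=10, random_state=42).fit_transform(X)

print(np.allclose(emb1, emb2))  # True
```

Cross-machine differences beyond the seed usually trace back to different scikit-learn/NumPy versions or floating-point library builds, so pinning package versions is the other half of reproducibility.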

Using t-SNE for dimensionality reduction. Why is the 3D graph not working?

杀马特。学长 韩版系。学妹 · submitted 2020-05-16 04:08:51

Question: I have used the Digits dataset from sklearn and tried to reduce the dimension from 64 to 3 using t-SNE (t-Distributed Stochastic Neighbor Embedding): import numpy as np import pandas as pd import matplotlib.pyplot as plt import seaborn as sns #%matplotlib inline from sklearn.manifold import TSNE from sklearn.datasets import load_digits from mpl_toolkits.mplot3d import Axes3D digits = load_digits() digits_df = pd.DataFrame(digits.data,) digits_df["target"] = pd.Series(digits.target) tsne
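A common reason a 3-D scatter silently fails is that the axes are created without the 3-D projection; importing `Axes3D` alone is not enough in recent matplotlib. One sketch of the intended plot, on a subset of the digits for speed (the variable names are mine):

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

digits = load_digits()
X, y = digits.data[:300], digits.target[:300]  # subset so t-SNE runs quickly

emb = TSNE(n_components=3, random_state=0).fit_transform(X)

fig = plt.figure()
ax = fig.add_subplot(projection="3d")  # the 3-D axes must be requested explicitly
sc = ax.scatter(emb[:, 0], emb[:, 1], emb[:, 2], c=y, cmap="tab10", s=10)
fig.colorbar(sc, ax=ax, label="digit")
fig.savefig("tsne_3d.png")
```

Note that the default `barnes_hut` method only supports `n_components` up to 3, so 3-D is the largest embedding this setup allows.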

sklearn tsne with sparse matrix

纵然是瞬间 · submitted 2020-01-17 07:06:25

Question: I'm trying to run t-SNE on a very sparse matrix of precomputed distance values, but I'm having trouble with it. It boils down to this: row = np.array([0, 2, 2, 0, 1, 2]) col = np.array([0, 0, 1, 2, 2, 2]) distances = np.array([.1, .2, .3, .4, .5, .6]) X = csc_matrix((distances, (row, col)), shape=(3, 3)) Y = TSNE(metric='precomputed').fit_transform(X) However, I get this error: TypeError: A sparse matrix was passed, but dense data is required for method="barnes_hut". Use X.toarray() to
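As the error says, `barnes_hut` needs a dense array. A precomputed input also has to be a valid distance matrix (square, symmetric, zero diagonal), and the perplexity must be smaller than the number of samples, which a 3×3 example cannot satisfy with the default perplexity of 30. A sketch with a valid dense precomputed matrix (the sizes are arbitrary):

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.metrics import pairwise_distances

rng = np.random.default_rng(0)
points = rng.normal(size=(10, 5))

# Dense, symmetric distance matrix with a zero diagonal.
D = pairwise_distances(points)

# init='random' is needed with metric='precomputed' in recent scikit-learn,
# since PCA initialization requires the original feature matrix.
emb = TSNE(metric="precomputed", init="random",
           perplexity=5, random_state=0).fit_transform(D)

print(emb.shape)  # (10, 2)
```

If the matrix really must stay sparse until the last moment, `X.toarray()` converts it, but the symmetry and perplexity constraints above still apply.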

LDA ignoring n_components?

雨燕双飞 · submitted 2020-01-10 14:17:41

Question: When I try to work with LDA from scikit-learn, it keeps giving me only one component, even though I am asking for more: >>> from sklearn.lda import LDA >>> x = np.random.randn(5,5) >>> y = [True, False, True, False, True] >>> for i in range(1,6): ... lda = LDA(n_components=i) ... model = lda.fit(x,y) ... model.transform(x) This gives /Users/orthogonal/virtualenvs/osxml/lib/python2.7/site-packages/sklearn/lda.py:161: UserWarning: Variables are collinear warnings.warn("Variables are collinear"
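LDA can produce at most `min(n_classes - 1, n_features)` discriminant axes, so the binary target above caps the output at one component no matter what `n_components` asks for (recent scikit-learn versions raise an error for an oversized request rather than silently capping). With three classes, two components come back as expected; note the modern import path, which replaces the long-deprecated `sklearn.lda`:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))

y = np.repeat([0, 1, 2], 20)  # three classes -> at most 2 components
Xt = LinearDiscriminantAnalysis(n_components=2).fit(X, y).transform(X)
print(Xt.shape)  # (60, 2)

y_bin = np.repeat([0, 1], 30)  # two classes -> at most 1 component
Xt_bin = LinearDiscriminantAnalysis(n_components=1).fit(X, y_bin).transform(X)
print(Xt_bin.shape)  # (60, 1)
```

The cap comes from the math: the between-class scatter matrix has rank at most `n_classes - 1`, so there simply are no additional discriminant directions to return.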

Plot PCA loadings and loading labels in a biplot in sklearn (like R's autoplot)

守給你的承諾、 · submitted 2020-01-09 13:09:29

Question: I saw this tutorial in R with autoplot. They plotted the loadings and loading labels: autoplot(prcomp(df), data = iris, colour = 'Species', loadings = TRUE, loadings.colour = 'blue', loadings.label = TRUE, loadings.label.size = 3) https://cran.r-project.org/web/packages/ggfortify/vignettes/plot_pca.html I prefer Python 3 with matplotlib, scikit-learn, and pandas for my data analysis. However, I don't know how to add these. How can I plot these vectors with matplotlib? I've been reading
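A rough matplotlib equivalent of autoplot's biplot: scatter the PCA scores and overlay the columns of `pca.components_` as labeled arrows. The arrow scaling factor below is a cosmetic choice of mine, not part of any sklearn API:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

iris = load_iris()
pca = PCA(n_components=2)
scores = pca.fit_transform(iris.data)   # sample coordinates in PC space
loadings = pca.components_.T            # one row per original feature

fig, ax = plt.subplots()
ax.scatter(scores[:, 0], scores[:, 1], c=iris.target, s=12)

scale = np.abs(scores).max()  # stretch arrows to the data's scale (cosmetic)
for (dx, dy), name in zip(loadings * scale, iris.feature_names):
    ax.arrow(0, 0, dx, dy, color="blue", head_width=0.05,
             length_includes_head=True)
    ax.annotate(name, (dx * 1.1, dy * 1.1), color="blue", fontsize=8)

ax.set_xlabel("PC1")
ax.set_ylabel("PC2")
fig.savefig("biplot.png")
```

Each arrow points in the direction along which its feature increases in the PC1/PC2 plane, which is exactly what `loadings = TRUE` draws in ggfortify.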
