pca

(R) Visualizing a data set with a large number of variables using PCA (ggbiplot)

坚强是说给别人听的谎言 submitted on 2021-01-29 04:58:45
Question: My dataset has 100 samples and 17000 variables. I would like to run PCA and visualize the data, but the resulting plot is unreadable. How can I control the number of arrows in ggbiplot or biplot, that is, show only the variables that contribute most? Some sample code is below:
    data <- matrix(rnorm(1700000), nrow=100, ncol=17000)
    colnames(data) <- paste("X", 1:ncol(data), sep="")
    pca <- prcomp(data, scale=T, center=T)
    biplot(pca)
    print(ggbiplot(pca, obs.scale = 1, var.scale = 1, groups = c(rep('a'
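One way to think about "the most contributing variables" is to rank variables by the length of their arrow in the PC1-PC2 plane and draw only the top few. The sketch below illustrates that ranking in Python with NumPy rather than in R. The function name `top_loading_variables`, the cutoff `k`, and the smaller 100 x 200 matrix are illustrative assumptions, not part of the question.

```python
import numpy as np

def top_loading_variables(data, names, k=10):
    """Return the k variable names with the longest loading vector on PC1/PC2."""
    # Centre and scale each column, mirroring prcomp(scale=T, center=T).
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    # Rows of vt are the principal axes; each column corresponds to a variable.
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    # Arrow length of every variable in the PC1-PC2 plane.
    length = np.hypot(vt[0], vt[1])
    order = np.argsort(length)[::-1][:k]
    return [names[i] for i in order]

rng = np.random.default_rng(0)
data = rng.normal(size=(100, 200))  # smaller than the 100 x 17000 matrix in the question
names = [f"X{i + 1}" for i in range(data.shape[1])]
selected = top_loading_variables(data, names, k=10)
print(selected)
```

The selected names could then be used to subset the loadings before plotting, so that only those arrows are drawn.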

Sklearn's PCA gives 'wrong' output for last row

送分小仙女□ submitted on 2021-01-28 11:05:57
Question: I am running data through sklearn's PCA (n_components=2) and find that the y-value of the last row differs from the y-values of other rows with the same input values. Notably, the input data consists of only two distinct entries, and changing the number of occurrences of an entry makes the error disappear. The code below reproduces the error.
    import pandas as pd
    from sklearn.decomposition import PCA
    lst1 = [[-0.485886999,0,-0.485886999,-0.485886999,-0.485886999,0,-0.485886999,-0
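A point worth checking with input like this: when the data contains only two distinct rows, the centred data lies on a single line, so only the first principal component carries variance and the second-component scores are floating-point noise around zero. The sketch below shows this with a plain NumPy SVD instead of sklearn. The rows `a` and `b` are made-up values, and whether this explains the exact output in the truncated question is an assumption.

```python
import numpy as np

# Hypothetical data: only two distinct rows, repeated an unequal number of times.
a = np.array([-0.485886999, 0.0, -0.485886999, 1.0])
b = np.array([0.0, 2.0, 0.5, -0.485886999])
x = np.vstack([a] * 5 + [b] * 3)

centered = x - x.mean(axis=0)
u, s, vt = np.linalg.svd(centered, full_matrices=False)

# Scores on the first two components (what a two-component PCA returns, up to sign).
scores = u[:, :2] * s[:2]

print("rank of centred data:", np.linalg.matrix_rank(centered))
print("singular values:", s)
print("second-component scores:", scores[:, 1])
```

Because the second singular value is zero up to rounding, small differences between rows in the second column of the scores carry no information about the data.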

Reverse generation of RGB PCA image not working

南笙酒味 submitted on 2021-01-28 08:05:44
Question: [Image: Shakira.jpg] I am trying to compress the above image, but the output I get is not a proper image. I think I am doing the PCA steps correctly, but something goes wrong at the final step. [Image: Shakira compressed]
    import pylab as plt
    import numpy as np
    img = plt.imread("shakira.jpg")
    print(img.shape)
    plt.axis('off')
    plt.imshow(img)
    plt.show()
    img_reshaped = np.reshape(img, (930, 1860))
    print(img_reshaped.shape)
    from sklearn.decomposition import PCA
    pca = PCA(.95)
    pca.fit(img_reshaped)
    img
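For reference, here is a minimal sketch of the full round trip: flatten an RGB image to 2-D, project onto a few components, reconstruct, and reshape back. It uses a NumPy SVD in place of sklearn's `PCA`, and a small synthetic array in place of `shakira.jpg`. The function name `pca_compress_image` is hypothetical. Clipping to 0..255 and casting back to `uint8` before display is a common final step, but whether its absence is what breaks the code in the question is an assumption, since the question is cut off.

```python
import numpy as np

def pca_compress_image(img, n_components):
    """Project an (H, W, C) uint8 image onto n_components and rebuild it."""
    h, w, c = img.shape
    flat = img.reshape(h, w * c).astype(np.float64)  # one row per image row
    mean = flat.mean(axis=0)
    _, _, vt = np.linalg.svd(flat - mean, full_matrices=False)
    basis = vt[:n_components]
    scores = (flat - mean) @ basis.T   # plays the role of pca.transform
    rebuilt = scores @ basis + mean    # plays the role of pca.inverse_transform
    # Restore a displayable dtype, then undo the reshape with the same (h, w, c).
    return np.clip(rebuilt, 0, 255).round().astype(np.uint8).reshape(h, w, c)

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(62, 93, 3), dtype=np.uint8)  # synthetic stand-in image
out = pca_compress_image(img, n_components=20)
print(out.shape, out.dtype)
```

The key design point is that the reshape back uses the original `(h, w, c)` and the same memory order as the forward reshape, so pixels return to their original positions.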

Labeling points in a biplot

ε祈祈猫儿з submitted on 2021-01-27 04:54:34
Question: I have performed a PCA and drawn a biplot in R.
    pca1= princomp (~ data$X250 + data$X500 + data$shear, scores=TRUE, cor=TRUE, rownames=data[,1])
    biplot(pca1, xlab="PC 1", ylab="PC 2", pch=20)
Currently the labels on the biplot are the row numbers, but I would like the point labels to be the plot names from my data. My data has 81 rows. I have tried:
    text (pca1[1:81], pca1[1:81], labels = row.names(data))
    text (1:81, 1:81, labels = row.names(data))
    text (pca1$comp.1[1:81], pca1$comp.2[1:81],

Labeling points in a biplot

帅比萌擦擦* submitted on 2021-01-27 04:54:13
Question: I have performed a PCA and drawn a biplot in R.
    pca1= princomp (~ data$X250 + data$X500 + data$shear, scores=TRUE, cor=TRUE, rownames=data[,1])
    biplot(pca1, xlab="PC 1", ylab="PC 2", pch=20)
Currently the labels on the biplot are the row numbers, but I would like the point labels to be the plot names from my data. My data has 81 rows. I have tried:
    text (pca1[1:81], pca1[1:81], labels = row.names(data))
    text (1:81, 1:81, labels = row.names(data))
    text (pca1$comp.1[1:81], pca1$comp.2[1:81],

raise LinAlgError("SVD did not converge") LinAlgError: SVD did not converge in matplotlib pca determination

无人久伴 submitted on 2021-01-20 17:07:08
Question: Here is my code:
    import numpy
    from matplotlib.mlab import PCA
    file_name = "store1_pca_matrix.txt"
    ori_data = numpy.loadtxt(file_name,dtype='float', comments='#', delimiter=None, converters=None, skiprows=0, usecols=None, unpack=False, ndmin=0)
    result = PCA(ori_data)
Although my input matrix contains no NaN or inf values, I get the error below:
    raise LinAlgError("SVD did not converge")
    LinAlgError: SVD did not converge
What is the problem?
Answer 1: This can happen when there are inf
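Since the answer points at non-finite values, a quick way to confirm or rule them out is to locate every NaN or inf entry after loading. The sketch below does that with `numpy.isfinite`. The helper name `report_non_finite` and the small matrix standing in for `store1_pca_matrix.txt` are illustrative assumptions.

```python
import numpy as np

def report_non_finite(data):
    """Return the (row, column) positions of NaN or +/-inf entries."""
    rows, cols = np.nonzero(~np.isfinite(data))
    return [(int(r), int(c)) for r, c in zip(rows, cols)]

# Hypothetical matrix standing in for the contents of store1_pca_matrix.txt.
ori_data = np.array([[1.0, 2.0, 3.0],
                     [4.0, np.inf, 6.0],
                     [7.0, 8.0, np.nan]])

bad = report_non_finite(ori_data)
print("non-finite entries at:", bad)
print("all finite:", bool(np.isfinite(ori_data).all()))
```

An empty list means the raw input is clean, in which case it may be worth checking whether a later preprocessing step introduces the non-finite values.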

raise LinAlgError("SVD did not converge") LinAlgError: SVD did not converge in matplotlib pca determination

给你一囗甜甜゛ submitted on 2021-01-20 17:07:03
Question: Here is my code:
    import numpy
    from matplotlib.mlab import PCA
    file_name = "store1_pca_matrix.txt"
    ori_data = numpy.loadtxt(file_name,dtype='float', comments='#', delimiter=None, converters=None, skiprows=0, usecols=None, unpack=False, ndmin=0)
    result = PCA(ori_data)
Although my input matrix contains no NaN or inf values, I get the error below:
    raise LinAlgError("SVD did not converge")
    LinAlgError: SVD did not converge
What is the problem?
Answer 1: This can happen when there are inf

PCA on transposed data

大憨熊 submitted on 2020-12-13 10:36:23
Question: I am using R to do some PCA analysis. Everything was working fine until it occurred to me that I should be working with the transpose of my data set. However, when I tried to run PCA on the transposed data set, I could not get it to work.
    > sum(is.na(data_t))
    [1] 1367
    > dim(data_t)
    [1]  599 9505
    > data_t[1:4,1:4]
                                    2'-PDE    7A5      A1BG     A2M
    TCGA.A1.A0SD.01A.11R.A115.07 0.0153750 2.4105 0.9493333 0.24200
    TCGA.A1.A0SE.01A.11R.A084.07 0.4669375 0.3635 0.2798333 1.03850
    TCGA.A1.A0SH.01A.11R.A084.07 -0
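The output above shows 1367 missing values, and an SVD-based PCA cannot run on a matrix containing them. One simple option is to drop every column that has a missing value before decomposing, as sketched below in Python with NumPy. The function name `pca_scores_dropping_nan_columns` and the small random matrix are illustrative assumptions. Whether the missing values are the actual cause of the failure is also an assumption, since the question is cut off, and imputation would be an alternative to dropping columns.

```python
import numpy as np

def pca_scores_dropping_nan_columns(x, n_components=2):
    """Drop every column that contains NaN, then compute PCA scores via SVD."""
    keep = ~np.isnan(x).any(axis=0)
    complete = x[:, keep]
    centered = complete - complete.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components], keep

rng = np.random.default_rng(0)
data_t = rng.normal(size=(20, 50))  # stand-in for the 599 x 9505 matrix
data_t[3, 7] = np.nan
data_t[11, 42] = np.nan

scores, keep = pca_scores_dropping_nan_columns(data_t)
print(scores.shape, int(keep.sum()))
```

Returning the `keep` mask alongside the scores makes it easy to report which variables were excluded from the analysis.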

PCA on transposed data

半腔热情 submitted on 2020-12-13 10:36:16
Question: I am using R to do some PCA analysis. Everything was working fine until it occurred to me that I should be working with the transpose of my data set. However, when I tried to run PCA on the transposed data set, I could not get it to work.
    > sum(is.na(data_t))
    [1] 1367
    > dim(data_t)
    [1]  599 9505
    > data_t[1:4,1:4]
                                    2'-PDE    7A5      A1BG     A2M
    TCGA.A1.A0SD.01A.11R.A115.07 0.0153750 2.4105 0.9493333 0.24200
    TCGA.A1.A0SE.01A.11R.A084.07 0.4669375 0.3635 0.2798333 1.03850
    TCGA.A1.A0SH.01A.11R.A084.07 -0

LDA contribution biplot

孤者浪人 submitted on 2020-12-06 08:37:23
Question: I am trying to create a biplot for a linear discriminant analysis (LDA). I am using a modified version of the code from https://stats.stackexchange.com/questions/82497/can-the-scaling-values-in-a-linear-discriminant-analysis-lda-be-used-to-plot-e However, I have 80 variables, which makes the biplot extremely difficult to read. The highly contributing variables make this worse: their arrows are very long, and the remaining labels are bunched up in the middle. So what I am