pca

subset of prcomp object in R

Submitted by 旧街凉风 on 2019-12-12 10:13:10
Question: I'm basically computing the PCA for a set of variables and everything works fine. Let's say I'm using the iris data as an example, but my data is different; the iris data should be sufficient to explain my question:

data(iris)
log.ir <- log(iris[, 1:4])
log.ir[mapply(is.infinite, log.ir)] <- 0
ir.groups <- iris[, 5]
ir.pca <- prcomp(log.ir, center = TRUE, scale. = TRUE)
library(ggbiplot)
g <- ggbiplot(ir.pca, obs.scale = 1, var.scale = 1, groups = ir.groups, var.axes=F)
g <- g + scale_color
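Since the question title asks about subsetting a prcomp object, here is a minimal sketch of one possibility (an assumption about the goal, since the question text is cut off above): the scores live in ir.pca$x, so a subset of observations or groups can be selected and plotted directly, leaving the rotation, center and scale components untouched.

keep <- ir.groups %in% c("setosa", "versicolor")   # hypothetical subset of groups
scores.sub <- ir.pca$x[keep, 1:2]                  # PC1/PC2 scores for the kept rows
plot(scores.sub, col = as.integer(ir.groups[keep]), pch = 19,
     xlab = "PC1", ylab = "PC2")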

PCA analysis using Correlation Matrix as input in R

Submitted by 丶灬走出姿态 on 2019-12-12 08:15:37
Question: I have a 7000 x 7000 correlation matrix and I have to run PCA on it in R. I used CorPCA <- princomp(covmat=xCor), where xCor is the correlation matrix, but it fails with "covariance matrix is not non-negative definite"; I assume this is because some correlations in the matrix are negative. Which built-in R function can I use to get the PCA result?

Answer 1: "Not non-negative definite" does not mean the covariance matrix has negative correlations. It's a linear algebra equivalent of
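One way to proceed, sketched here under the assumption that the matrix is only slightly indefinite (which often happens when correlations are computed pairwise on incomplete data): project it onto the nearest positive semi-definite correlation matrix with Matrix::nearPD, then do the PCA via an eigendecomposition.

library(Matrix)
xCor.psd <- as.matrix(nearPD(xCor, corr = TRUE)$mat)   # nearest PSD correlation matrix
eig <- eigen(xCor.psd, symmetric = TRUE)
loadings <- eig$vectors                                # principal axes
prop.var <- eig$values / sum(eig$values)               # proportion of variance per component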

scikit-learn TruncatedSVD's explained variance ratio not in descending order

Submitted by 孤街醉人 on 2019-12-12 07:47:10
Question: TruncatedSVD's explained variance ratio is not in descending order, unlike sklearn's PCA. I looked at the source code and it seems they calculate the explained variance ratio in different ways. TruncatedSVD:

U, Sigma, VT = randomized_svd(X, self.n_components, n_iter=self.n_iter, random_state=random_state)
X_transformed = np.dot(U, np.diag(Sigma))
self.explained_variance_ = exp_var = np.var(X_transformed, axis=0)
if sp.issparse(X):
    _, full_var = mean_variance_axis(X, axis=0)
    full_var
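The usual explanation is that TruncatedSVD factorises X without centering it, so the singular values rank components by second moment rather than by variance; with large column means the variances of the projections can then come out of order. A small base-R illustration of that effect (not the sklearn code, just the same algebra):

set.seed(1)
X <- matrix(rnorm(200 * 5), 200, 5)
X[, 3] <- 0.1 * rnorm(200) + 50    # huge mean, tiny variance
sv <- svd(X)                       # no centering, as in TruncatedSVD
proj <- X %*% sv$v                 # projections onto the right singular vectors
apply(proj, 2, var)                # "explained variances", not in descending order
prcomp(X)$sdev^2                   # centered PCA variances, always descending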

Set axis limits for plot.PCA - strange behaviour (FactoMineR)

Submitted by 眉间皱痕 on 2019-12-12 05:18:37
Question: I want to plot the result of a PCA with the FactoMineR package. When plotting a PCA it is good practice to resize the graph so that its height/width ratio is proportional to the percentage of variance explained by each axis. I first tried to resize the graph simply by stretching the window: that obviously fails and keeps the points in the middle instead of stretching them too.

library(FactoMineR)
out <- PCA(mtcars)

Then I tried to force the xlim and ylim arguments (?plot.PCA):

plot.PCA(out, choix="ind
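A sketch of one way to encode that good practice explicitly (whether plot.PCA honours the limits can depend on the FactoMineR version, so treat this as an assumption): compute symmetric limits whose ratio matches the percentage of variance of the first two axes, and pass them to plot.PCA.

library(FactoMineR)
out <- PCA(mtcars, graph = FALSE)
pct <- out$eig[1:2, 2]                         # % variance of dimensions 1 and 2
r1  <- max(abs(out$ind$coord[, 1]))            # half-range needed for axis 1
plot.PCA(out, choix = "ind",
         xlim = c(-r1, r1),
         ylim = c(-r1, r1) * pct[2] / pct[1])  # y range shrunk by the variance ratio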

Is it okay to normalize by row after running a PCA?

Submitted by 荒凉一梦 on 2019-12-12 03:29:30
Question: I have a dataset of 50K rows and 26 features. I'm normalizing the columns using sklearn's StandardScaler (each column ends up with mean 0 and standard deviation 1), then running a PCA to reduce the feature set to ~90% of the original variance. I'm then normalizing the rows before running sklearn's KMeans algorithm. Is there any reason I shouldn't normalize the rows after running a PCA? If there is, would normalizing the rows before the PCA cause any issues; should this be done before or after
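The steps themselves are language-agnostic; a sketch of the same pipeline in R (illustrative only): standardise the columns, keep enough components for roughly 90% of the variance, scale each row to unit length, then cluster. Normalizing the rows after the PCA simply puts the observations on the unit sphere, which makes k-means behave like a cosine-based clustering.

X  <- scale(as.matrix(mtcars))                    # columns: mean 0, sd 1
pc <- prcomp(X)
k  <- which(cumsum(pc$sdev^2) / sum(pc$sdev^2) >= 0.90)[1]  # PCs for ~90% variance
S  <- pc$x[, 1:k, drop = FALSE]                   # scores on the kept components
S  <- S / sqrt(rowSums(S^2))                      # unit-length rows
km <- kmeans(S, centers = 3, nstart = 25)         # cluster on the sphere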

how can I retrieve / impute the underlying rotation matrix (rotmat) from psych::principal?

Submitted by 断了今生、忘了曾经 on 2019-12-12 02:07:51
Question: I'm using psych::principal inside another function, with various rotate options passed on to principal (principal offers many rotation options and passes them on to different downstream functions). I need the rotation matrix that whichever rotation procedure was used found and applied. All of the downstream rotation procedures return this, but it appears not to be return()ed by principal. For example:

randomcor <- cor(matrix(data = rnorm(n = 100), nrow = 10))
library(psych)
principalres
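A workaround sketch, assuming a varimax rotation (other rotations can be rerun the same way through their own functions): take the unrotated loadings from principal and apply the rotation yourself, since stats::varimax does return the rotation matrix.

library(psych)
randomcor <- cor(matrix(rnorm(100), nrow = 10))
unrot <- principal(randomcor, nfactors = 2, rotate = "none")   # unrotated solution
vr <- stats::varimax(unclass(unrot$loadings))                  # redo the rotation by hand
vr$rotmat                                                      # the rotation matrix
unclass(unrot$loadings) %*% vr$rotmat    # rotated loadings (up to column sign/order)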

Summary of factor analysis and PCA

Submitted by 左心房为你撑大大i on 2019-12-11 21:16:56
Factor analysis and PCA

Definition: Factor analysis is a dimension-reduction tool for data. From a set of correlated variables it removes redundancy or duplication, grouping the correlated variables into a factor; variables that are genuinely unrelated to anything may be dropped. The correlated variables are then represented by a smaller set of "derived" variables, and these derived variables are the new factors. The factors are formed to be relatively independent of one another, that is, the new factors are mutually orthogonal.

Application: variable screening.

Steps:
3.1 Compute the correlation matrix of all variables
3.2 Factor extraction (PCA is needed only at this step)
3.3 Factor rotation
3.4 Make the final decision on the number of underlying factors

3.1 Compute the correlation matrix of all variables. Build the data matrix; here the data matrix is the correlation matrix (every entry is a correlation coefficient), and after PCA it becomes the factor matrix. Correlation coefficients with absolute value above 0.3 indicate acceptable correlation, i.e. variables whose correlations exceed 0.3 are grouped together.

3.2 Factor extraction, the only step that uses PCA (other methods exist as well, and different extraction methods give different results). The components are ordered by how much variance they explain: successive components explain progressively smaller portions of the total sample variance, and all components are mutually uncorrelated. The number of factors is chosen by the eigenvalue-greater-than-1 rule.

3.3 Factor rotation. The factor axes are rotated so that the factors are separated as much as possible. Unrotated factors are usually not easy to interpret (for example, factor 1 correlates with all variables while factor 2 correlates with the first four). The factors are rotated to make them more meaningful and easier to interpret (each variable is associated with the smallest possible number of factors). Different rotation methods identify different factors, just as different extraction methods give different extraction results.

3.4
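A small R sketch of the steps outlined above (illustrative only; mtcars stands in for real data): correlation matrix, eigenvalue-greater-than-1 rule for the number of factors, PCA-style extraction, then a varimax rotation, printing only loadings above 0.3.

R  <- cor(scale(mtcars))                 # step 3.1: correlation matrix
ev <- eigen(R, symmetric = TRUE)$values
nf <- sum(ev > 1)                        # step 3.2: Kaiser criterion, eigenvalues > 1
library(psych)
fit <- principal(R, nfactors = nf, rotate = "varimax")  # extraction + step 3.3 rotation
print(fit$loadings, cutoff = 0.3)        # |loading| > 0.3 treated as meaningful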

Different PCA plots

Submitted by 守給你的承諾、 on 2019-12-11 15:17:16
Question: I was trying to learn PCA (using the iris dataset) with Python and got some results, so I wanted to redo the analysis in R to make sure they were good. When I checked the results, R gave me a mirror image of the Python plot (flipped in the y axis), and some values have the opposite numeric sign (Python: [140,1] = 0.1826089, R: [141,2] = -0.1826089; Python counts from zero). The Python code:

import numpy as np
import matplotlib.pyplot as plt
import sklearn.decomposition as p
data=np.loadtxt("sample_data/iris
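The usual explanation is that principal-component directions are only defined up to sign: v and -v span the same axis, so two implementations can legitimately disagree by a factor of -1 on any component, giving mirrored plots. A small R sketch of a check under that assumption: detect the relative sign per component and flip it back before comparing.

p1 <- prcomp(iris[, 1:4], scale. = TRUE)$x
p2 <- p1 %*% diag(c(1, -1, 1, -1))       # pretend another implementation flipped PC2/PC4
flip <- sign(colSums(p1 * p2))           # relative sign of each component
all.equal(p1, p2 %*% diag(flip), check.attributes = FALSE)  # TRUE: same solution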

Dimension reduction using PCA for FastICA

Submitted by 筅森魡賤 on 2019-12-11 15:06:55
Question: I am trying to develop a system for image classification. I am following the article "Independent Component Analysis (ICA) for Texture Classification" by Dr. Dia Abu Al Nadi and Ayman M. Mansour. One paragraph says: Given the above texture images, the Independent Components are learned by the method outlined above. The (8 x 8) ICA basis functions for the above textures are shown in Figure 2, respectively. The dimension is reduced by PCA, resulting in a total of 40 functions. Note that
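A hedged sketch in R of the reduce-then-ICA step the paper describes (the real input would be vectorised 8 x 8 patches; the random matrix below is only a stand-in): the fastICA package can reduce the dimension itself via n.comp, whitening the data with PCA before estimating the independent components.

library(fastICA)
set.seed(1)
X <- matrix(rnorm(1000 * 64), 1000, 64)   # stand-in for a (patches x 64) patch matrix
ica <- fastICA(X, n.comp = 40)            # PCA whitening down to 40 dims, then ICA
dim(ica$S)                                # 1000 x 40 independent-component scores
dim(ica$A)                                # 40 x 64 mixing matrix: rows ~ basis functions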

PCA: why do I get such different results from princomp() and prcomp()?

Submitted by 强颜欢笑 on 2019-12-11 13:08:47
Question: In the code below, what is the difference between pc3$loadings and pc4$rotation?

Code:

pc3 <- princomp(datadf, cor=TRUE)
pc3$loadings
pc4 <- prcomp(datadf, cor=TRUE)
pc4$rotation

Data:

datadf <- dput(datadf)
structure(list(gVar4 = c(11, 14, 17, 5, 5, 5.5, 8, 5.5, 6.5, 8.5, 4, 5, 9, 10, 11, 7, 6, 7, 7, 5, 6, 9, 9, 6.5, 9, 3.5, 2, 15, 2.5, 17, 5, 5.5, 7, 6, 3.5, 6, 9.5, 5, 7, 4, 5, 4, 9.5, 3.5, 5, 4, 4, 9, 4.5), gVar1 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
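One likely source of the difference, offered as an assumption since no answer appears above: prcomp() has no cor argument, so cor=TRUE is silently swallowed by ... and the PCA runs on the covariance of the raw data, while princomp(..., cor=TRUE) uses the correlation matrix. A like-for-like comparison would be:

pc3 <- princomp(datadf, cor = TRUE)   # correlation-based PCA
pc4 <- prcomp(datadf, scale. = TRUE)  # prcomp's equivalent of cor = TRUE
round(unclass(pc3$loadings), 3)       # should now match pc4$rotation up to column signs
round(pc4$rotation, 3)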