pca

Finding the dimension with highest variance using scikit-learn PCA

二次信任 submitted on 2019-11-30 03:08:57
I need to use PCA to identify the dimensions with the highest variance in a certain data set. I'm using scikit-learn's PCA to do it, but I can't tell from the output which dimensions of my data have the highest variance. Keep in mind that I don't want to eliminate those dimensions, only identify them. My data is organized as a matrix with 150 rows, each with 4 dimensions. I'm doing the following:

pca = sklearn.decomposition.PCA()
pca.fit(data_matrix)

When I print pca.explained_variance_ratio_, it outputs an array of variance ratios ordered from highest
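One way to map the fitted components back to the original dimensions is to inspect pca.components_: each row is a principal direction over the 4 original dimensions, and the magnitude of each entry shows how strongly that dimension contributes. A minimal sketch, reusing the question's data_matrix:

import numpy as np
from sklearn.decomposition import PCA

pca = PCA()
pca.fit(data_matrix)                      # data_matrix: the 150 x 4 array from the question

# Contribution of each original dimension to the top-variance direction.
first_direction = np.abs(pca.components_[0])

# Original dimensions ranked by how much they contribute to that direction.
ranked_dims = np.argsort(first_direction)[::-1]
print(ranked_dims, pca.explained_variance_ratio_)

If the goal is simply the per-dimension variance of the raw data (rather than the PCA directions), data_matrix.var(axis=0) answers that directly.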

Adding principal components as variables to a data frame

末鹿安然 submitted on 2019-11-30 02:46:17
Question: I am working with a dataset of 10,000 data points and 100 variables in R. Unfortunately, the variables I have do not describe the data well. I carried out a PCA using prcomp(), and the first 3 PCs seem to account for most of the variability in the data. As far as I understand, a principal component is a combination of the original variables; it therefore has a value for each data point and can be treated as a new variable. Would I be able to add these
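The original question uses R's prcomp(), whose per-observation scores live in the $x matrix and can simply be cbind()-ed onto the data frame. For consistency with the other posts in this digest, here is an analogous sketch in Python with pandas and scikit-learn, where df is a hypothetical 10000 x 100 numeric data frame:

import pandas as pd
from sklearn.decomposition import PCA

# df is a hypothetical DataFrame with 10000 rows and 100 numeric columns.
pca = PCA(n_components=3)
scores = pca.fit_transform(df.values)     # per-observation scores, shape (10000, 3)

# Attach the first three PCs as new columns, analogous to cbind(df, pca$x[, 1:3]) in R.
scores_df = pd.DataFrame(scores, columns=["PC1", "PC2", "PC3"], index=df.index)
df = pd.concat([df, scores_df], axis=1)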

Factor Loadings using sklearn

非 Y 不嫁゛ submitted on 2019-11-30 00:07:08
I want the correlations between individual variables and principal components in Python. I am using PCA in sklearn, but I don't understand how to obtain the loading matrix after I have decomposed my data. My code is here:

iris = load_iris()
data, y = iris.data, iris.target
pca = PCA(n_components=2)
transformed_data = pca.fit(data).transform(data)
eigenValues = pca.explained_variance_ratio_

http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html doesn't mention how this can be achieved. I think that @RickardSjogren is describing the eigenvectors, while @BigPanda is
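A common convention (borrowed from factor analysis) defines the loadings as the eigenvectors scaled by the square roots of their eigenvalues; for standardized inputs these are the correlations between variables and components. A minimal sketch on top of the question's code, assuming that convention:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

data = load_iris().data

pca = PCA(n_components=2)
pca.fit(data)

# Loadings: eigenvectors scaled by sqrt(eigenvalues); rows = variables, columns = PCs.
loadings = pca.components_.T * np.sqrt(pca.explained_variance_)
print(loadings)

For the loadings to be exact variable-component correlations, the input variables should be standardized first (e.g. with StandardScaler).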

Plotting PCA scores with color

我的未来我决定 submitted on 2019-11-29 23:18:13
Question: I'm doing PCA and I would like to plot the first principal component against the second in R:

pca <- princomp(~., data = data, na.action = na.omit)
plot(pca$scores[,1], pca$scores[,2])

or maybe several principal components:

pairs(pca$scores[,1:4])

However, the points are black. How do I appropriately add color to the graphs? How many colors do I need: one for each principal component I am plotting, or one for each row in my data matrix? Thanks. EDIT: my data looks like this: > data[1:4,1:4] patient1 patient2
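The colors should encode a grouping of the observations (one color per group of rows), not one per principal component. The question is about R's princomp(); a small analogous sketch in Python with matplotlib, using a hypothetical integer group label per row:

import matplotlib.pyplot as plt
import numpy as np
from sklearn.decomposition import PCA

# X: observations x variables; groups: hypothetical label per row (e.g. patient type).
X = np.random.rand(100, 6)
groups = np.random.randint(0, 3, size=100)

scores = PCA(n_components=2).fit_transform(X)

# One color per group of observations, not per principal component.
plt.scatter(scores[:, 0], scores[:, 1], c=groups, cmap="viridis")
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.show()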

Basic example for PCA with matplotlib

烈酒焚心 submitted on 2019-11-29 22:40:34
I am trying to do a simple principal component analysis with matplotlib.mlab.PCA, but with the attributes of that class I can't get a clean solution to my problem. Here's an example: get some dummy data in 2D and start PCA:

from matplotlib.mlab import PCA
import numpy as np

N = 1000
xTrue = np.linspace(0, 1000, N)
yTrue = 3 * xTrue
xData = xTrue + np.random.normal(0, 100, N)
yData = yTrue + np.random.normal(0, 100, N)
xData = np.reshape(xData, (N, 1))
yData = np.reshape(yData, (N, 1))
data = np.hstack((xData, yData))
test2PCA = PCA(data)

Now, I just want to get the principal components as vectors in my
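Note that matplotlib.mlab.PCA rescales each column before decomposing and has been removed from recent matplotlib versions, so its component vectors do not live in the original data units. A sketch of getting the component directions in the original units with plain NumPy instead, reusing the question's data array:

import numpy as np

# data: the (N, 2) array built in the question.
centered = data - data.mean(axis=0)

# SVD of the centered data: rows of Vt are the principal directions (unit vectors).
U, s, Vt = np.linalg.svd(centered, full_matrices=False)

variances = s**2 / (len(data) - 1)        # variance explained along each direction
print(Vt)                                  # each row is one principal component vector
print(variances)

To draw them, scale each row of Vt by a few standard deviations (np.sqrt(variances)) and plot the resulting arrows from the data mean.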

Python scikit learn pca.explained_variance_ratio_ cutoff

感情迁移 submitted on 2019-11-29 21:46:50
Guru, when choosing the number of principal components k, we pick the smallest k such that, for example, 99% of the variance is retained. However, in Python scikit-learn I am not 100% sure that pca.explained_variance_ratio_ = 0.99 means "99% of variance is retained". Could anyone clarify? Thanks. The scikit-learn PCA manual is here: http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html#sklearn.decomposition.PCA Yes, you are nearly right. The pca.explained_variance_ratio_ attribute returns a vector of the fraction of total variance explained by each principal component.
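Since explained_variance_ratio_ is per-component, retaining 99% of the variance means taking the cumulative sum and choosing the smallest k that reaches 0.99. A minimal sketch, assuming X is the data matrix; scikit-learn can also select k directly when a float is passed as n_components:

import numpy as np
from sklearn.decomposition import PCA

pca = PCA().fit(X)                         # X: hypothetical data matrix

# Smallest k whose cumulative explained-variance ratio reaches 99%.
cumulative = np.cumsum(pca.explained_variance_ratio_)
k = int(np.searchsorted(cumulative, 0.99) + 1)

# Equivalent shortcut: let scikit-learn choose k for the requested variance.
pca_99 = PCA(n_components=0.99).fit(X)
print(k, pca_99.n_components_)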

The PCA visualizations we've been drawing for years are wrong!!!

点点圈 submitted on 2019-11-29 18:38:51
This post was inspired by last week's single-cell transcriptomics course, taught by Dr. Dai, an experienced researcher in single-cell algorithms. The course explains the principles of each analysis thoroughly, from theory to application, with up-to-date pipelines and code, and is well worth studying; before it even ended I had already recommended the video course to an old friend who could not find time to attend. Back to the topic: one slide from the course points out that many PCA visualizations are drawn incorrectly. PCA and PCoA are commonly used dimensionality-reduction tools, and several earlier articles covered the principles and visualization of PCA: an introduction to principal component analysis, hands-on PCA visualization with R code and test data, a comparison of ordination methods (PCA, PCoA, NMDS, CCA), a guide to PCoA distance metrics and to reading PCA vs. PCoA plots, and environmental factor association analysis (CCA or RDA). By default, PCA/PCoA software outputs a square (or cubic) plot, so the most common 2D PCA figure has a 1:1 aspect ratio. Common as this is, it is wrong. The figure below shows the PCA of a simulated data set with two Gaussian clusters: panels a and b use the wrong aspect ratio and appear to show four clusters, while panels c and d use the correct aspect ratio, with panel d colored by the true grouping. In fact, the aspect ratio of a PCA plot should be consistent with the ratio of the eigenvalues of the plotted dimensions. Because the eigenvalues reflect how much of the original variance (variation) each principal component explains, a unit of length must represent the same amount of explained variance on every principal component axis, which is why the aspect ratio matters. If you draw the plot with a ggplot2-based tool (see the ggplot2 practical guide covering visualization scripts, tools, and recipes
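The post's fix is described for ggplot2, but the same idea in Python/matplotlib is simply to force equal data units on both score axes, so the visual spread along each PC stays proportional to the variance it explains. A minimal sketch with hypothetical simulated data (not the post's dataset):

import matplotlib.pyplot as plt
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical data: two elongated Gaussian clusters, loosely mimicking the post's simulation.
rng = np.random.default_rng(0)
cluster = rng.normal(size=(200, 2)) * [10, 1]
X = np.vstack([cluster, cluster + [0, 5]])

scores = PCA(n_components=2).fit_transform(X)

plt.scatter(scores[:, 0], scores[:, 1], s=10)
plt.xlabel("PC1")
plt.ylabel("PC2")
# Equal data units on both axes, so a unit of length means the same explained variance.
plt.gca().set_aspect("equal")
plt.show()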

Matlab Principal Component Analysis (eigenvalues order)

耗尽温柔 submitted on 2019-11-29 18:27:11
I want to use Matlab's princomp function, but it returns the eigenvalues in a sorted array. This way I can't find out which eigenvalue corresponds to which column. For Matlab,

m = [1,2,3; 4,5,6; 7,8,9];
[pc, score, latent] = princomp(m);

is the same as

m = [2,1,3; 5,4,6; 8,7,9];
[pc, score, latent] = princomp(m);

That is, swapping the first two columns does not change anything. The eigenvalues in latent will be (27, 0, 0) in both cases, and the information about which eigenvalue corresponds to which original (input) column is lost. Is there a way to tell Matlab not to sort the eigenvalues? With PCA
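Strictly speaking, PCA eigenvalues belong to principal directions rather than to individual input columns, which is why reordering the columns only permutes the entries inside each eigenvector while the eigenvalues stay the same. The question is about Matlab's princomp; a small illustration of the same behavior, sketched in Python/NumPy for consistency with the other posts in this digest and using a random matrix so the permutation is visible:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
m1 = rng.normal(size=(50, 3))
m2 = m1[:, [1, 0, 2]]                     # swap the first two columns

pca1, pca2 = PCA().fit(m1), PCA().fit(m2)

# The eigenvalues (explained variances) are identical for both column orderings...
print(np.allclose(pca1.explained_variance_, pca2.explained_variance_))

# ...while each principal direction simply has its first two entries swapped
# (possibly up to an overall sign flip per component).
print(pca1.components_)
print(pca2.components_)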

How to get the 1st Principal Component by PCA using Python?

元气小坏坏 submitted on 2019-11-29 17:20:34
I have a set of 2D vectors presented as an n*2 matrix. I wish to get the 1st principal component, i.e. the vector that indicates the direction of largest variance. I found rather detailed documentation on this from Rice University. Based on it, I have imported the data and done the following:

import numpy as np
# Convert a list-of-lists into a numpy array; aListOfLists holds the data points.
dataMatrix = np.array(aListOfLists)
# Make a new PCA object from the numpy array.
myPCA = PCA(dataMatrix)

Then how may I get the 3D vector that is
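The question appears to rely on matplotlib.mlab's PCA class, which has since been removed from matplotlib. A minimal alternative sketch with scikit-learn that returns the first principal direction as a unit vector, assuming dataMatrix is the n*2 array above:

import numpy as np
from sklearn.decomposition import PCA

pca = PCA(n_components=1)
pca.fit(dataMatrix)                        # dataMatrix: the n x 2 array from the question

first_pc = pca.components_[0]              # unit vector along the direction of largest variance
print(first_pc, pca.explained_variance_[0])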

Why is the accuracy coming out as 0%? MATLAB LIBSVM

杀马特。学长 韩版系。学妹 submitted on 2019-11-29 12:49:09
I extracted PCA features using:

function [mn, A1, A2, Eigenfaces] = pca(T, f1, nf1)
m = mean(T, 2);            % T is the whole training set
train = size(T, 2);
A = [];
for i = 1:train
    temp = double(T(:, i)) - m;
    A = [A temp];
end
train = size(f1, 2);       % f1 - Face 1 images from training set 'T'
A1 = [];
for i = 1:train
    temp = double(f1(:, i)) - m;
    A1 = [A1 temp];
end
train = size(nf1, 2);      % nf1 - Images other than face 1 from training set 'T'
A2 = [];
for i = 1:train
    temp = double(nf1(:, i)) - m;
    A2 = [A2 temp];
end
L = A' * A;
[V, D] = eig(L);
L_eig = [];
for i = 1:size(V, 2)
    if D(i, i) > 1
        L_eig = [L_eig V(:, i)];
    end
end
Eigenfaces = A * L_eig;
end

Then I projected only the face 1 (class
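The post combines PCA (eigenface) feature extraction with LIBSVM classification in Matlab. As a rough point of comparison, not the author's code, here is a hedged sketch of the same PCA-then-SVM flow in Python with scikit-learn, assuming X is a hypothetical matrix of flattened face images and y the class labels:

from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# X: rows are flattened face images, y: class labels (hypothetical placeholders).
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Project onto the leading eigenfaces, then classify with an SVM.
model = make_pipeline(PCA(n_components=50), SVC(kernel="linear"))
model.fit(X_train, y_train)
print("accuracy:", model.score(X_test, y_test))

Note that the test images must be centered with the training mean and projected onto the training eigenvectors (the pipeline handles this automatically); projecting train and test sets with separately fitted PCAs is a common cause of near-zero accuracy.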