pca

Python PCA plot using Hotelling's T2 for a confidence interval

Anonymous (unverified), submitted 2019-12-03 01:34:02
Question: I am trying to apply PCA for multivariate analysis and plot the score plot for the first two components with a Hotelling T2 confidence ellipse in Python. I was able to get the scatter plot, and I want to add a 95% confidence ellipse to it. It would be great if anyone knew how this can be done in Python. Sample picture of expected output:

Answer 1: This was bugging me, so I adopted an answer from PCA and Hotelling's T^2 for confidence intervall in R in Python (and using some source code from the ggbiplot R package): from sklearn import
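The answer's code is cut off above. As a hedged sketch of the same idea (not the original answer's code; it assumes data is an (n_samples, n_features) NumPy array and uses one common form of the T2 limit based on the F distribution), something like the following draws the ellipse over the first two scores:

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Ellipse
from scipy.stats import f
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
data = rng.normal(size=(100, 5))                  # placeholder data, replace with your own

scores = PCA(n_components=2).fit_transform(data)  # score-plot coordinates
n = scores.shape[0]

# One common 95% Hotelling T^2 limit for 2 retained components
# (an assumption; other variants of the limit exist)
t2_lim = 2 * (n - 1) / (n - 2) * f.ppf(0.95, 2, n - 2)

# Ellipse axes: scale each score's standard deviation by sqrt of the T^2 limit
width, height = 2 * np.sqrt(t2_lim * scores.var(axis=0, ddof=1))

fig, ax = plt.subplots()
ax.scatter(scores[:, 0], scores[:, 1], s=10)
ax.add_patch(Ellipse((0, 0), width, height, fill=False, edgecolor='red'))
ax.set_xlabel('PC1')
ax.set_ylabel('PC2')
plt.show()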

Principal Component Analysis (PCA)

≯℡__Kan透↙, submitted 2019-12-03 01:26:12
Original article link

Introduction to PCA: As shown in the figure, this is a two-dimensional point cloud, and we want to find the direction of largest variance, shown in the right-hand figure. Computing this direction of largest variance is exactly what PCA does. In higher dimensions, PCA can compute not only the direction of largest variance but also the second-largest, third-largest, and so on. PCA (Principal Components Analysis) computes mutually orthogonal directions ordered by variance; these are called the principal directions. It is commonly used to reduce the dimensionality of high-dimensional data, i.e., to project the data onto the few principal directions with the largest variance, which makes the data easier to analyze.

The PCA computation is simple:
1. Compute the covariance matrix of the data: Cov = Σ_i (D_i − C)(D_i − C)^T, where D_i is the i-th data point and C is the mean of the data.
2. Compute the eigenvalues and eigenvectors of the covariance matrix. The eigenvectors are the principal directions, sorted by eigenvalue from largest to smallest.

Some applications of PCA follow.

Parameterizing 3D human body models: The figure shows some fitted 3D human body models. They were obtained by scanning several thousand people and fitting a template body mesh to the scans, so all the fitted meshes share the same mesh topology. If a body mesh has N vertices, the geometry of one body can be represented by 3N floating-point numbers; call this vector S_i. Given K bodies, let ES be the mean of {S_i} and U_i = S_i − ES; then {U_i} captures how the geometry varies across the K bodies. These are high-dimensional vectors, and we can use PCA to reduce {U_i} to, say, k dimensions. Let the PCA principal directions be D1, D2, ...
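A minimal NumPy sketch of the two steps above (the array shapes and variable names are illustrative, not from the original article):

import numpy as np

def pca(data, k):
    # data: (num_samples, dim) array; k: number of principal directions to keep
    mean = data.mean(axis=0)                    # C in the formula above
    centered = data - mean                      # D_i - C
    cov = centered.T @ centered / len(data)     # step 1: covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)      # step 2: eigendecomposition (cov is symmetric)
    order = np.argsort(eigvals)[::-1]           # sort eigenvalues from largest to smallest
    directions = eigvecs[:, order[:k]].T        # top-k principal directions, shape (k, dim)
    projected = centered @ directions.T         # low-dimensional coordinates
    return mean, directions, projected

# Example: reduce 3D points to their 2 strongest directions
points = np.random.default_rng(0).normal(size=(200, 3))
mean, dirs, low_dim = pca(points, k=2)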

OPENCV: PCA application error in image_proc

Anonymous (unverified), submitted 2019-12-03 01:12:01
Question: Based on this here. I got this error, and it is the only one left after almost 3 days of trial-and-error debugging: Unhandled exception at 0x000007FEEC6315A4 (opencv_imgproc242.dll) in PCA.exe: 0xC0000005: Access violation reading location 0xFFFFFFFFFFFFFFFF. Can someone here please help me with this? I'm currently using VS2012 and my OS is Win7 64-bit. I configured my OpenCV 2.4.2 following this blog. Please help!

Answer 1: I've corrected some minor bugs (and now it works perfectly for me): #include #include using namespace cv; using

Is there good library to do nonnegative matrix factorization (NMF) fast?

会有一股神秘感。, submitted 2019-12-03 00:43:58
I have a sparse matrix whose shape is 570000*3000. I tried nimfa to do NMF (using the default NMF method, with max_iter set to 65). However, I found nimfa very slow. Has anyone used a faster library for NMF? tskuzzy: I have used libNMF before. It's written in C and is very fast. There is a paper documenting the algorithm and code. The paper also lists several alternative packages for NMF in a bunch of different languages, which I have copied here for future reference. The Mathworks [3, 33] Matlab http://www.mathworks.com/access/helpdesk/help/toolbox/stats/nnmf . Cemgil [5] Matlab http://www
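Not mentioned in the excerpt above, but scikit-learn also ships an NMF implementation that accepts SciPy sparse input directly; a hedged sketch (the matrix size, density, and parameters here are illustrative, not the poster's data):

import scipy.sparse as sp
from sklearn.decomposition import NMF

# Small random nonnegative sparse matrix standing in for the 570000x3000 one
X = sp.random(5000, 300, density=1e-3, format='csr', random_state=0)

model = NMF(n_components=30, init='nndsvd', max_iter=65, random_state=0)
W = model.fit_transform(X)    # (n_rows, 30) activations
H = model.components_         # (30, n_cols) basis components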

From Principal Component Analysis (PCA) to Singular Value Decomposition (SVD)

Anonymous (unverified), submitted 2019-12-03 00:41:02
Principal component analysis (PCA) is a very common compression and dimensionality-reduction method in machine learning. Why reduce dimensionality? Because high-dimensional samples tend to be redundant and sparse, and fitting or pattern recognition performed directly on them overfits very easily. In practice, the part of a high-dimensional sample that matters for the learning task often lies on some low-dimensional distribution, so dimensionality reduction is needed. (For example, ...)

The idea behind PCA is to find, in the high-dimensional sample space, a low-dimensional hyperplane and project all the high-dimensional samples onto it to obtain low-dimensional samples, such that either the projection error is minimized or the projected samples are maximally separable. Corresponding to these two properties, PCA can be defined in two ways: the minimum-error formulation and the maximum-variance formulation. It can be shown mathematically that the two definitions lead to equivalent results and the same algorithm. (The derivations of the two formulations will be added when there is time...)

(Algorithm steps to be added...)

In summary, principal component analysis involves computing the mean x̄ and covariance matrix S of the data set, then finding the M eigenvectors of the covariance matrix corresponding to its M largest eigenvalues; these form the projection matrix.

The connection between PCA and SVD lies mainly in how the eigenvectors are computed. Most introductions to the PCA algorithm first compute the sample covariance matrix and then obtain the eigenvalues and eigenvectors from it. However, for normalized samples the covariance matrix is S = X X^T (mathematical proof to be added), and some SVD implementations can work directly from the sample matrix X
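A hedged NumPy sketch of that connection (samples as rows here, whereas the post writes X with samples as columns so that its covariance is X X^T; shapes are illustrative): for centered data, the right singular vectors of the sample matrix are the eigenvectors of the covariance matrix, so PCA can be carried out with an SVD and no explicit covariance matrix.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))     # rows are samples
Xc = X - X.mean(axis=0)           # center the data

# Route 1: eigendecomposition of the covariance matrix
cov = Xc.T @ Xc / (len(Xc) - 1)
eigvals, eigvecs = np.linalg.eigh(cov)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # sort descending

# Route 2: SVD of the centered sample matrix, no covariance matrix formed
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
svd_eigvals = s**2 / (len(Xc) - 1)

assert np.allclose(eigvals, svd_eigvals)             # same spectrum
assert np.allclose(np.abs(eigvecs.T), np.abs(Vt))    # same directions up to sign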

doing PCA on very large data set in R

瘦欲@, submitted 2019-12-03 00:36:13
I have a very large training set (~2Gb) in a CSV file. The file is too large to read directly into memory (read.csv() brings the computer to a halt), and I would like to reduce the size of the data file using PCA. The problem is that (as far as I can tell) I need to read the file into memory in order to run a PCA algorithm (e.g., princomp()). I have tried the bigmemory package to read the file in as a big.matrix, but princomp doesn't function on big.matrix objects
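The question is about R, but as a hedged illustration of the chunked approach being asked for (reading the file in pieces instead of all at once), here is what it looks like with pandas and scikit-learn's IncrementalPCA in Python; the file name, chunk size, and component count are assumptions:

import pandas as pd
from sklearn.decomposition import IncrementalPCA

ipca = IncrementalPCA(n_components=50)

# First pass: fit on chunks so the full CSV never has to sit in memory
for chunk in pd.read_csv("training.csv", chunksize=10_000):
    ipca.partial_fit(chunk.values)

# Second pass: transform chunk by chunk and collect the reduced rows
reduced = [ipca.transform(chunk.values)
           for chunk in pd.read_csv("training.csv", chunksize=10_000)]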

principal component analysis (PCA) in R: which function to use?

淺唱寂寞╮, submitted 2019-12-03 00:33:36
Can anyone explain what the major differences between the prcomp and princomp functions are? Is there any particular reason why I should choose one over the other? In case this is relevant, the type of application I am looking at is a quality-control analysis for genomic (expression) data sets. Thank you! doug: There are differences between these two functions with respect to the function parameters (what you can/must pass in when you call the function), the values returned by each, and the numerical technique used by each to calculate the principal components. Numerical Technique Used to Calculate PCA In

Dimensionality reduction in sklearn: PCA and TSNE

Anonymous (unverified), submitted 2019-12-03 00:32:02
Both are dimensionality-reduction tools; the main differences between the two are as follows.

from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

Because the underlying principles differ, the feature information preserved by t-SNE is more representative, i.e., it best reflects the differences between samples; however, t-SNE is extremely slow to run, while PCA is comparatively fast. A common practice, especially when displaying (visualizing) high-dimensional data, is therefore to first reduce the dimensionality with PCA and then apply t-SNE:

data_pca = PCA(n_components=50).fit_transform(data)
data_pca_tsne = TSNE(n_components=2).fit_transform(data_pca)

Reposted from https://blog.csdn.net/lanchunhui/article/details/64923702

The difference between Independent Component Analysis (ICA) and Principal Component Analysis (PCA)

Anonymous (unverified), submitted 2019-12-03 00:30:01
1. Preface

The book says:
1. PCA assumes the source signals are mutually uncorrelated, whereas ICA assumes the source signals are mutually independent.
2. PCA requires the principal components to be mutually orthogonal and the samples to be Gaussian; ICA does not require the samples to be Gaussian.

When performing ICA by maximizing information entropy, one has to assume a probability density function g' for the source signals and then find the transform W that maximizes the entropy of g(Y) = g(Wx), i.e., such that Y = s.

My questions are:
1. How is this probability density function chosen? How is it specified in practical signal processing?
2. If the observed signal is Gaussian and g' is taken to be Gaussian, will ICA and PCA give the same result?

2. Analysis

Neither PCA nor ICA needs a specific assumption about the distribution of the source signals; if the observed signal is Gaussian, then the source signals are also Gaussian, and in that case PCA and ICA are equivalent. Some elaboration follows.

Suppose the observed signal is an n-dimensional random variable x. Both principal component analysis (PCA) and independent component analysis (ICA) aim to find a direction, i.e., an n-dimensional vector w, such that some property of the linear combination w^T x is maximized.

2.1 Principal Component Analysis (PCA)

PCA takes the view that the most useful information in a random signal is contained in its variance. We therefore look for a direction w1 such that the variance of the projection w1^T x of the random signal x onto that direction is maximized. Next, in the subspace orthogonal to w1, we find a direction w2 such that the variance of w2^T x is maximized, and so on until all n directions wn are found. In this way we eventually obtain a sequence of uncorrelated random variables. In matrix form, write W
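As a hedged sketch of the contrast described above (not from the original post; the mixing matrix and signals are illustrative assumptions), scikit-learn provides both decompositions, and on non-Gaussian sources they recover different directions:

import numpy as np
from sklearn.decomposition import PCA, FastICA

t = np.linspace(0, 8, 2000)
sources = np.c_[np.sin(3 * t), np.sign(np.sin(5 * t))]   # non-Gaussian source signals
A = np.array([[1.0, 0.5], [0.5, 2.0]])                   # mixing matrix
x = sources @ A.T                                        # observed signals

pca_est = PCA(n_components=2).fit_transform(x)                       # uncorrelated, variance-ordered
ica_est = FastICA(n_components=2, random_state=0).fit_transform(x)   # approximately independent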

How is the complexity of PCA O(min(p^3,n^3))?

六月ゝ 毕业季﹏, submitted 2019-12-03 00:22:38
I've been reading a paper on Sparse PCA, which is: http://stats.stanford.edu/~imj/WEBLIST/AsYetUnpub/sparse.pdf It states that, if you have n data points, each represented with p features, then the complexity of PCA is O(min(p^3, n^3)). Can someone please explain how/why? Covariance matrix computation is O(p^2 n); its eigenvalue decomposition is O(p^3). So the complexity of PCA is O(p^2 n + p^3). O(min(p^3, n^3)) would imply that you could analyze a two-dimensional dataset of any size in fixed time, which is patently false. Assuming your dataset is $X \in \mathbb{R}^{n \times p}$ where n: number of
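The excerpt is cut off mid-answer. A hedged note on where the $\min$ can come from (a standard argument, not necessarily what the truncated answer goes on to say): with centered data $X \in \mathbb{R}^{n \times p}$ one may eigendecompose either the $p \times p$ matrix $X^\top X$, which costs $O(p^2 n)$ to form and $O(p^3)$ to decompose, or the $n \times n$ Gram matrix $X X^\top$, which costs $O(n^2 p)$ to form and $O(n^3)$ to decompose. The two share their nonzero eigenvalues, since $X X^\top v = \lambda v$ implies $X^\top X (X^\top v) = \lambda (X^\top v)$, so one can always take the cheaper route. Quoting only the decomposition term, $O(\min(p^3, n^3))$, while ignoring the cost of forming the matrix is exactly the simplification the answer above objects to.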