pca

Python PCA plot using Hotelling's T2 for a confidence interval

Anonymous (unverified), submitted 2019-12-03 01:34:02
Question: I am trying to apply PCA for multivariate analysis and plot the score plot for the first two components with a Hotelling T2 confidence ellipse in Python. I was able to get the scatter plot, and I want to add a 95% confidence ellipse to it. It would be great if anyone knew how this can be done in Python. Sample picture of expected output:

Answer 1: This was bugging me, so I adopted an answer from PCA and Hotelling's T^2 for confidence intervall in R in Python (and using some source code from the ggbiplot R package): from sklearn import
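The answer's code is cut off above. As a hedged sketch of the same idea (not the original answer's code; it assumes data is an (n_samples, n_features) NumPy array and uses one common form of the T2 limit based on the F distribution), something like the following draws the ellipse over the first two scores:

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Ellipse
from scipy.stats import f
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
data = rng.normal(size=(100, 5))                  # placeholder data, replace with your own

scores = PCA(n_components=2).fit_transform(data)  # score-plot coordinates
n = scores.shape[0]

# One common 95% Hotelling T^2 limit for 2 retained components
# (an assumption; other variants of the limit exist)
t2_lim = 2 * (n - 1) / (n - 2) * f.ppf(0.95, 2, n - 2)

# Ellipse axes: scale each score's standard deviation by sqrt of the T^2 limit
width, height = 2 * np.sqrt(t2_lim * scores.var(axis=0, ddof=1))

fig, ax = plt.subplots()
ax.scatter(scores[:, 0], scores[:, 1], s=10)
ax.add_patch(Ellipse((0, 0), width, height, fill=False, edgecolor='red'))
ax.set_xlabel('PC1')
ax.set_ylabel('PC2')
plt.show()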

Principal Component Analysis (PCA)

≯℡__Kan透↙, submitted 2019-12-03 01:26:12
Original article link

Introduction to PCA: As shown in the figure, this is a two-dimensional point cloud, and we want to find the direction of largest variance, shown in the right-hand figure. Computing this direction of largest variance is exactly what PCA does. In higher dimensions, PCA can compute not only the direction of largest variance but also the second-largest, third-largest, and so on. PCA (Principal Components Analysis) computes mutually orthogonal directions ordered by variance; these are called the principal directions. It is commonly used to reduce the dimensionality of high-dimensional data, i.e., to project the data onto the few principal directions with the largest variance, which makes the data easier to analyze.

The PCA computation is simple:
1. Compute the covariance matrix of the data: Cov = Σ_i (D_i − C)(D_i − C)^T, where D_i is the i-th data point and C is the mean of the data.
2. Compute the eigenvalues and eigenvectors of the covariance matrix. The eigenvectors are the principal directions, sorted by eigenvalue from largest to smallest.

Some applications of PCA follow.

Parameterizing 3D human body models: The figure shows some fitted 3D human body models. They were obtained by scanning several thousand people and fitting a template body mesh to the scans, so all the fitted meshes share the same mesh topology. If a body mesh has N vertices, the geometry of one body can be represented by 3N floating-point numbers; call this vector S_i. Given K bodies, let ES be the mean of {S_i} and U_i = S_i − ES; then {U_i} captures how the geometry varies across the K bodies. These are high-dimensional vectors, and we can use PCA to reduce {U_i} to, say, k dimensions. Let the PCA principal directions be D1, D2, ...
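A minimal NumPy sketch of the two steps above (the array shapes and variable names are illustrative, not from the original article):

import numpy as np

def pca(data, k):
    # data: (num_samples, dim) array; k: number of principal directions to keep
    mean = data.mean(axis=0)                    # C in the formula above
    centered = data - mean                      # D_i - C
    cov = centered.T @ centered / len(data)     # step 1: covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)      # step 2: eigendecomposition (cov is symmetric)
    order = np.argsort(eigvals)[::-1]           # sort eigenvalues from largest to smallest
    directions = eigvecs[:, order[:k]].T        # top-k principal directions, shape (k, dim)
    projected = centered @ directions.T         # low-dimensional coordinates
    return mean, directions, projected

# Example: reduce 3D points to their 2 strongest directions
points = np.random.default_rng(0).normal(size=(200, 3))
mean, dirs, low_dim = pca(points, k=2)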

OPENCV: PCA application error in image_proc

Anonymous (unverified), submitted 2019-12-03 01:12:01
Question: Based on this here. I got this error, and it is the only one left after almost 3 days of trial-and-error debugging: Unhandled exception at 0x000007FEEC6315A4 (opencv_imgproc242.dll) in PCA.exe: 0xC0000005: Access violation reading location 0xFFFFFFFFFFFFFFFF. Can someone here please help me with this? I'm currently using VS2012 and my OS is Win7 64-bit. I configured my OpenCV 2.4.2 following this blog. Please help!

Answer 1: I've corrected some minor bugs (and now it works perfectly for me): #include #include using namespace cv; using

Is there good library to do nonnegative matrix factorization (NMF) fast?

会有一股神秘感。, submitted 2019-12-03 00:43:58
I have a sparse matrix whose shape is 570000*3000. I tried nimfa to do NMF (using the default NMF method, with max_iter set to 65). However, I found nimfa very slow. Has anyone used a faster library for NMF? tskuzzy: I have used libNMF before. It's written in C and is very fast. There is a paper documenting the algorithm and code. The paper also lists several alternative packages for NMF in a bunch of different languages, which I have copied here for future reference. The Mathworks [3, 33] Matlab http://www.mathworks.com/access/helpdesk/help/toolbox/stats/nnmf . Cemgil [5] Matlab http://www
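Not mentioned in the excerpt above, but scikit-learn also ships an NMF implementation that accepts SciPy sparse input directly; a hedged sketch (the matrix size, density, and parameters here are illustrative, not the poster's data):

import scipy.sparse as sp
from sklearn.decomposition import NMF

# Small random nonnegative sparse matrix standing in for the 570000x3000 one
X = sp.random(5000, 300, density=1e-3, format='csr', random_state=0)

model = NMF(n_components=30, init='nndsvd', max_iter=65, random_state=0)
W = model.fit_transform(X)    # (n_rows, 30) activations
H = model.components_         # (30, n_cols) basis components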

From Principal Component Analysis (PCA) to Singular Value Decomposition (SVD)

Anonymous (unverified), submitted 2019-12-03 00:41:02
Principal component analysis (PCA) is a very common compression and dimensionality-reduction method in machine learning. Why reduce dimensionality? Because high-dimensional samples tend to be redundant and sparse, and fitting or pattern recognition performed directly on them overfits very easily. In practice, the part of a high-dimensional sample that matters for the learning task often lies on some low-dimensional distribution, so dimensionality reduction is needed. (For example, ...)

The idea behind PCA is to find, in the high-dimensional sample space, a low-dimensional hyperplane and project all the high-dimensional samples onto it to obtain low-dimensional samples, such that either the projection error is minimized or the projected samples are maximally separable. Corresponding to these two properties, PCA can be defined in two ways: the minimum-error formulation and the maximum-variance formulation. It can be shown mathematically that the two definitions lead to equivalent results and the same algorithm. (The derivations of the two formulations will be added when there is time...)

(Algorithm steps to be added...)

In summary, principal component analysis involves computing the mean x̄ and covariance matrix S of the data set, then finding the M eigenvectors of the covariance matrix corresponding to its M largest eigenvalues; these form the projection matrix.

The connection between PCA and SVD lies mainly in how the eigenvectors are computed. Most introductions to the PCA algorithm first compute the sample covariance matrix and then obtain the eigenvalues and eigenvectors from it. However, for normalized samples the covariance matrix is S = X X^T (mathematical proof to be added), and some SVD implementations can work directly from the sample matrix X
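A hedged NumPy sketch of that connection (samples as rows here, whereas the post writes X with samples as columns so that its covariance is X X^T; shapes are illustrative): for centered data, the right singular vectors of the sample matrix are the eigenvectors of the covariance matrix, so PCA can be carried out with an SVD and no explicit covariance matrix.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))     # rows are samples
Xc = X - X.mean(axis=0)           # center the data

# Route 1: eigendecomposition of the covariance matrix
cov = Xc.T @ Xc / (len(Xc) - 1)
eigvals, eigvecs = np.linalg.eigh(cov)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # sort descending

# Route 2: SVD of the centered sample matrix, no covariance matrix formed
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
svd_eigvals = s**2 / (len(Xc) - 1)

assert np.allclose(eigvals, svd_eigvals)             # same spectrum
assert np.allclose(np.abs(eigvecs.T), np.abs(Vt))    # same directions up to sign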

doing PCA on very large data set in R

瘦欲@, submitted 2019-12-03 00:36:13
I have a very large training set (~2Gb) in a CSV file. The file is too large to read directly into memory (read.csv() brings the computer to a halt), and I would like to reduce the size of the data file using PCA. The problem is that (as far as I can tell) I need to read the file into memory in order to run a PCA algorithm (e.g., princomp()). I have tried the bigmemory package to read the file in as a big.matrix, but princomp doesn't function on big.matrix objects
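The question is about R, but as a hedged illustration of the chunked approach being asked for (reading the file in pieces instead of all at once), here is what it looks like with pandas and scikit-learn's IncrementalPCA in Python; the file name, chunk size, and component count are assumptions:

import pandas as pd
from sklearn.decomposition import IncrementalPCA

ipca = IncrementalPCA(n_components=50)

# First pass: fit on chunks so the full CSV never has to sit in memory
for chunk in pd.read_csv("training.csv", chunksize=10_000):
    ipca.partial_fit(chunk.values)

# Second pass: transform chunk by chunk and collect the reduced rows
reduced = [ipca.transform(chunk.values)
           for chunk in pd.read_csv("training.csv", chunksize=10_000)]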

principal component analysis (PCA) in R: which function to use?

淺唱寂寞╮, submitted 2019-12-03 00:33:36
Can anyone explain what the major differences between the prcomp and princomp functions are? Is there any particular reason why I should choose one over the other? In case this is relevant, the type of application I am looking at is a quality-control analysis for genomic (expression) data sets. Thank you! doug: There are differences between these two functions with respect to the function parameters (what you can/must pass in when you call the function), the values returned by each, and the numerical technique used by each to calculate the principal components. Numerical Technique Used to Calculate PCA In

Dimensionality reduction in sklearn: PCA and TSNE

Anonymous (unverified), submitted 2019-12-03 00:32:02
Both are dimensionality-reduction tools; the main differences between the two are as follows.

from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

Because the underlying principles differ, the feature information preserved by t-SNE is more representative, i.e., it best reflects the differences between samples; however, t-SNE is extremely slow to run, while PCA is comparatively fast. A common practice, especially when displaying (visualizing) high-dimensional data, is therefore to first reduce the dimensionality with PCA and then apply t-SNE:

data_pca = PCA(n_components=50).fit_transform(data)
data_pca_tsne = TSNE(n_components=2).fit_transform(data_pca)

Reposted from https://blog.csdn.net/lanchunhui/article/details/64923702

The difference between Independent Component Analysis (ICA) and Principal Component Analysis (PCA)

Anonymous (unverified), submitted 2019-12-03 00:30:01
1. Preface

The book says:
1. PCA assumes the source signals are mutually uncorrelated, whereas ICA assumes the source signals are mutually independent.
2. PCA requires the principal components to be mutually orthogonal and the samples to be Gaussian; ICA does not require the samples to be Gaussian.

When performing ICA by maximizing information entropy, one has to assume a probability density function g' for the source signals and then find the transform W that maximizes the entropy of g(Y) = g(Wx), i.e., such that Y = s.

My questions are:
1. How is this probability density function chosen? How is it specified in practical signal processing?
2. If the observed signal is Gaussian and g' is taken to be Gaussian, will ICA and PCA give the same result?

2. Analysis

Neither PCA nor ICA needs a specific assumption about the distribution of the source signals; if the observed signal is Gaussian, then the source signals are also Gaussian, and in that case PCA and ICA are equivalent. Some elaboration follows.

Suppose the observed signal is an n-dimensional random variable x. Both principal component analysis (PCA) and independent component analysis (ICA) aim to find a direction, i.e., an n-dimensional vector w, such that some property of the linear combination w^T x is maximized.

2.1 Principal Component Analysis (PCA)

PCA takes the view that the most useful information in a random signal is contained in its variance. We therefore look for a direction w1 such that the variance of the projection w1^T x of the random signal x onto that direction is maximized. Next, in the subspace orthogonal to w1, we find a direction w2 such that the variance of w2^T x is maximized, and so on until all n directions wn are found. In this way we eventually obtain a sequence of uncorrelated random variables. In matrix form, write W
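As a hedged sketch of the contrast described above (not from the original post; the mixing matrix and signals are illustrative assumptions), scikit-learn provides both decompositions, and on non-Gaussian sources they recover different directions:

import numpy as np
from sklearn.decomposition import PCA, FastICA

t = np.linspace(0, 8, 2000)
sources = np.c_[np.sin(3 * t), np.sign(np.sin(5 * t))]   # non-Gaussian source signals
A = np.array([[1.0, 0.5], [0.5, 2.0]])                   # mixing matrix
x = sources @ A.T                                        # observed signals

pca_est = PCA(n_components=2).fit_transform(x)                       # uncorrelated, variance-ordered
ica_est = FastICA(n_components=2, random_state=0).fit_transform(x)   # approximately independent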

How is the complexity of PCA O(min(p^3,n^3))?

六月ゝ 毕业季﹏, submitted 2019-12-03 00:22:38
I've been reading a paper on Sparse PCA, which is: http://stats.stanford.edu/~imj/WEBLIST/AsYetUnpub/sparse.pdf It states that, if you have n data points, each represented with p features, then the complexity of PCA is O(min(p^3, n^3)). Can someone please explain how/why? Covariance matrix computation is O(p^2 n); its eigenvalue decomposition is O(p^3). So the complexity of PCA is O(p^2 n + p^3). O(min(p^3, n^3)) would imply that you could analyze a two-dimensional dataset of any size in fixed time, which is patently false. Assuming your dataset is $X \in \mathbb{R}^{n \times p}$ where n: number of
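The excerpt is cut off mid-answer. A hedged note on where the $\min$ can come from (a standard argument, not necessarily what the truncated answer goes on to say): with centered data $X \in \mathbb{R}^{n \times p}$ one may eigendecompose either the $p \times p$ matrix $X^\top X$, which costs $O(p^2 n)$ to form and $O(p^3)$ to decompose, or the $n \times n$ Gram matrix $X X^\top$, which costs $O(n^2 p)$ to form and $O(n^3)$ to decompose. The two share their nonzero eigenvalues, since $X X^\top v = \lambda v$ implies $X^\top X (X^\top v) = \lambda (X^\top v)$, so one can always take the cheaper route. Quoting only the decomposition term, $O(\min(p^3, n^3))$, while ignoring the cost of forming the matrix is exactly the simplification the answer above objects to.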