pca

Anomaly Detection Using Reconstruction Probability Based on a Variational Autoencoder (VAE)

﹥>﹥吖頭↗ Submitted on 2019-11-27 06:17:13
This post is the blogger's translation of Jinwon's "Variational Autoencoder based Anomaly Detection using Reconstruction Probability" (to be taken down immediately upon request in case of infringement): http://dm.snu.ac.kr/static/docs/TR/SNUDM-TR-2015-03.pdf

Abstract: We propose an anomaly detection method that uses the reconstruction probability of a variational autoencoder. The reconstruction probability is a probabilistic measure that takes the variability of the variable distributions into account. It has a theoretical grounding that makes it more principled and objective than the reconstruction error employed by autoencoder (AE)-based and principal component analysis (PCA)-based anomaly detection methods. Experimental results show that the proposed method outperforms both the autoencoder-based and the PCA-based methods. Thanks to the generative characteristics of the variational autoencoder, reconstructions of the data can also be derived to analyze the root causes of anomalies.

1 Introduction: An anomaly, or outlier, is a data point that differs significantly from the remaining data. Hawkins defined an anomaly as an observation that deviates so much from the other observations as to arouse suspicion that it was generated by a different mechanism [5]. Analyzing and detecting anomalies is important because it reveals useful information about the characteristics of the data-generating process. Anomaly detection is applied in numerous fields such as network intrusion detection, credit card fraud detection, sensor network fault detection, and medical diagnosis [3]. Among the many anomaly detection methods, spectral anomaly detection techniques try to find a lower-dimensional embedding of the original data in which anomalous and normal data are expected to be separated from each other. After those lower-dimensional embeddings are found, they are brought back to the original data space
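The scoring rule at the heart of the paper is straightforward to sketch: encode an input, draw several latent samples from the encoder's distribution, and average the log-likelihood the decoder assigns to the original input; inputs with low reconstruction probability are flagged as anomalies. A minimal NumPy sketch under the assumption of a trained Gaussian VAE, where encode and decode are hypothetical stand-ins (not the paper's code):

    import numpy as np

    def reconstruction_probability(x, encode, decode, n_samples=100):
        # encode(x) -> (mu_z, sigma_z): parameters of q(z|x)
        # decode(z) -> (mu_x, sigma_x): parameters of p(x|z)
        mu_z, sigma_z = encode(x)
        log_probs = []
        for _ in range(n_samples):
            # sample a latent code z ~ q(z|x)
            z = mu_z + sigma_z * np.random.randn(*mu_z.shape)
            mu_x, sigma_x = decode(z)
            # Gaussian log-density of x under the decoder, summed over dimensions
            log_p = -0.5 * np.sum(np.log(2 * np.pi * sigma_x ** 2)
                                  + (x - mu_x) ** 2 / sigma_x ** 2)
            log_probs.append(log_p)
        # low average log-likelihood => flag x as anomalous
        return np.mean(log_probs)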

Principal Component Analysis (PCA) in Python

那年仲夏 Submitted on 2019-11-27 06:00:56
I have a (26424 x 144) array and I want to perform PCA over it using Python. However, there is no particular place on the web that explains how to accomplish this task (some sites just do PCA their own way; there is no generalized way of doing it that I can find). Any sort of help would be great. EnricoGiampieri: You can find a PCA function in the matplotlib module:

    import numpy as np
    from matplotlib.mlab import PCA

    data = np.array(np.random.randint(10, size=(10, 3)))
    results = PCA(data)

results will store the various parameters of the PCA. It is from the
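A caveat worth adding: matplotlib.mlab.PCA was deprecated and has been removed from recent matplotlib releases, so the import above fails on current versions. A roughly equivalent sketch with scikit-learn, assuming it is installed:

    import numpy as np
    from sklearn.decomposition import PCA

    data = np.random.randint(10, size=(26424, 144)).astype(float)
    pca = PCA(n_components=10)            # keep the 10 strongest components
    reduced = pca.fit_transform(data)     # shape (26424, 10)
    print(pca.explained_variance_ratio_)  # variance captured by each component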

Principal component analysis in Python

浪子不回头ぞ Submitted on 2019-11-27 05:49:53
I'd like to use principal component analysis (PCA) for dimensionality reduction. Does numpy or scipy already have it, or do I have to roll my own using numpy.linalg.eigh? I don't just want to use singular value decomposition (SVD) because my input data are quite high-dimensional (~460 dimensions), so I think SVD will be slower than computing the eigenvectors of the covariance matrix. I was hoping to find a premade, debugged implementation that already makes the right decisions about when to use which method, and which maybe does other optimizations that I don't know about. ChristopheD: You might
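For reference, rolling your own on top of numpy.linalg.eigh takes only a few lines; a minimal sketch (the function name is mine, not a library API):

    import numpy as np

    def pca_eigh(X, k):
        # Eigendecomposition of the covariance matrix: cheaper than a full
        # SVD of X when there are many more samples than dimensions.
        X = X - X.mean(axis=0)               # center the data
        cov = np.cov(X, rowvar=False)        # (d x d) covariance matrix
        evals, evecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
        order = np.argsort(evals)[::-1][:k]  # indices of the top-k components
        return X @ evecs[:, order]           # projected data, shape (n, k)

    X = np.random.rand(1000, 460)
    print(pca_eigh(X, 10).shape)  # (1000, 10)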

Plotting pca biplot with ggplot2

限于喜欢 Submitted on 2019-11-27 03:13:32
I wonder if it is possible to plot PCA biplot results with ggplot2. Suppose I want to display the following biplot results with ggplot2:

    fit <- princomp(USArrests, cor=TRUE)
    summary(fit)
    biplot(fit)

Any help will be highly appreciated. Thanks. Maybe this will help; it's adapted from code I wrote some time back. It now draws arrows as well.

    PCbiplot <- function(PC, x="PC1", y="PC2") {
        # PC being a prcomp object
        data <- data.frame(obsnames=row.names(PC$x), PC$x)
        plot <- ggplot(data, aes_string(x=x, y=y)) + geom_text(alpha=.4, size=3, aes(label=obsnames))
        plot <- plot + geom_hline(aes(0), size=
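The thread is R-specific, but for comparison, here is a rough biplot sketch in Python with scikit-learn and matplotlib; the function name and the arrow scale factor are arbitrary choices of mine:

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.decomposition import PCA

    def pca_biplot(X, feature_names):
        X = (X - X.mean(axis=0)) / X.std(axis=0)  # analogue of cor=TRUE
        pca = PCA(n_components=2).fit(X)
        scores = pca.transform(X)
        plt.scatter(scores[:, 0], scores[:, 1], alpha=0.4, s=10)
        for i, name in enumerate(feature_names):
            vx, vy = pca.components_[0, i], pca.components_[1, i]
            plt.arrow(0, 0, 3 * vx, 3 * vy, color="r", width=0.005)  # loading arrow
            plt.text(3.2 * vx, 3.2 * vy, name, color="r")
        plt.xlabel("PC1")
        plt.ylabel("PC2")
        plt.show()

    X = np.random.rand(50, 4)
    pca_biplot(X, ["f1", "f2", "f3", "f4"])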

PCA Scaling with ggbiplot

妖精的绣舞 Submitted on 2019-11-27 01:41:38
Question: I'm trying to plot a principal component analysis using prcomp and ggbiplot. I'm getting data values outside of the unit circle, and haven't been able to rescale the data prior to calling prcomp in such a way that I can constrain the data to the unit circle.

    data(wine)
    require(ggbiplot)
    wine.pca <- prcomp(wine[, 1:3], scale. = TRUE)
    ggbiplot(wine.pca, obs.scale = 1, var.scale = 1, groups = wine.class, ellipse = TRUE, circle = TRUE)

I tried scaling by subtracting the mean and dividing by the standard deviation before
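For what it's worth, the centering and unit-variance scaling that prcomp(..., scale.=TRUE) performs looks like this in Python; a sketch assuming scikit-learn, with random data standing in for wine[, 1:3]:

    import numpy as np
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA

    X = np.random.rand(178, 3)                 # stand-in for wine[, 1:3]
    X_std = StandardScaler().fit_transform(X)  # (x - mean) / sd per column
    scores = PCA().fit_transform(X_std)        # principal component scores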

Cluster texture based on features extracted from Gabor

点点圈 Submitted on 2019-11-26 23:10:32
Question: I'm trying to cluster textures based on the features extracted from a Gabor bank that I've created, but the result is far from what is typically expected, so here is what I'm doing. 1- Generate a filter bank (based on Miki's answer here; I'm getting both the real and imaginary parts so that I can later extract the magnitude feature):

    void Gabor::generateFilterbank(int bankRows, int bankCols) {
        bankCol = bankCols;
        bankRow = bankRows;
        setBankSize();
        int thetaStep = pos_th_max / bankCols;
        int lambadaStep = (pos_lm
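As a cross-check of the overall pipeline, here is a compact Python analogue: build a small Gabor bank with OpenCV, take the magnitude of the real and imaginary responses as in the question, and cluster the per-pixel feature vectors with k-means. All parameter values and names below are guesses of mine, not the asker's:

    import cv2
    import numpy as np
    from sklearn.cluster import KMeans

    def gabor_features(img, n_thetas=4, lambdas=(4, 8, 16)):
        feats = []
        for t in range(n_thetas):
            theta = t * np.pi / n_thetas
            for lam in lambdas:
                # even (cosine) and odd (sine) kernels via the phase offset psi
                k_re = cv2.getGaborKernel((21, 21), 4.0, theta, lam, 0.5, 0)
                k_im = cv2.getGaborKernel((21, 21), 4.0, theta, lam, 0.5, np.pi / 2)
                re = cv2.filter2D(img, cv2.CV_32F, k_re)
                im = cv2.filter2D(img, cv2.CV_32F, k_im)
                feats.append(np.sqrt(re ** 2 + im ** 2))  # magnitude feature
        # one feature vector per pixel: shape (H*W, n_thetas * len(lambdas))
        return np.stack(feats, axis=-1).reshape(-1, n_thetas * len(lambdas))

    img = cv2.imread("texture.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)
    labels = KMeans(n_clusters=3, n_init=10).fit_predict(gabor_features(img))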

How to solve prcomp.default(): cannot rescale a constant/zero column to unit variance

余生长醉 Submitted on 2019-11-26 22:58:57
Question: I have a data set of 9 samples (rows) with 51608 variables (columns), and I keep getting an error whenever I try to scale it. This works fine:

    pca = prcomp(pca_data)

However,

    pca = prcomp(pca_data, scale = T)

gives

    > Error in prcomp.default(pca_data, center = T, scale = T) :
      cannot rescale a constant/zero column to unit variance

Obviously it's a little hard to post a reproducible example. Any ideas what the deal could be? Looking for constant columns:

    sapply(1:ncol(pca_data), function(x){
        length(unique(pca_data[, x])) == 1  # TRUE for a constant column
    })
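The same failure mode shows up in Python as a division by a zero standard deviation; dropping zero-variance columns before scaling avoids it. A small NumPy sketch with made-up data of the same shape:

    import numpy as np

    X = np.random.rand(9, 51608)
    X[:, 0] = 1.0                # plant a constant column
    keep = X.std(axis=0) > 0     # mask of non-constant columns
    X = X[:, keep]
    X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)  # safe to scale now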

PCA in matlab selecting top n components

青春壹個敷衍的年華 Submitted on 2019-11-26 22:55:26
I want to select the top N=10,000 principal components from a matrix. After the PCA is completed, MATLAB should return a p x p matrix, but it doesn't!

    >> size(train_data)
    ans = 400 153600
    >> [coefs, scores, variances] = pca(train_data);
    >> size(coefs)
    ans = 153600 399
    >> size(scores)
    ans = 400 399
    >> size(variances)
    ans = 399 1

Shouldn't it be coefs: 153600 x 153600 and scores: 400 x 153600? When I use the code below, it gives me an Out of Memory error:

    >> [V, D] = eig(cov(train_data));
    Out of memory. Type HELP MEMORY for your options.
    Error in cov (line 96)
    xy = (xc' * xc) / (m-1);

I don't
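The output is actually expected: with n = 400 observations, the centered data matrix has rank at most 399, so pca can return at most 399 components; coefs is p x 399 and scores is 400 x 399, and 10,000 components are simply not attainable from 400 samples. When only the leading components of a very wide matrix are needed, a randomized solver avoids ever forming the 153600 x 153600 covariance matrix; a Python sketch with scikit-learn:

    import numpy as np
    from sklearn.decomposition import PCA

    X = np.random.rand(400, 153600).astype(np.float32)
    pca = PCA(n_components=100, svd_solver="randomized")  # top 100 components
    scores = pca.fit_transform(X)                         # shape (400, 100)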

4. PCA and Gradient Ascent

给你一囗甜甜゛ Submitted on 2019-11-26 21:00:39
(1) What is PCA

PCA, principal component analysis, is mainly used to reduce the dimensionality of a data set. Take the simplest example: suppose I want to predict a person's gender from features such as name, age, hair length, height, weight, and fairness of skin (made up on the spot). One of these features is the least useful. Which one? Clearly age, since a person's age has nothing to do with their gender. The same goes for name: this feature is clearly not decisive, because some boys have names that sound like girls' names (myself, for example) and vice versa, though in the vast majority of cases a name is still informative. Likewise height: someone who is 180 cm tall is very likely a boy, although there are 180 cm girls too, models for instance. Picking out, from a sample's features, the n features that best represent the sample, or that contribute most decisively to the prediction, is what principal component analysis does. Why do we need PCA? Consider that in real life a sample can easily have hundreds or even thousands of features, but we cannot train on all of them, because many features are certainly useless or contribute very little; our goal is to find the n features that matter most.

Characteristics of principal component analysis:

- An unsupervised machine learning algorithm
- Mainly used for dimensionality reduction of data
- Through dimensionality reduction, features that are easier for humans to understand can be discovered
- Other uses: visualization, denoising, and so on

Let's take an example with only two features. If we consider only feature 1 and ignore feature 2, then clearly the blue points are mapped from two dimensions down to one. By the same token, if we consider only feature 2 and ignore feature 1, it would clearly look like this
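The feature-1 versus feature-2 picture can be put in numbers: project 2-D points onto each axis and compare how spread out the projections are. A toy NumPy sketch (the data here is synthetic, not from the post):

    import numpy as np

    rng = np.random.default_rng(0)
    x1 = rng.normal(0, 5, 100)             # feature 1: large spread
    x2 = 0.3 * x1 + rng.normal(0, 1, 100)  # feature 2: small spread

    print(np.var(x1))  # keeping only feature 1 preserves more variance
    print(np.var(x2))  # keeping only feature 2 loses more information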

Principal Component Analysis

房东的猫 Submitted on 2019-11-26 19:41:53
Contents

- Principal component analysis
- 1. Understanding principal component analysis
- 2. Solving PCA with gradient ascent
- 3. Finding the first n principal components of the data
- 4. Mapping high-dimensional data to a lower dimension
- 5. PCA in scikit-learn
- 6. Applying PCA to the real MNIST data set
- 7. Denoising with PCA
- 8. PCA and face recognition
- Postscript

Principal component analysis (PCA) is an unsupervised machine learning algorithm used mainly for dimensionality reduction; through dimensionality reduction, features that are easier for people to understand can be discovered. Other applications include visualization, denoising, and so on.

1. Understanding principal component analysis

First, suppose we draw a scatter plot using two features of the data, and we keep only feature 1 or only feature 2. There is then a question: which feature is better to keep?

From the mapping results for the two features above, keeping feature 1 is better: once all the points are mapped onto the x-axis, the distances between points remain relatively large. In other words, the points stay more distinguishable, and part of the spatial information from before the mapping is preserved. If the points are instead all mapped onto the y-axis, they end up closer together, which does not match the original spatial distribution of the data. So keeping feature 1 is more appropriate than keeping feature 2. But is this the best possible scheme?

In other words, we need to find the axis that maximizes the spread between the samples. How do we define the spread between samples? Generally we use the variance, Var(x)=\frac{1}{m}\sum_{i=1}^m(x_{i}-\overline{x})^2; find an axis
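This variance criterion is exactly what the gradient ascent in the table of contents maximizes: after demeaning, push a unit vector w uphill on f(w)=\frac{1}{m}\sum_{i=1}^m(X^{(i)}\cdot w)^2, whose gradient is \frac{2}{m}X^T(Xw). A minimal sketch of that approach (variable names and step sizes are my own):

    import numpy as np

    def demean(X):
        return X - X.mean(axis=0)

    def f(w, X):
        # variance of the projections of X onto the unit vector w
        return np.sum((X @ w) ** 2) / len(X)

    def df(w, X):
        # gradient of f with respect to w
        return X.T @ (X @ w) * 2.0 / len(X)

    def first_component(X, eta=0.01, n_iters=1000, eps=1e-8):
        w = np.random.random(X.shape[1])
        w = w / np.linalg.norm(w)
        for _ in range(n_iters):
            last_w = w
            w = w + eta * df(w, X)
            w = w / np.linalg.norm(w)  # keep w on the unit sphere
            if abs(f(w, X) - f(last_w, X)) < eps:
                break
        return w

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    X[:, 1] = 0.75 * X[:, 0] + 0.25 * X[:, 1]  # correlated features
    print(first_component(demean(X)))          # first principal axis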