PCA

PCA using raster datasets in R

空扰寡人 submitted on 2019-12-01 17:34:37
I have several large rasters that I want to process with a PCA (to produce summary rasters). I have seen several examples in which people seem to simply call prcomp or princomp. However, when I do this, I get the following error message:

Error in as.vector(data): no method for coercing this S4 class to a vector

Example code:

```r
files <- list.files()   # a set of rasters
layers <- stack(files)  # using the raster package
pca <- prcomp(layers)
```

I have tried using a raster brick instead of a stack, but that doesn't seem to be the issue. What method do I need to provide to the command so that it can convert the …
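A common workaround (a minimal sketch of my own, not from the post; it assumes the rasters can be read with the raster package) is to pull the cell values into an ordinary matrix before calling prcomp(), since prcomp() cannot coerce a RasterStack directly:

```r
library(raster)

files  <- list.files(pattern = "\\.tif$")  # hypothetical file pattern
layers <- stack(files)
vals   <- na.omit(values(layers))  # ncell x nlayers matrix, NA cells dropped
pca    <- prcomp(vals, center = TRUE, scale. = TRUE)

# Map the first two components back to rasters; raster::predict accepts
# the prcomp object because prcomp has a predict() method
summaries <- predict(layers, pca, index = 1:2)
```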

Is this the right way of projecting the training set into the eigenspace? MATLAB

拈花ヽ惹草 submitted on 2019-12-01 12:46:02
I have computed PCA using the following:

```matlab
function [signals, V] = pca2(data)
[M, N] = size(data);
data = reshape(data, M*N, 1);
% subtract off the mean for each dimension
mn = mean(data, 2);
data = bsxfun(@minus, data, mean(data, 1));
% construct the matrix Y
Y = data' * data / (M*N - 1);
[V, D] = eigs(Y, 10);  % reduce to 10 dimensions
% project the original data
signals = data * V;
```

My question is: is "signals" the projection of the training set into the eigenspace? I saw in "Amir Hossein"'s code that the "centered image vectors", i.e. "data" in the code above, need to be projected into the "facespace" by …
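For reference, a minimal sketch of the standard projection (my own illustration, not code from the post): center the observations, take the leading eigenvectors of the covariance matrix, and multiply the centered data by them.

```matlab
X  = randn(100, 50);                 % illustrative data: rows = observations
mu = mean(X, 1);
Xc = bsxfun(@minus, X, mu);          % center each variable
C  = (Xc' * Xc) / (size(Xc, 1) - 1); % sample covariance matrix
[V, D] = eigs(C, 10);                % top-10 eigenvectors span the eigenspace
signals = Xc * V;                    % projection of the training set
```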

Obtain unstandardized factor scores from factor analysis in R

谁说胖子不能爱 submitted on 2019-12-01 12:32:44
I'm conducting a factor analysis of several variables in R using factanal() (but am open to using other packages). I want to determine each case's factor score, but I want the factor scores to be unstandardized and on the original metric of the input variables. When I run the factor analysis and obtain the factor scores, they are standardized with a normal distribution of mean = 0, SD = 1, and are not on the original metric of the input variables. How can I obtain unstandardized factor scores …
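One common workaround (a hedged sketch of my own, not from the post; df is a hypothetical data frame holding the input variables): factanal() only returns standardized scores, so a composite on the original metric can be built by weighting the raw variables with the factor loadings.

```r
fa <- factanal(df, factors = 2, scores = "regression")
head(fa$scores)  # standardized scores: mean ~ 0, SD ~ 1

# Loading-weighted composite of the raw variables for factor 1;
# it stays on the original metric of the inputs
w1 <- fa$loadings[, 1] / sum(fa$loadings[, 1])
composite1 <- as.matrix(df) %*% w1
```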

How to use sklearn's IncrementalPCA partial_fit

∥☆過路亽.° submitted on 2019-12-01 12:15:16
I've got a rather large dataset that I would like to decompose, but it is too big to load into memory. Researching my options, it seems that sklearn's IncrementalPCA is a good choice, but I can't quite figure out how to make it work. I can load the data just fine:

```python
f = h5py.File('my_big_data.h5')
features = f['data']
```

And from this example, it seems I need to decide what size chunks I want to read from it:

```python
num_rows = data.shape[0]  # total number of rows in data
chunk_size = 10           # how many rows at a time to feed ipca
```

Then I can create my IncrementalPCA, stream the data chunk by chunk, and …
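For what it's worth, a minimal sketch of the chunked loop (the file name and dataset key come from the post; n_components is illustrative):

```python
import h5py
import numpy as np
from sklearn.decomposition import IncrementalPCA

f = h5py.File('my_big_data.h5', 'r')
features = f['data']             # HDF5 dataset; never fully loaded into memory
num_rows = features.shape[0]
chunk_size = 10                  # every chunk needs at least n_components rows

ipca = IncrementalPCA(n_components=2)
for start in range(0, num_rows, chunk_size):
    ipca.partial_fit(features[start:start + chunk_size])

# Transform chunk-wise as well, so the full dataset is never materialized
reduced = np.vstack([ipca.transform(features[start:start + chunk_size])
                     for start in range(0, num_rows, chunk_size)])
```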

PCA: Principal Component Analysis

和自甴很熟 submitted on 2019-12-01 07:52:24
The idea behind PCA: map n-dimensional features onto k dimensions, where the k dimensions are brand-new orthogonal features, called principal components, reconstructed from the original data. Concretely, PCA finds, one after another, a set of mutually orthogonal axes in the original space; the choice of each new axis depends strongly on the data itself. The first axis is the direction of greatest variance in the original data; the second is the direction of greatest variance within the plane orthogonal to the first; the third is the direction of greatest variance within the plane orthogonal to the first two; and so on, until n such axes are obtained. With axes constructed this way, most of the variance is contained in the first k axes, while the later axes carry almost no variance. We can therefore ignore the later axes and keep only the first k, which hold the vast majority of the variance. In effect, this keeps only the feature dimensions that carry most of the variance and drops the dimensions whose variance is nearly zero, achieving dimensionality reduction of the data.

The PCA algorithm:
Pros: reduces the complexity of the data and identifies the most important features.
Cons: not always necessary; may lose useful information.
Suitable data type: numeric data.

Dataset download link: http://archive.ics.uci.edu/ml/machine-learning-databases/
Dataset used for the PCA example: http://archive.ics.uci.edu/ml/machine-learning-databases/secom/

(1) Open the dataset and count the features (columns are features). In the secom dataset, one row represents one record …
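The procedure described above translates almost line for line into NumPy; here is a minimal illustrative sketch (my own, not the post's code):

```python
import numpy as np

def pca(data, k):
    """Project data (rows = samples) onto its top-k principal components."""
    centered = data - data.mean(axis=0)     # remove the mean of each feature
    cov = np.cov(centered, rowvar=False)    # feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: covariance is symmetric
    top_k = np.argsort(eigvals)[::-1][:k]   # indices of the k largest variances
    components = eigvecs[:, top_k]          # the new orthogonal axes
    return centered @ components            # coordinates in the reduced space

X = np.random.rand(100, 5)
print(pca(X, k=2).shape)  # (100, 2)
```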

Matlab: how to find which variables from a dataset could be discarded using PCA?

蓝咒 submitted on 2019-12-01 04:23:03
I am using PCA to find out which variables in my dataset are redundant because they are highly correlated with other variables. I am using the princomp MATLAB function on data previously normalized using zscore:

```matlab
[coeff, PC, eigenvalues] = princomp(zscore(x))
```

I know that the eigenvalues tell me how much of the dataset's variation each principal component covers, and that coeff tells me how much of the i-th original variable is in the j-th principal component (where i indexes rows and j indexes columns). So I assumed that, to find out which variables in the original dataset are the most important and which are the least …
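One common heuristic (a sketch of my own, not an answer from the post) is to weight the absolute loadings by the share of variance each component explains, then rank the original variables by the resulting score:

```matlab
[coeff, PC, eigenvalues] = princomp(zscore(x));
explained  = eigenvalues / sum(eigenvalues);  % variance share of each PC
importance = abs(coeff) * explained;          % one score per original variable
[~, order] = sort(importance, 'descend');     % tail = candidates to discard
```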

MATLAB example: PCA dimensionality reduction

主宰稳场 submitted on 2019-12-01 02:39:14
MATLAB program for the ISODATA clustering algorithm
Author: 凯鲁嘎吉 - 博客园 http://www.cnblogs.com/kailugaji/
Reference: MATLAB implementations of the K-means and ISODATA algorithms
Data: see the iris dataset in "MATLAB example: PCA dimensionality reduction", saved as iris.data; the last column is the class label.

demo_isodata.m

```matlab
clear
clc
data_load = dlmread('iris.data');
[~, dim] = size(data_load);
x = data_load(:, 1:dim-1);
K = 3;
theta_N = 1;
theta_S = 1;
theta_c = 4;
L = 1;
I = 5;
ISODATA(x, K, theta_N, theta_S, theta_c, L, I)
```

ISODATA.m

```matlab
function ISODATA(x, K, theta_N, theta_S, theta_c, L, I)
%%%%%%%% input parameters %%%%%%
% x       : data
% K       : expected number of cluster centers
% theta_N : minimum number of samples in a cluster; a cluster with fewer
%           samples is not treated as an independent cluster
% theta_S : standard deviation of the distance distribution of samples
%           within a cluster
% theta_c : …
```

sklearn PCA.transform gives different results for different trials

别来无恙 submitted on 2019-12-01 01:59:19
I am doing some PCA using sklearn.decomposition.PCA. I found that if the input matrix X is big, the results of PCA.transform from two different PCA instances will not be the same. For example, when X is a 100x200 matrix there is no problem, but when X is a 1000x200 or a 100x2000 matrix, the results of two different PCA instances are different. I am not sure what the cause is: I supposed there were no random elements in sklearn's PCA solver? I am using sklearn version 0.18.1 with Python 2.7. The script below illustrates the issue:

```python
import numpy as np
import sklearn.linear_model as …
```
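For context, a likely explanation (hedged; it is not stated in the excerpt): since sklearn 0.18, PCA's default svd_solver='auto' uses the exact solver only for small inputs (max(X.shape) <= 500) and otherwise may pick the randomized solver, which is stochastic unless random_state is fixed. A minimal sketch of two ways to make repeated fits agree:

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.random.RandomState(0).rand(1000, 200)  # illustrative "big" matrix

# Option 1: force the exact (deterministic) LAPACK solver
a = PCA(n_components=10, svd_solver='full').fit(X).transform(X)
b = PCA(n_components=10, svd_solver='full').fit(X).transform(X)
print(np.allclose(a, b))  # True

# Option 2: keep the randomized solver but pin its seed
c = PCA(n_components=10, svd_solver='randomized', random_state=0).fit(X).transform(X)
d = PCA(n_components=10, svd_solver='randomized', random_state=0).fit(X).transform(X)
print(np.allclose(c, d))  # True
```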