PCA

Rolling PCA and plotting proportional variance of principal components

Submitted by 南笙酒味 on 2019-12-02 06:26:05
Question: I'm using the following code to perform PCA:

PCA <- prcomp(Ret1, center = TRUE, scale. = TRUE)
summary(PCA)

I get the following result:

#Importance of components:
#                          PC1    PC2     PC3     PC4
#Standard deviation     1.6338 0.9675 0.60446 0.17051
#Proportion of Variance 0.6673 0.2340 0.09134 0.00727
#Cumulative Proportion  0.6673 0.9014 0.99273 1.00000

What I would like to do is a rolling PCA over a specific window (e.g. 180 days). The result should be a matrix that shows the evolution of the "Proportion of
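The question is about R's prcomp, but the rolling-window idea is compact enough to sketch. Below is a minimal version in Python/numpy (the function name and the 180-day default are illustrative, not from the question): each window is centered and scaled the way prcomp(..., center = TRUE, scale. = TRUE) would be, and the squared singular values give each component's share of the variance.

import numpy as np

def rolling_explained_variance(X, window=180):
    """For each rolling window, return the proportion of variance per PC."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty((n - window + 1, p))
    for start in range(n - window + 1):
        W = X[start:start + window]
        # Center and scale, mirroring prcomp(Ret1, center = TRUE, scale. = TRUE)
        W = (W - W.mean(axis=0)) / W.std(axis=0, ddof=1)
        # Squared singular values are proportional to the PC variances
        eigvals = np.linalg.svd(W, compute_uv=False) ** 2
        out[start] = eigvals / eigvals.sum()
    return out  # row i is the "Proportion of Variance" line for window i

Plotting each column of the result against time then shows how each component's share of the variance evolves.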

math domain error while using PCA

Submitted by 回眸只為那壹抹淺笑 on 2019-12-02 06:09:43
Question: I am using Python's scikit-learn package to implement PCA. I am getting a math domain error:

C:\Users\Akshenndra\Anaconda2\lib\site-packages\sklearn\decomposition\pca.pyc in _assess_dimension_(spectrum, rank, n_samples, n_features)
     78     for j in range(i + 1, len(spectrum)):
     79         pa += log((spectrum[i] - spectrum[j]) *
---> 80               (1. / spectrum_[j] - 1. / spectrum_[i])) + log(n_samples)
     81
     82     ll = pu + pl + pv + pp - pa / 2. - rank * log(n_samples) / 2.

ValueError: math domain error

I already know that a math domain error is raised when we take the logarithm of a negative number, but I don't understand here
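The _assess_dimension_ frame in the traceback belongs to scikit-learn's MLE-based choice of dimensionality, which is only reached when PCA is constructed with n_components='mle'. Assuming that is how PCA was called here, a minimal workaround is to request an explicit component count or a variance fraction instead (the dataset below is illustrative):

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data

# Sidestep the MLE code path: ask for enough components to explain 95%
# of the variance (an explicit integer such as n_components=2 also works).
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X)
print(X_reduced.shape, pca.explained_variance_ratio_)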

Feature dimensionality reduction with PCA

Submitted by 孤人 on 2019-12-02 05:44:50
Feature dimensionality reduction: dimensionality reduction is another application of unsupervised learning, and it serves two purposes. First, in real projects we often encounter training samples of very high feature dimensionality, and we usually cannot hand-craft effective features from our own domain knowledge. Second, for data presentation, we cannot visually inspect features in more than three dimensions. Dimensionality reduction therefore not only reconstructs effective low-dimensional feature vectors, it also makes data visualization possible. Among dimensionality-reduction methods, principal component analysis (PCA) is the most classic and practical technique, with especially strong results as an aid to image recognition.

How to use the rank of a matrix to judge the linear dependence of its vectors, for an m×n matrix A:
if r(A) = m < n, the row vectors are independent and the column vectors are dependent;
if r(A) = k < min(m, n), both the row vectors and the column vectors are dependent;
if r(A) = n < m, the column vectors are independent and the row vectors are dependent;
if r(A) = m = n, both the row vectors and the column vectors are independent.

As the code below shows, we have a 2×2 data matrix [(1, 2), (2, 4)]. Suppose both samples map to the same category (classification) or the same cluster (clustering). If our learning model is linear, these two samples can effectively update the weight parameters only once, because they are linearly dependent: all the feature values are simply scaled by the same factor. In PCA terms, the "rank" of this matrix is 1; that is, in terms of diversity the matrix has only one degree of freedom.

# -- Example: computing the rank of a linearly dependent matrix
# import the numpy package
import numpy as np
# initialize a 2
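The preview cuts the sample code off mid-comment; a minimal completed sketch of the rank computation it describes (using numpy.linalg.matrix_rank) would be:

import numpy as np

# The second row of M is twice the first, so the rows are linearly
# dependent and the matrix has rank 1 rather than 2.
M = np.array([[1, 2],
              [2, 4]])
print(np.linalg.matrix_rank(M))  # -> 1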

Obtaining unstandardized factor scores from factor analysis

Submitted by 会有一股神秘感。 on 2019-12-02 05:13:52
Question: I'm conducting a factor analysis of several variables in R using factanal(). I want to determine each case's factor score, but I want the factor scores to be unstandardized and on the original metric of the input variables. When I run the factor analysis and obtain the factor scores, they appear to be standardized rather than on the original metric of the input variables. How can I obtain unstandardized factor scores that have the same metric as the input variables? Ideally, this would mean a
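factanal() returns scores standardized to mean 0 and sd 1, so any "unstandardized" version has to borrow a metric from somewhere. One hedged workaround, sketched here in Python with entirely hypothetical loadings and data (the question itself is in R): rescale each factor's standardized scores to the mean and sd of a loading-weighted composite of the raw variables. Treating that composite as the "original metric" is an assumption of this sketch, not part of factanal's API.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=50, scale=10, size=(100, 4))   # raw, unstandardized inputs
loadings = np.array([[0.9, 0.1],
                     [0.8, 0.2],
                     [0.1, 0.9],
                     [0.2, 0.8]])                 # hypothetical p x k loading matrix
scores = rng.normal(size=(100, 2))                # stand-in for factanal(...)$scores

# Loading-weighted composites of the raw variables live on the input metric;
# borrow their mean and sd to put the factor scores on that metric.
weights = loadings / loadings.sum(axis=0)
composites = X @ weights
unstd = scores * composites.std(axis=0, ddof=1) + composites.mean(axis=0)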

TypeError in grid search

Submitted by 不羁的心 on 2019-12-02 04:27:13
I used to create loops to find the best parameters for my model, which increased my coding errors, so I decided to use GridSearchCV. I am trying to find the best parameters for PCA in my model (the only parameter I want to grid-search on). In this model, after normalization I want to combine the original features with the PCA-reduced features and then apply a linear SVM. Then I save the whole model to run predictions on my input. I get an error on the line where I try to fit the data, so that I can use the best_estimator_ and best_params_ attributes. The error says: TypeError: The score function
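A minimal sketch of the setup described, assuming scikit-learn's Pipeline and FeatureUnion (the dataset, step names, and parameter grid are illustrative). Ending the pipeline in a classifier, or passing an explicit scoring string, gives GridSearchCV the score function the truncated TypeError complains about:

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)

# Keep the original (scaled) features and append the PCA-reduced ones.
features = FeatureUnion([
    ("orig", FunctionTransformer()),  # identity transform: pass features through
    ("pca", PCA()),
])
model = Pipeline([
    ("scale", StandardScaler()),
    ("features", features),
    ("svm", LinearSVC()),
])
grid = GridSearchCV(
    model,
    param_grid={"features__pca__n_components": [1, 2, 3]},
    scoring="accuracy",  # explicit scoring avoids depending on a score method
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_)
best_model = grid.best_estimator_  # refit on all the data, ready for predictions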

Using the eigenvectors of the covariance matrix (PCA) for dimensionality reduction

Submitted by 六眼飞鱼酱① on 2019-12-02 02:10:44
Take 2 feature dimensions so the result is easy to plot:

import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.datasets import load_iris

data = load_iris()
y = data.target
X = data.data
pca = PCA(n_components=2)
reduced_X = pca.fit_transform(X)

red_x, red_y = [], []
blue_x, blue_y = [], []
green_x, green_y = [], []
for i in range(len(reduced_X)):
    if y[i] == 0:
        red_x.append(reduced_X[i][0])
        red_y.append(reduced_X[i][1])
    elif y[i] == 1:
        blue_x.append(reduced_X[i][0])
        blue_y.append(reduced_X[i][1])
    else:
        green_x.append(reduced_X[i][0])
        green_y.append(reduced_X[i][1])
plt.scatter(red_x, red_y, c='r', marker='x')

Anomaly detection with PCA in Spark

Submitted by ☆樱花仙子☆ on 2019-12-02 00:21:53
I read the following article: Anomaly detection with Principal Component Analysis (PCA). In the article, the following is written:
• The PCA algorithm basically transforms data readings from an existing coordinate system into a new coordinate system.
• The closer data readings are to the center of the new coordinate system, the closer these readings are to an optimum value.
• The anomaly score is calculated using the Mahalanobis distance between a reading and the mean of all readings, which is the center of the transformed coordinate system.
Can anyone describe in more detail how anomaly detection
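A minimal numpy sketch of the score the article describes (synthetic data; names are illustrative): project the centered readings onto the principal axes, divide each coordinate by that axis's standard deviation, and the Euclidean length of the result is exactly the Mahalanobis distance from the mean.

import numpy as np

rng = np.random.default_rng(0)
X = rng.multivariate_normal([0.0, 0.0, 0.0], np.diag([4.0, 1.0, 0.25]), size=500)

# PCA via SVD of the centered readings: rows of Vt are the new coordinate axes.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Coordinates in the new system, scaled by each axis's standard deviation;
# the norm of these whitened coordinates is the Mahalanobis distance, so
# larger values mean readings farther from the center, i.e. more anomalous.
std = s / np.sqrt(len(X) - 1)
whitened = (Xc @ Vt.T) / std
anomaly_score = np.linalg.norm(whitened, axis=1)
print(anomaly_score[:5])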

psych::principal - explanation for the order and naming of rotated (principal) components

Submitted by 对着背影说爱祢 on 2019-12-01 21:32:40
Let x be a sample data frame.

set.seed(0)
x <- replicate(4, rnorm(10))

A PCA using the principal function from the psych package yields:

> principal(x, nf=4, rotate="none")
...
                      PC1  PC2  PC3  PC4
SS loadings           1.91 1.09 0.68 0.31
Proportion Var        0.48 0.27 0.17 0.08
Cumulative Var        0.48 0.75 0.92 1.00
Proportion Explained  0.48 0.27 0.17 0.08
Cumulative Proportion 0.48 0.75 0.92 1.00

Rotating the PCA solution using the varimax criterion yields new components, now named RCi to indicate that the PCs have been rotated (hence, they are no longer PCs).

> principal(x, nf=4, rotate="varimax")
... RC4 RC3
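For intuition about the renaming and reordering, here is a sketch of the standard varimax algorithm in Python/numpy (an illustrative implementation, not psych's own code). Varimax multiplies the loading matrix by an orthogonal rotation; the per-component SS loadings change in the process, and psych then sorts the rotated components by explained variance, which is why RC4 can appear before RC3.

import numpy as np

def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    """Rotate a p x k loading matrix using the classic varimax criterion."""
    p, k = loadings.shape
    R = np.eye(k)
    var = 0.0
    for _ in range(max_iter):
        L = loadings @ R
        # Standard varimax update, expressed through an SVD
        u, s, vt = np.linalg.svd(
            loadings.T @ (L ** 3 - (gamma / p) * L * (L ** 2).sum(axis=0))
        )
        R = u @ vt
        if s.sum() < var * (1 + tol):
            break
        var = s.sum()
    return loadings @ R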

Comparing svd and princomp in R

Submitted by 删除回忆录丶 on 2019-12-01 20:23:22
I want to get the singular values of a matrix in R to compute the principal components, then run princomp(x) too to compare the results. I know princomp() would give the principal components. Question: how do I get the principal components from $d, $u, and $v (the solution of s = svd(x))? One way or another, you should probably look into prcomp, which calculates PCA using svd instead of eigen (as in princomp). That way, if all you want is the PCA output, but calculated using svd, you're golden. Also, if you type stats:::prcomp.default at the command line, you can see how it uses the output of svd yourself.
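The relationship itself is compact; here is a sketch in Python/numpy (synthetic data) mirroring what prcomp does with svd: center the columns, take the SVD, and the principal component scores are U·d, which equals the centered data multiplied by V.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))

# prcomp-style PCA: center the columns, then SVD the centered matrix.
Xc = X - X.mean(axis=0)
U, d, Vt = np.linalg.svd(Xc, full_matrices=False)

scores = U * d                  # principal component scores ($u scaled by $d)
axes = Vt.T                     # principal axes, i.e. svd(x)$v
sdev = d / np.sqrt(len(X) - 1)  # component standard deviations, as prcomp reports

print(np.allclose(scores, Xc @ Vt.T))  # True: U*d and Xc %*% v agree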