pca

C++ - framework for computing PCA (other than armadillo)

梦想的初衷 submitted on 2019-12-07 22:37:54
Question: I have a large dataset of around 200,000 data points, each containing 132 features, so my dataset is 200000 x 132. I have done all the computations using the Armadillo framework. However, when I tried to run a PCA analysis I received a memory error, and I don't know whether it is caused by my RAM (8 GB) or by a limitation of the framework itself. The error I receive is: requested size is too large. Can you recommend another framework
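Aside: if the goal is simply to get PCA running on a matrix of this size, scikit-learn's IncrementalPCA decomposes the data in mini-batches, so the full 200000 x 132 matrix never has to be held in one decomposition. A minimal sketch, with a smaller simulated matrix standing in for the real data:

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

# Simulated stand-in for the 200000 x 132 dataset (smaller here for speed).
rng = np.random.default_rng(0)
X = rng.standard_normal((5000, 132))

# IncrementalPCA processes the data in mini-batches, so peak memory is
# bounded by the batch size rather than by the full matrix.
ipca = IncrementalPCA(n_components=10, batch_size=1000)
for start in range(0, X.shape[0], 1000):
    ipca.partial_fit(X[start:start + 1000])

X_reduced = ipca.transform(X)
print(X_reduced.shape)  # (5000, 10)
```

With the real data, the batches could equally well be read from disk one at a time, so the 8 GB of RAM is never a constraint.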

How to calculate the volume of the intersection of ellipses in r

↘锁芯ラ submitted on 2019-12-07 12:43:41
Question: I was wondering how to calculate the intersection between two ellipses, e.g. the volume of the intersection between versicolor and virginica as illustrated in this graph, which is plotted using the following MWE based on this tutorial: data(iris) log.ir <- log(iris[, 1:4]) ir.species <- iris[, 5] ir.pca <- prcomp(log.ir, center = TRUE, scale. = TRUE) library(ggbiplot) g <- ggbiplot(ir.pca, obs.scale = 1, var.scale = 1, groups = ir.species, ellipse = TRUE, circle = TRUE) g <- g + scale_color
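For the area of overlap between two ellipses, one simple approach that sidesteps the geometry entirely is Monte Carlo sampling. The sketch below uses two hypothetical ellipses standing in for the versicolor/virginica confidence ellipses; the centers, axis lengths, and rotation are made-up values, not the ones ggbiplot computes:

```python
import numpy as np

def in_ellipse(pts, center, a, b, theta):
    """True where points fall inside an ellipse with semi-axes (a, b) rotated by theta."""
    c, s = np.cos(theta), np.sin(theta)
    d = pts - center
    x = d[:, 0] * c + d[:, 1] * s
    y = -d[:, 0] * s + d[:, 1] * c
    return (x / a) ** 2 + (y / b) ** 2 <= 1.0

rng = np.random.default_rng(0)
# Hypothetical ellipses; real parameters would come from the fitted contours.
e1 = (np.array([0.0, 0.0]), 2.0, 1.0, 0.0)
e2 = (np.array([1.0, 0.0]), 2.0, 1.0, np.pi / 6)

# Sample uniformly over a bounding box covering both ellipses; the fraction
# of samples inside both, times the box area, estimates the overlap area.
lo, hi = np.array([-3.0, -3.0]), np.array([4.0, 3.0])
pts = rng.uniform(lo, hi, size=(200_000, 2))
inside = in_ellipse(pts, *e1) & in_ellipse(pts, *e2)
area = inside.mean() * np.prod(hi - lo)
print(round(area, 2))  # rough Monte Carlo estimate of the overlap area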

How to find most contributing features to PCA?

你离开我真会死。 submitted on 2019-12-07 05:56:57
Question: I am running PCA on my data (~250 features) and see that all points are clustered in 3 blobs. Is it possible to see which of the 250 features contribute most to the outcome? If so, how? (I am using the scikit-learn implementation.) Answer 1: Let's see what Wikipedia says: PCA is mathematically defined as an orthogonal linear transformation that transforms the data to a new coordinate system such that the greatest variance by some projection of the data comes to lie on the first coordinate
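In scikit-learn terms, the answer is usually to inspect `pca.components_`: each row holds the weight (loading) of every original feature on one component, so the largest absolute weights identify the most contributing features. A small sketch on the iris data:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = StandardScaler().fit_transform(load_iris().data)
pca = PCA(n_components=2).fit(X)

# Loadings: absolute weight of each original feature on each component,
# shape (n_components, n_features).
loadings = np.abs(pca.components_)
top = loadings[0].argsort()[::-1]  # features ranked by |weight| on PC1
print(top[:3])                     # indices of the 3 biggest contributors
```

With 250 features the same ranking applies unchanged; mapping the indices back to column names gives the interpretable answer.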

R: ggfortify: “Objects of type prcomp not supported by autoplot”

五迷三道 submitted on 2019-12-07 04:38:34
Question: I am trying to use ggfortify to visualize the results of a PCA I did using prcomp. Sample code: iris.pca <- iris[c(1, 2, 3, 4)] autoplot(prcomp(iris.pca)) Error: Objects of type prcomp not supported by autoplot. Please use qplot() or ggplot() instead. What is odd is that autoplot is specifically designed to handle the results of prcomp; ggplot and qplot can't handle objects like this. I'm running R version 3.2 and just downloaded ggfortify off GitHub this morning. Can anyone explain this

classification: PCA and logistic regression using sklearn

浪子不回头ぞ submitted on 2019-12-07 00:48:36
Question: Step 0: Problem description. I have a classification problem, i.e. I want to predict a binary target based on a collection of numerical features, using logistic regression after running a Principal Components Analysis (PCA). I have 2 datasets, df_train and df_valid (training set and validation set respectively), as pandas data frames containing the features and the target. As a first step, I used the get_dummies pandas function to transform all the categorical variables into booleans. For
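A common way to wire these steps together in scikit-learn is a Pipeline, which guarantees that the scaler and the PCA are fit on the training set only and merely applied to the validation set. A sketch with synthetic data standing in for df_train / df_valid:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-ins for df_train / df_valid.
X, y = make_classification(n_samples=1000, n_features=30, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=0)

# Fitting scaler and PCA inside a Pipeline ensures the validation set is
# transformed with statistics learned from the training set only.
clf = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=10)),
    ("logreg", LogisticRegression(max_iter=1000)),
])
clf.fit(X_train, y_train)
acc = clf.score(X_valid, y_valid)
print(acc)
```

With real data frames, `clf.fit(df_train[features], df_train[target])` works the same way, and the whole pipeline can be cross-validated as a single estimator.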

Eigenvalue Decomposition and Singular Value Decomposition: Meaning and Applications

ぃ、小莉子 submitted on 2019-12-06 22:03:10
The geometric meaning of eigenvalues and eigenvectors. What is matrix multiplication? Don't just tell me it is "the rows of the first matrix times the columns of the second"; those who know a little more might add that "the number of columns of the first matrix must equal the number of rows of the second", but all of that is surface. What matrix multiplication really means is transformation. The very first things we learn in linear algebra are row and column operations, and that is the heart of the subject: matrix multiplication is a linear transformation. Taking one set of vectors A as the object, the effect of B is mainly to change A in the following ways:

1. Scaling

clf;
A = [0, 1, 1, 0, 0;...
     1, 1, 0, 0, 1];   % original space
B = [3 0; 0 2];        % linear transformation matrix
plot(A(1,:), A(2,:), '-*'); hold on
grid on; axis([0 3 0 3]); gtext('before transformation');
Y = B * A;
plot(Y(1,:), Y(2,:), '-r*');
grid on; axis([0 3 0 3]); gtext('after transformation');

As the figure shows, the y direction is stretched by a factor of 2 and the x direction by a factor of 3; this is the work of B = [3 0; 0 2], and 3 and 2 are the scaling ratios. Note that here, apart from the diagonal elements, which hold the per-dimension scaling factors, all off-diagonal elements of B are 0; as we will see next, nonzero off-diagonal elements produce shear and rotation effects.

2. Shear

clf;
A = [0, 1, 1, 0, 0;...
     1, 1, 0
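The scaling claim can be checked numerically: for a diagonal matrix such as B, the eigenvalues are exactly the per-axis stretch factors and the eigenvectors are the coordinate axes. A small NumPy sketch of the B from the MATLAB example:

```python
import numpy as np

# The scaling matrix from the example above.
B = np.array([[3.0, 0.0],
              [0.0, 2.0]])

# Eigendecomposition: for a diagonal matrix the eigenvalues are exactly
# the per-axis stretch factors.
vals, vecs = np.linalg.eig(B)
print(sorted(vals.tolist()))  # [2.0, 3.0]

# Applying B to a vector stretches its x component by 3 and its y by 2.
v = np.array([1.0, 1.0])
print(B @ v)                  # [3. 2.]
```

This is the picture that carries over to PCA: the covariance matrix plays the role of B, and its eigenvectors/eigenvalues give the directions and magnitudes of stretch in the data.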

Using Principal Components Analysis (PCA) on binary data

懵懂的女人 submitted on 2019-12-06 17:05:47
Question: I am using PCA on binary attributes to reduce the dimensionality (number of attributes) of my problem. The initial dimensionality was 592, and after PCA it is 497. I used PCA before on numeric attributes in another problem, and there it reduced the dimensionality to a much greater extent (to half the initial dimensions). I believe that binary attributes reduce the power of PCA, but I do not know why. Could you please explain why PCA does not work as well here as it does on numeric data? Thank you. Answer 1:
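One way to see the effect numerically is to binarize correlated numeric data and compare how many components each version needs to reach a given share of the variance: thresholding to 0/1 discards magnitude information, which tends to spread variance over more components. The sketch below uses synthetic data, not the 592-attribute problem from the question:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n, d, k = 2000, 60, 5

# Correlated numeric data: a rank-k latent structure plus small noise.
latent = rng.standard_normal((n, k)) @ rng.standard_normal((k, d))
numeric = latent + 0.1 * rng.standard_normal((n, d))
# Thresholding the same data to 0/1 throws away magnitude information.
binary = (numeric > 0).astype(float)

def components_for(X, frac=0.9):
    """Number of components needed to explain `frac` of the variance."""
    ratios = PCA().fit(X).explained_variance_ratio_
    return int(np.searchsorted(np.cumsum(ratios), frac) + 1)

print(components_for(numeric), components_for(binary))
```

On this synthetic example the binarized version needs at least as many components as the numeric one to cover 90% of the variance, which mirrors the weaker reduction observed in the question.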

Python 3 (Part 5): Unsupervised Learning

拟墨画扇 submitted on 2019-12-06 16:45:46
Unsupervised Learning

Contents:
1 About machine learning
2 Standard datasets in the sklearn library and its basic functionality
2.1 Standard datasets
2.2 Basic functionality of the sklearn library
3 About unsupervised learning
4 The K-means method and its applications
5 The DBSCAN method and its applications
6 The PCA method and its applications
7 The NMF method, with an example
8 Clustering-based "image segmentation"

1 About machine learning

Machine learning is the means by which artificial intelligence is realized; its main research topic is how to learn from data or experience in order to improve the performance of specific algorithms. It is a multidisciplinary field, drawing on probability theory, statistics, algorithmic complexity theory, and other subjects, and it is widely applied to web search, spam filtering, recommender systems, ad placement, credit scoring, fraud detection, stock trading, medical diagnosis, and more.

Categories of machine learning:
Supervised learning: learn a function from a given dataset so that when new data arrives, the result can be predicted from this function; the training set is usually labeled by hand.
Unsupervised learning: in contrast to supervised learning, there are no human-provided labels.
Reinforcement learning: learn which actions yield the best reward through observation; every action affects the environment, and the learner judges by observing its surroundings.
Semi-supervised learning:
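As a concrete taste of sections 4 and 6 of the outline, the following sketch chains the two methods on the iris dataset: PCA for dimensionality reduction, then K-means clustering on the projected points:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data

# Section 6: PCA reduces the 4 iris features to 2 components.
X2 = PCA(n_components=2).fit_transform(X)

# Section 4: K-means groups the projected points into 3 clusters.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X2)
print(X2.shape, len(set(labels)))  # (150, 2) 3
```

Both steps are unsupervised: neither PCA nor K-means ever sees the species labels.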

Obtain unstandardized factor scores from factor analysis in R

﹥>﹥吖頭↗ submitted on 2019-12-06 16:19:51
I'm conducting a factor analysis of several variables in R using factanal() (but am open to using other packages). I want to determine each case's factor score, but I want the factor scores to be unstandardized and on the original metric of the input variables. When I run the factor analysis and obtain the factor scores, they are standardized to a normal distribution with mean = 0 and SD = 1, and are not on the original metric of the input variables. How can I obtain unstandardized factor scores that have the same metric as the input variables? Ideally, this would mean a similar mean, sd, range, and
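There is no single canonical "unstandardized factor score", but one common workaround is to rescale the standardized scores to the mean and spread of a weighted composite of the raw variables. The sketch below illustrates the idea with scikit-learn's FactorAnalysis and the iris data; it is an approximation of the approach, not a factanal() equivalent:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import FactorAnalysis

X = load_iris().data  # stand-in for the original-metric variables

fa = FactorAnalysis(n_components=1, random_state=0)
z = fa.fit_transform(X)  # factor scores, centered around mean 0

# Workaround: rescale the standardized scores to match the mean and
# standard deviation of a loading-weighted composite of the raw variables.
w = np.abs(fa.components_[0])
w = w / w.sum()
composite = X @ w  # weighted sum, on the original metric
scores = z[:, 0] * composite.std() + composite.mean()
print(round(scores.mean(), 2))
```

The same two lines of rescaling translate directly to R, applied to the scores returned by factanal(..., scores = "regression").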

Does partial_fit run in parallel in sklearn.decomposition.IncrementalPCA?

冷暖自知 submitted on 2019-12-06 15:46:12
I've followed Imanol Luengo's answer to build a partial fit and transform for sklearn.decomposition.IncrementalPCA. But for some reason, it looks like (from htop) it uses all CPU cores at maximum. I could find neither an n_jobs parameter nor anything related to multiprocessing. My question is: if this is the default behavior of these functions, how can I set the number of CPUs, and where can I find information about it? If not, I am obviously doing something wrong in an earlier section of my code. PS: I need to limit the number of CPU cores because using all cores on a server causes a lot of trouble
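The parallelism here usually comes from the BLAS/OpenMP libraries underneath NumPy rather than from scikit-learn itself, which is why there is no n_jobs parameter on IncrementalPCA. Two ways to cap it, sketched below: environment variables set before NumPy is imported, or threadpoolctl (installed as a scikit-learn dependency) at runtime:

```python
import os
# Setting these BEFORE NumPy is imported caps the BLAS/OpenMP thread pools
# that IncrementalPCA's matrix operations fan out to.
os.environ.setdefault("OMP_NUM_THREADS", "2")
os.environ.setdefault("OPENBLAS_NUM_THREADS", "2")
os.environ.setdefault("MKL_NUM_THREADS", "2")

import numpy as np
from sklearn.decomposition import IncrementalPCA
from threadpoolctl import threadpool_limits  # ships as a sklearn dependency

X = np.random.default_rng(0).standard_normal((2000, 50))
ipca = IncrementalPCA(n_components=5, batch_size=500)

# Alternatively, cap the pools for just this region at runtime.
with threadpool_limits(limits=2):
    for start in range(0, X.shape[0], 500):
        ipca.partial_fit(X[start:start + 500])

print(ipca.components_.shape)  # (5, 50)
```

The environment-variable route affects the whole process; `threadpool_limits` scopes the cap to one block of code, which is usually the kinder option on a shared server.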