PCA

PCA-LDA analysis - R

Submitted by 大憨熊 on 2020-07-10 03:11:22
Question: In this example (https://gist.github.com/thigm85/8424654) LDA was compared with PCA on the iris dataset. How can I also do LDA on the PCA results (PCA-LDA)?

Code:

require(MASS)
require(ggplot2)
require(scales)
require(gridExtra)

pca <- prcomp(iris[,-5], center = TRUE, scale. = TRUE)
prop.pca = pca$sdev^2/sum(pca$sdev^2)

lda <- lda(Species ~ ., iris, prior = c(1,1,1)/3)
prop.lda = lda$svd^2/sum(lda$svd^2)

plda <- predict(object = lda, newdata = iris)
dataset = data.frame(species = iris[,"Species"]
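The PCA-LDA idea is to fit PCA first and then run LDA on the retained principal-component scores instead of on the raw features. A minimal sketch of that chaining in Python with scikit-learn (the question's code is R; this only illustrates the same two-step pipeline, and keeping two components is an arbitrary choice):

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Center and scale, mirroring prcomp(..., center = TRUE, scale. = TRUE).
X_std = StandardScaler().fit_transform(X)

# Step 1: project onto the leading principal components.
X_pca = PCA(n_components=2).fit_transform(X_std)

# Step 2: run LDA on the PCA scores rather than on the raw features.
X_pca_lda = LinearDiscriminantAnalysis().fit_transform(X_pca, y)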

XGBoost with GridSearchCV, Scaling, PCA, and Early-Stopping in sklearn Pipeline

Submitted by 夙愿已清 on 2020-06-09 11:31:45
Question: I want to combine an XGBoost model with input scaling and feature-space reduction by PCA. In addition, the hyperparameters of the model as well as the number of components used in the PCA should be tuned using cross-validation. And to prevent the model from overfitting, early stopping should be added. For combining the various steps, I decided to use sklearn's Pipeline functionality. At the beginning, I had some problems making sure that the PCA is also applied to the validation set. But I
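A minimal sketch of the scaling, PCA, and XGBoost pipeline with joint tuning via GridSearchCV (load_iris stands in for the real data and the grid values are illustrative; early stopping is left out because a plain Pipeline does not push the eval_set through the fitted scaler and PCA, which is exactly the snag the question runs into):

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from xgboost import XGBClassifier

X, y = load_iris(return_X_y=True)

pipe = Pipeline([
    ("scale", StandardScaler()),              # input scaling
    ("pca", PCA()),                           # feature-space reduction
    ("xgb", XGBClassifier(n_estimators=200)),
])

# Step-prefixed parameter names let GridSearchCV tune the number of
# PCA components and the XGBoost hyperparameters together.
param_grid = {
    "pca__n_components": [2, 3],
    "xgb__max_depth": [2, 4],
    "xgb__learning_rate": [0.1, 0.3],
}

search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X, y)
print(search.best_params_)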

Python PCA on Matrix too large to fit into memory

Submitted by 别说谁变了你拦得住时间么 on 2020-06-07 06:21:46
Question: I have a CSV that is 100,000 rows x 27,000 columns on which I am trying to do PCA, to produce a 100,000 rows x 300 columns matrix. The CSV is 9 GB. Here is currently what I'm doing:

from sklearn.decomposition import PCA as RandomizedPCA
import csv
import sys
import numpy as np
import pandas as pd

dataset = sys.argv[1]
X = pd.DataFrame.from_csv(dataset)
Y = X.pop("Y_Level")
X = (X - X.mean()) / (X.max() - X.min())
Y = list(Y)
dimensions = 300
sklearn_pca = RandomizedPCA(n_components
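One common way around the memory limit is scikit-learn's IncrementalPCA, which learns the components from chunks via partial_fit so the full matrix never has to be in RAM at once. A minimal two-pass sketch (the file name and chunk size are illustrative, "Y_Level" is the label column from the question, and each chunk passed to partial_fit must have at least n_components rows):

import numpy as np
import pandas as pd
from sklearn.decomposition import IncrementalPCA

ipca = IncrementalPCA(n_components=300)

# Pass 1: fit the PCA on chunks small enough to hold in memory.
for chunk in pd.read_csv("data.csv", chunksize=1000):
    ipca.partial_fit(chunk.drop(columns=["Y_Level"]).to_numpy())

# Pass 2: transform each chunk and stack the reduced rows.
reduced = [
    ipca.transform(chunk.drop(columns=["Y_Level"]).to_numpy())
    for chunk in pd.read_csv("data.csv", chunksize=1000)
]
X_reduced = np.vstack(reduced)  # 100,000 x 300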

Using mca package in Python

Submitted by 点点圈 on 2020-05-10 07:54:05
Question: I am trying to use the mca package to do multiple correspondence analysis in Python. I am a bit confused as to how to use it. With PCA, I would expect to fit some data (i.e. find the principal components for those data) and then later be able to use those principal components to transform unseen data. Based on the MCA documentation, I cannot work out how to do this last step. I also don't understand what any of the cryptically named properties and methods do (i.e. .E ,
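For comparison, the fit-then-transform workflow the question describes is what the alternative prince library exposes for MCA through its scikit-learn-style API. A minimal sketch using prince rather than the mca package (the toy data frames are illustrative):

import pandas as pd
import prince

train = pd.DataFrame({
    "color": ["red", "blue", "red", "green"],
    "size":  ["S", "M", "L", "M"],
})
unseen = pd.DataFrame({"color": ["blue"], "size": ["L"]})

# Fit on training data, then project unseen rows onto the learned axes.
mca = prince.MCA(n_components=2).fit(train)
coords = mca.transform(unseen)
print(coords)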

PCA from a Mathematical Perspective

Submitted by 我只是一个虾纸丫 on 2020-04-03 13:50:31
Reposted from: http://blog.csdn.net/passball/article/details/24037593

Principal Component Analysis (PCA) is a method from multivariate statistics for analyzing data: it describes the samples with a smaller number of features so as to reduce the dimensionality of the feature space, and its essence is the K-L transform. The best-known application of PCA is probably feature extraction and dimensionality reduction in face recognition. For a 200*200 face image, taking just the gray values as raw features already gives a 40,000-dimensional feature vector, which makes things extremely hard for the classifier downstream. The famous Eigenface face-recognition algorithm uses PCA to describe face images in a low-dimensional subspace while preserving the information needed for recognition. We first introduce the essence of PCA, the K-L transform.

1. The K-L (Karhunen-Loeve) transform: the optimal orthogonal transform

- A commonly used feature-extraction method;
- The optimal orthogonal transform in the minimum mean-square-error sense;
- Optimal at removing the correlation between pattern features and bringing out their differences.

Discrete K-L transform: expand a vector x (think of it as the M = width*height-dimensional raw feature vector of a face image) in a fixed complete orthonormal system u_j:

$$x = \sum_{j=1}^{M} c_j u_j, \qquad c_j = u_j^{\top} x$$

This expansion exists because every n-dimensional Euclidean space V has an orthonormal basis, which can be constructed with the Gram-Schmidt process. We now want to estimate x with a finite number d of terms:

$$\hat{x} = \sum_{j=1}^{d} c_j u_j$$

The mean-square error of this estimate is

$$\bar{\varepsilon}^2 = E\left[(x - \hat{x})^{\top}(x - \hat{x})\right] = \sum_{j=d+1}^{M} u_j^{\top} E[x x^{\top}] \, u_j$$

To minimize the mean-square error
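From here, the standard K-L derivation (a sketch following textbook treatments of the transform, with R denoting the autocorrelation matrix) proceeds by minimizing this error subject to the unit-norm constraint:

$$\min_{u_j} \; \sum_{j=d+1}^{M} u_j^{\top} R \, u_j \quad \text{s.t.} \quad u_j^{\top} u_j = 1, \qquad R = E[x x^{\top}]$$

A Lagrange-multiplier argument turns this into the eigenvalue problem

$$R \, u_j = \lambda_j u_j, \qquad \bar{\varepsilon}^2_{\min} = \sum_{j=d+1}^{M} \lambda_j$$

so the optimal basis vectors are the eigenvectors of R, the minimal error is the sum of the discarded eigenvalues, and keeping the d directions with the largest eigenvalues is exactly what PCA does.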