pca

Hotelling's T^2 scores in python

二次信任 submitted on 2019-12-09 05:38:36
Question: I applied PCA to a data set using matplotlib in Python. However, matplotlib does not provide t-squared scores the way Matlab does. Is there a way to compute Hotelling's T^2 scores like Matlab? Thanks. Answer 1: matplotlib's PCA class doesn't include the Hotelling T^2 calculation, but it can be done with just a couple of lines of code. The following code includes a function to compute the T^2 values for each point. The __main__ script applies PCA to the same example as used in Matlab's pca documentation, so
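The helper the answer describes can be sketched in plain NumPy (the function name and the SVD-based formulation here are mine, not the original answer's code): for sample i, T^2_i = sum_k t_ik^2 / lambda_k, where t_ik are the PCA scores and lambda_k the corresponding eigenvalues.

```python
import numpy as np

def hotelling_t2(X, n_components=None):
    """Hotelling's T^2 statistic for each row of X.

    T^2_i = sum_k t_ik^2 / lambda_k over the retained components,
    where t_ik are PCA scores and lambda_k their variances.
    """
    Xc = X - X.mean(axis=0)                  # center the data
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U * s                           # PCA scores, shape (n_samples, n_features)
    eigvals = s**2 / (X.shape[0] - 1)        # variance captured by each component
    if n_components is not None:
        scores = scores[:, :n_components]
        eigvals = eigvals[:n_components]
    return np.sum(scores**2 / eigvals, axis=1)
```

With all components retained, the T^2 values sum to p*(n-1) across the n samples, which is a quick sanity check on the implementation.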

PCA in Sklearn - ValueError: array must not contain infs or NaNs

魔方 西西 submitted on 2019-12-09 03:51:01
Question: I am trying to use grid search to choose the number of principal components of the data before fitting a linear regression. I am confused about how to build the dictionary of the numbers of principal components I want to try. I put my list into dictionary form in the param_grid parameter, but I think I did it wrong. So far, I have gotten a warning about my array containing infs or NaNs. I am following the instructions for pipelining a linear regression with PCA: http://scikit-learn.org/stable/auto
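A minimal working version of such a pipeline (the synthetic data and step names below are my own choices): grid keys take the form "<step name>__<parameter>", so the candidate component counts go under "pca__n_components" as plain integers.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Synthetic regression data: y depends mostly on the first column
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=100)

pipe = Pipeline([("pca", PCA()), ("reg", LinearRegression())])
# Keys follow "<step name>__<parameter>"; values are plain ints,
# so PCA never receives an invalid n_components.
param_grid = {"pca__n_components": [2, 4, 6]}
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)
```

If the warning about infs or NaNs persists with a well-formed grid, the data itself usually contains them and needs cleaning or imputation before the pipeline.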

How to extract components after performing principal component regression for further analysis in R caret package

眉间皱痕 submitted on 2019-12-09 01:48:50
Question: I had a dataset containing 151 variables that were found to be highly collinear, so I performed principal component regression on it as follows:

ctrl <- trainControl(method = "repeatedcv", repeats = 10, savePred = T)
model <- train(RT..seconds.~., data = cadets100, method = "pcr", trControl = ctrl)

which gives me RMSE = 65.7 and R-squared = 0.443. I was just wondering how to go about extracting these components afterwards, so that I could, say, apply further analysis (i.e.

Basic example for PCA with matplotlib

拥有回忆 submitted on 2019-12-08 22:46:15
Question: I'm trying to do a simple principal component analysis with matplotlib.mlab.PCA, but with the attributes of the class I can't get a clean solution to my problem. Here's an example: get some dummy data in 2D and start the PCA:

from matplotlib.mlab import PCA
import numpy as np
N = 1000
xTrue = np.linspace(0, 1000, N)
yTrue = 3*xTrue
xData = xTrue + np.random.normal(0, 100, N)
yData = yTrue + np.random.normal(0, 100, N)
xData = np.reshape(xData, (N, 1))
yData = np.reshape(yData, (N, 1))
data = np.hstack
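matplotlib.mlab.PCA has since been removed from matplotlib, so a NumPy-only version of the same example may be more robust (the SVD-based approach below is my substitution, not the original mlab code):

```python
import numpy as np

# Same dummy data as in the question: noisy points along y = 3x
N = 1000
xTrue = np.linspace(0, 1000, N)
yTrue = 3 * xTrue
rng = np.random.default_rng(0)
xData = (xTrue + rng.normal(0, 100, N)).reshape(N, 1)
yData = (yTrue + rng.normal(0, 100, N)).reshape(N, 1)
data = np.hstack((xData, yData))

# PCA via SVD of the centered data matrix
mu = data.mean(axis=0)
U, s, Vt = np.linalg.svd(data - mu, full_matrices=False)
components = Vt                # principal axes, one per row
scores = (data - mu) @ Vt.T    # coordinates of each point along those axes

# The first axis should recover the underlying y = 3x direction
slope = components[0, 1] / components[0, 0]
```

Here `components` plays the role of mlab's `Wt` attribute and `scores` the role of `Y`; the recovered slope should come out close to 3.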

PCA with sklearn. Unable to figure out feature selection with PCA

两盒软妹~` submitted on 2019-12-08 10:08:31
I have been trying to do some dimensionality reduction using PCA. I currently have an image of size (100, 100), and I am using a filterbank of 140 Gabor filters, where each filter gives me a response which is again an image of (100, 100). Now, I wanted to do feature selection, where I only wanted to select non-redundant features, and I read that PCA might be a good way to do this. So I proceeded to create a data matrix which has 10000 rows and 140 columns, so that each row contains the 140 filter responses for that pixel. Now, as I understand it, I can do a decomposition of this
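On a matrix of that shape, the decomposition itself is one line in scikit-learn (the synthetic stand-in for the Gabor-response matrix below is mine, used only so the sketch runs). One caveat worth noting: PCA produces uncorrelated components that are linear combinations of all 140 filters, not a subset of the original filters.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for the 10000 x 140 Gabor-response matrix:
# the 140 columns are noisy mixtures of 5 underlying signals, so
# most of the variance lives in a handful of components.
rng = np.random.default_rng(0)
latent = rng.normal(size=(10000, 5))
mixing = rng.normal(size=(5, 140))
X = latent @ mixing + 0.1 * rng.normal(size=(10000, 140))

pca = PCA()
pca.fit(X)

# Smallest number of components explaining 95% of the variance
cum = np.cumsum(pca.explained_variance_ratio_)
k = int(np.searchsorted(cum, 0.95)) + 1
X_reduced = PCA(n_components=k).fit_transform(X)
```

For redundant filterbanks this typically shrinks the 140 columns down to a much smaller, decorrelated representation.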

Invalid parameter clf for estimator Pipeline in sklearn

女生的网名这么多〃 submitted on 2019-12-08 08:01:29
Question: Could anyone check the following code for problems? Did I go wrong in any step of building my model? I have already added the 'clf__' prefix to the parameters.

clf = RandomForestClassifier()
pca = PCA()
pca_clf = make_pipeline(pca, clf)
kfold = KFold(n_splits=10, random_state=22)
parameters = {'clf__n_estimators': [4, 6, 9], 'clf__max_features': ['log2', 'sqrt', 'auto'], 'clf__criterion': ['entropy', 'gini'], 'clf__max_depth': [2, 3, 5, 10], 'clf__min_samples_split': [2, 3, 5], 'clf__min_samples_leaf': [1, 5, 8]}
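The likely cause of the "Invalid parameter clf" error: make_pipeline names each step after its lowercased class name, so the step is called "randomforestclassifier", not "clf". Either use those generated names in the grid keys, or build the pipeline with explicit names, as in this trimmed-down sketch (the tiny synthetic dataset and reduced grid are mine):

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=60, n_features=8, random_state=0)

# Explicit step names, so that "clf__..." matches a real step
pipe = Pipeline([("pca", PCA()), ("clf", RandomForestClassifier(random_state=0))])
param_grid = {"clf__n_estimators": [4, 6], "clf__max_depth": [2, 3]}
search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X, y)
```

With make_pipeline instead, the same grid would need keys like "randomforestclassifier__n_estimators".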

Does partial_fit run in parallel in sklearn.decomposition.IncrementalPCA?

南笙酒味 submitted on 2019-12-08 05:33:55
Question: I've followed Imanol Luengo's answer to build a partial fit and transform for sklearn.decomposition.IncrementalPCA. But for some reason, it looks like (from htop) it uses all CPU cores at maximum. I could find neither an n_jobs parameter nor anything related to multiprocessing. My question is: if this is the default behavior of these functions, how can I set the number of CPUs, and where can I find information about it? If not, I am obviously doing something wrong in an earlier section of my code. PS:
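IncrementalPCA indeed has no n_jobs parameter; the parallelism most likely comes from the multithreaded BLAS library (OpenBLAS or MKL) behind NumPy's linear algebra. One way to cap it is threadpoolctl, which ships as a scikit-learn dependency; setting OMP_NUM_THREADS and friends before importing NumPy also works. A sketch with synthetic data:

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA
from threadpoolctl import threadpool_limits

X = np.random.default_rng(0).normal(size=(500, 20))
ipca = IncrementalPCA(n_components=5, batch_size=100)

# Cap the BLAS/OpenMP thread pools for the duration of the fit,
# so partial_fit stops saturating every core
with threadpool_limits(limits=1):
    for start in range(0, X.shape[0], 100):
        ipca.partial_fit(X[start:start + 100])

X_t = ipca.transform(X)
```

The context manager restores the original thread limits on exit, so only the incremental fit is constrained.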

Add sample names to PCA plotted with s.class

£可爱£侵袭症+ submitted on 2019-12-08 03:52:34
Question: I'm trying to build a PCA plot with the s.class function of the ade4 package. I have a data set containing the abundances of bacterial species (rows) in different samples (columns). I need to perform some statistical tests and obtain clusters of my samples, in order to represent them in a PCA. I ran the following, based on the scripts published in a paper:

data = read.table("L4.txt", header=T, row.names=1, dec=".", sep="\t")
data = data[-1,]
library(cluster)
JSD <- function(x,y) sqrt(0.5 * KLD(x, (x+y)/2) + 0.5

Image Processing in Python

有些话、适合烂在心里 submitted on 2019-12-08 03:00:03
http://www.ituring.com.cn/tupubarticle/2024

Chapter 1: Basic Image Handling and Processing

This chapter covers the basics of handling and processing images. Through numerous examples, it introduces the Python toolkits needed for image processing, along with the fundamental tools for reading images, converting and resizing them, computing derivatives, plotting, and saving results. These tools will be used throughout the remaining chapters of the book.

1.1 PIL: the Python Imaging Library

PIL (the Python Imaging Library) provides general image handling and a large number of useful basic image operations, such as resizing, cropping, rotating, and color conversion. PIL is free and can be downloaded from http://www.pythonware.com/products/pil/.

Using the functions in PIL, we can read data from files in most image formats and write out to the most common image formats. The most important module in PIL is Image. To read an image, use:

from PIL import Image
pil_im = Image.open('empire.jpg')

The return value, pil_im, is a PIL image object.

Color conversions are done with the convert() method. To read an image and convert it to grayscale, just add convert('L'), like so:

pil_im = Image.open(
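The grayscale conversion above can be tried without an image file by building one in memory (the solid-red test image below is my stand-in for 'empire.jpg'):

```python
from PIL import Image

# In-memory stand-in for 'empire.jpg', so the example is self-contained
rgb = Image.new("RGB", (4, 4), color=(255, 0, 0))

# convert('L') applies a luma transform: L is roughly
# 0.299*R + 0.587*G + 0.114*B, so pure red maps to about 76
gray = rgb.convert("L")
```

The resulting object has mode "L" (a single 8-bit channel) and can be saved or processed like any other PIL image.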