pca

Hotelling's T^2 scores in python

二次信任 submitted on 2019-12-09 05:38:36
Question: I applied PCA to a data set using matplotlib in Python. However, matplotlib does not provide t-squared scores the way Matlab does. Is there a way to compute Hotelling's T^2 scores like Matlab? Thanks. Answer 1: matplotlib's PCA class doesn't include the Hotelling T^2 calculation, but it can be done with just a couple of lines of code. The following code includes a function to compute the T^2 values for each point. The __main__ script applies PCA to the same example as used in Matlab's pca documentation, so
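The helper the answer describes can be sketched in plain NumPy (the function name and the SVD-based formulation here are mine, not the original answer's code): for sample i, T^2_i = sum_k t_ik^2 / lambda_k, where t_ik are the PCA scores and lambda_k the corresponding eigenvalues.

```python
import numpy as np

def hotelling_t2(X, n_components=None):
    """Hotelling's T^2 statistic for each row of X.

    T^2_i = sum_k t_ik^2 / lambda_k over the retained components,
    where t_ik are PCA scores and lambda_k their variances.
    """
    Xc = X - X.mean(axis=0)                  # center the data
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U * s                           # PCA scores, shape (n_samples, n_features)
    eigvals = s**2 / (X.shape[0] - 1)        # variance captured by each component
    if n_components is not None:
        scores = scores[:, :n_components]
        eigvals = eigvals[:n_components]
    return np.sum(scores**2 / eigvals, axis=1)
```

With all components retained, the T^2 values sum to p*(n-1) across the n samples, which is a quick sanity check on the implementation.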

PCA in Sklearn - ValueError: array must not contain infs or NaNs

魔方 西西 submitted on 2019-12-09 03:51:01
Question: I am trying to use grid search to choose the number of principal components of the data before fitting a linear regression. I am confused about how to build the dictionary of the numbers of principal components I want to try. I put my list into dictionary form in the param_grid parameter, but I think I did it wrong. So far, I have gotten a warning about my array containing infs or NaNs. I am following the instructions for pipelining a linear regression with PCA: http://scikit-learn.org/stable/auto
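A minimal working version of such a pipeline (the synthetic data and step names below are my own choices): grid keys take the form "<step name>__<parameter>", so the candidate component counts go under "pca__n_components" as plain integers.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Synthetic regression data: y depends mostly on the first column
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=100)

pipe = Pipeline([("pca", PCA()), ("reg", LinearRegression())])
# Keys follow "<step name>__<parameter>"; values are plain ints,
# so PCA never receives an invalid n_components.
param_grid = {"pca__n_components": [2, 4, 6]}
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)
```

If the warning about infs or NaNs persists with a well-formed grid, the data itself usually contains them and needs cleaning or imputation before the pipeline.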

How to extract components after performing principal component regression for further analysis in R caret package

眉间皱痕 submitted on 2019-12-09 01:48:50
Question: I had a dataset containing 151 variables that were found to be highly collinear, so I performed principal component regression on it as follows:

ctrl <- trainControl(method = "repeatedcv", repeats = 10, savePred = T)
model <- train(RT..seconds.~., data = cadets100, method = "pcr", trControl = ctrl)

which gives me RMSE = 65.7 and R-squared = 0.443. I was just wondering how to go about extracting these components afterwards, so that I could, say, apply further analysis (i.e.

Basic example for PCA with matplotlib

拥有回忆 submitted on 2019-12-08 22:46:15
Question: I'm trying to do a simple principal component analysis with matplotlib.mlab.PCA, but with the attributes of the class I can't get a clean solution to my problem. Here's an example: get some dummy data in 2D and start the PCA:

from matplotlib.mlab import PCA
import numpy as np
N = 1000
xTrue = np.linspace(0, 1000, N)
yTrue = 3*xTrue
xData = xTrue + np.random.normal(0, 100, N)
yData = yTrue + np.random.normal(0, 100, N)
xData = np.reshape(xData, (N, 1))
yData = np.reshape(yData, (N, 1))
data = np.hstack
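matplotlib.mlab.PCA has since been removed from matplotlib, so a NumPy-only version of the same example may be more robust (the SVD-based approach below is my substitution, not the original mlab code):

```python
import numpy as np

# Same dummy data as in the question: noisy points along y = 3x
N = 1000
xTrue = np.linspace(0, 1000, N)
yTrue = 3 * xTrue
rng = np.random.default_rng(0)
xData = (xTrue + rng.normal(0, 100, N)).reshape(N, 1)
yData = (yTrue + rng.normal(0, 100, N)).reshape(N, 1)
data = np.hstack((xData, yData))

# PCA via SVD of the centered data matrix
mu = data.mean(axis=0)
U, s, Vt = np.linalg.svd(data - mu, full_matrices=False)
components = Vt                # principal axes, one per row
scores = (data - mu) @ Vt.T    # coordinates of each point along those axes

# The first axis should recover the underlying y = 3x direction
slope = components[0, 1] / components[0, 0]
```

Here `components` plays the role of mlab's `Wt` attribute and `scores` the role of `Y`; the recovered slope should come out close to 3.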

PCA with sklearn. Unable to figure out feature selection with PCA

两盒软妹~` submitted on 2019-12-08 10:08:31
I have been trying to do some dimensionality reduction using PCA. I currently have an image of size (100, 100), and I am using a filterbank of 140 Gabor filters, where each filter gives me a response which is again an image of (100, 100). Now, I wanted to do feature selection, where I only wanted to select non-redundant features, and I read that PCA might be a good way to do this. So I proceeded to create a data matrix which has 10000 rows and 140 columns, so that each row contains the 140 filter responses for that pixel. Now, as I understand it, I can do a decomposition of this
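On a matrix of that shape, the decomposition itself is one line in scikit-learn (the synthetic stand-in for the Gabor-response matrix below is mine, used only so the sketch runs). One caveat worth noting: PCA produces uncorrelated components that are linear combinations of all 140 filters, not a subset of the original filters.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for the 10000 x 140 Gabor-response matrix:
# the 140 columns are noisy mixtures of 5 underlying signals, so
# most of the variance lives in a handful of components.
rng = np.random.default_rng(0)
latent = rng.normal(size=(10000, 5))
mixing = rng.normal(size=(5, 140))
X = latent @ mixing + 0.1 * rng.normal(size=(10000, 140))

pca = PCA()
pca.fit(X)

# Smallest number of components explaining 95% of the variance
cum = np.cumsum(pca.explained_variance_ratio_)
k = int(np.searchsorted(cum, 0.95)) + 1
X_reduced = PCA(n_components=k).fit_transform(X)
```

For redundant filterbanks this typically shrinks the 140 columns down to a much smaller, decorrelated representation.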

Invalid parameter clf for estimator Pipeline in sklearn

女生的网名这么多〃 submitted on 2019-12-08 08:01:29
Question: Could anyone check the following code for problems? Did I go wrong in any step of building my model? I have already added the 'clf__' prefix to the parameters.

clf = RandomForestClassifier()
pca = PCA()
pca_clf = make_pipeline(pca, clf)
kfold = KFold(n_splits=10, random_state=22)
parameters = {'clf__n_estimators': [4, 6, 9], 'clf__max_features': ['log2', 'sqrt', 'auto'], 'clf__criterion': ['entropy', 'gini'], 'clf__max_depth': [2, 3, 5, 10], 'clf__min_samples_split': [2, 3, 5], 'clf__min_samples_leaf': [1, 5, 8]}
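The likely cause of the "Invalid parameter clf" error: make_pipeline names each step after its lowercased class name, so the step is called "randomforestclassifier", not "clf". Either use those generated names in the grid keys, or build the pipeline with explicit names, as in this trimmed-down sketch (the tiny synthetic dataset and reduced grid are mine):

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=60, n_features=8, random_state=0)

# Explicit step names, so that "clf__..." matches a real step
pipe = Pipeline([("pca", PCA()), ("clf", RandomForestClassifier(random_state=0))])
param_grid = {"clf__n_estimators": [4, 6], "clf__max_depth": [2, 3]}
search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X, y)
```

With make_pipeline instead, the same grid would need keys like "randomforestclassifier__n_estimators".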

Does partial_fit run in parallel in sklearn.decomposition.IncrementalPCA?

南笙酒味 submitted on 2019-12-08 05:33:55
Question: I've followed Imanol Luengo's answer to build a partial fit and transform for sklearn.decomposition.IncrementalPCA. But for some reason, it looks like (from htop) it uses all CPU cores at maximum. I could find neither an n_jobs parameter nor anything related to multiprocessing. My question is: if this is the default behavior of these functions, how can I set the number of CPUs, and where can I find information about it? If not, I am obviously doing something wrong in an earlier section of my code. PS:
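IncrementalPCA indeed has no n_jobs parameter; the parallelism most likely comes from the multithreaded BLAS library (OpenBLAS or MKL) behind NumPy's linear algebra. One way to cap it is threadpoolctl, which ships as a scikit-learn dependency; setting OMP_NUM_THREADS and friends before importing NumPy also works. A sketch with synthetic data:

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA
from threadpoolctl import threadpool_limits

X = np.random.default_rng(0).normal(size=(500, 20))
ipca = IncrementalPCA(n_components=5, batch_size=100)

# Cap the BLAS/OpenMP thread pools for the duration of the fit,
# so partial_fit stops saturating every core
with threadpool_limits(limits=1):
    for start in range(0, X.shape[0], 100):
        ipca.partial_fit(X[start:start + 100])

X_t = ipca.transform(X)
```

The context manager restores the original thread limits on exit, so only the incremental fit is constrained.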

Add sample names to PCA plotted with s.class

£可爱£侵袭症+ submitted on 2019-12-08 03:52:34
Question: I'm trying to build a PCA plot with the s.class function of the ade4 package. I have a data set containing the abundances of bacterial species (rows) in different samples (columns). I need to perform some statistical tests and obtain clusters of my samples, in order to represent them in a PCA. I ran the following, based on the scripts published in a paper:

data = read.table("L4.txt", header=T, row.names=1, dec=".", sep="\t")
data = data[-1,]
library(cluster)
JSD <- function(x,y) sqrt(0.5 * KLD(x, (x+y)/2) + 0.5

Image Processing in Python

有些话、适合烂在心里 submitted on 2019-12-08 03:00:03
http://www.ituring.com.cn/tupubarticle/2024

Chapter 1: Basic Image Handling and Processing

This chapter covers the basics of handling and processing images. Through numerous examples, it introduces the Python toolkits needed for image processing, along with the fundamental tools for reading images, converting and resizing them, computing derivatives, plotting, and saving results. These tools will be used throughout the remaining chapters of the book.

1.1 PIL: the Python Imaging Library

PIL (the Python Imaging Library) provides general image handling and a large number of useful basic image operations, such as resizing, cropping, rotating, and color conversion. PIL is free and can be downloaded from http://www.pythonware.com/products/pil/.

Using the functions in PIL, we can read data from files in most image formats and write out to the most common image formats. The most important module in PIL is Image. To read an image, use:

from PIL import Image
pil_im = Image.open('empire.jpg')

The return value, pil_im, is a PIL image object.

Color conversions are done with the convert() method. To read an image and convert it to grayscale, just add convert('L'), like so:

pil_im = Image.open(
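The grayscale conversion above can be tried without an image file by building one in memory (the solid-red test image below is my stand-in for 'empire.jpg'):

```python
from PIL import Image

# In-memory stand-in for 'empire.jpg', so the example is self-contained
rgb = Image.new("RGB", (4, 4), color=(255, 0, 0))

# convert('L') applies a luma transform: L is roughly
# 0.299*R + 0.587*G + 0.114*B, so pure red maps to about 76
gray = rgb.convert("L")
```

The resulting object has mode "L" (a single 8-bit channel) and can be saved or processed like any other PIL image.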