pca

Invalid parameter clf for estimator Pipeline in sklearn

半城伤御伤魂 submitted on 2019-12-06 15:33:06
Could anyone check what is wrong with the following code? Am I making a mistake at any step of building my model? I have already added the 'clf__' prefix to the parameter names.

clf = RandomForestClassifier()
pca = PCA()
pca_clf = make_pipeline(pca, clf)
kfold = KFold(n_splits=10, random_state=22)
parameters = {'clf__n_estimators': [4, 6, 9],
              'clf__max_features': ['log2', 'sqrt', 'auto'],
              'clf__criterion': ['entropy', 'gini'],
              'clf__max_depth': [2, 3, 5, 10],
              'clf__min_samples_split': [2, 3, 5],
              'clf__min_samples_leaf': [1, 5, 8]}
grid_RF = GridSearchCV(pca_clf, param_grid=parameters, scoring='accuracy', cv=kfold)
grid_RF = grid_RF.fit(X
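For context, the "invalid parameter" error usually comes from the step names: make_pipeline() names each step after its lowercased class name (here 'pca' and 'randomforestclassifier'), so grid keys prefixed with 'clf__' do not match any step. A minimal sketch, assuming the same estimators, that builds the pipeline with an explicit 'clf' step name so the prefixes line up (pipe.get_params().keys() lists the valid names):

from sklearn.pipeline import Pipeline
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, KFold

# Naming the step 'clf' explicitly makes the 'clf__...' grid keys valid.
pipe = Pipeline([('pca', PCA()), ('clf', RandomForestClassifier())])
parameters = {'clf__n_estimators': [4, 6, 9],
              'clf__max_depth': [2, 3, 5, 10]}
kfold = KFold(n_splits=10, shuffle=True, random_state=22)
grid_RF = GridSearchCV(pipe, param_grid=parameters, scoring='accuracy', cv=kfold)
# grid_RF.fit(X_train, y_train)   # X_train / y_train are placeholders, not from the post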

Project 3: Creating Customer Segments

白昼怎懂夜的黑 submitted on 2019-12-06 15:05:54
Welcome to the third project of the Machine Learning Engineer Nanodegree! In this notebook, some template code has already been provided for you, but you will need to implement additional functionality to complete the project. You do not need to modify any of the provided code unless explicitly instructed to. Sections whose titles begin with 'Exercise' indicate that the code block that follows contains functionality you must implement. Each part comes with detailed guidance, and the pieces to implement are marked with 'TODO' in the comments. Please read all the hints carefully! Besides implementing code, you must also answer questions related to the project and your implementation. Each question you need to answer is titled 'Question X'. Read each question carefully and write a complete answer in the 'Answer' text box that follows it. Your submission will be graded on both your answers and the functionality of the code you implement. Tip: Code and Markdown cells can be run with the Shift + Enter shortcut, and Markdown cells can be put into edit mode by double-clicking. Getting started: in this project you will analyze the underlying structure of a dataset that records many customers' annual purchases (expressed as monetary amounts) of different product categories. One task of the project is to work out how best to describe the differences between the various kinds of customers of a wholesale distributor, which would let the distributor organize its delivery service to better meet each customer's needs. The dataset for this project can be found in the UCI Machine Learning Repository. For the purposes of this project, the analysis will not include the 'Channel' and 'Region' features; the focus is on the six recorded product categories that customers purchase.

Udacity customer_segments

我与影子孤独终老i submitted on 2019-12-06 15:04:50
GitHub address. Machine Learning Nanodegree, Unsupervised Learning, Project 3: Creating Customer Segments. Welcome to the third project of the Machine Learning Engineer Nanodegree! In this notebook, some template code has already been provided for you, but you will need to implement additional functionality to complete the project. You do not need to modify any of the provided code unless explicitly instructed to. Sections whose titles begin with 'Exercise' indicate that the code block that follows contains functionality you must implement. Each part comes with detailed guidance, and the pieces to implement are marked with 'TODO' in the comments. Please read all the hints carefully! Besides implementing code, you must also answer questions related to the project and your implementation. Each question you need to answer is titled 'Question X'. Read each question carefully and write a complete answer in the 'Answer' text box that follows it. Your submission will be graded on both your answers and the functionality of the code you implement. Tip: Code and Markdown cells can be run with the Shift + Enter shortcut, and Markdown cells can be put into edit mode by double-clicking. Getting started: in this project you will analyze the underlying structure of a dataset that records many customers' annual purchases (expressed as monetary amounts) of different product categories. One task of the project is to work out how best to describe the differences between the various kinds of customers of a wholesale distributor, which would let the distributor organize its delivery service to better meet each customer's needs. The dataset for this project can be found in the UCI Machine Learning Repository. For the purposes of this project, the analysis will not include 'Channel

Principal component analysis with EQUAMAX rotation in R

喜欢而已 submitted on 2019-12-06 13:15:06
Question: I need to do a principal component analysis (PCA) with EQUAMAX rotation in R. Unfortunately the function principal(), which I normally use for PCA, does not offer this kind of rotation. I found that it may be possible with the package GPArotation, but I have not yet figured out how to use it for the PCA. Could someone give an example of how to do an equamax-rotated PCA? Or is there a PCA function in another package that offers equamax rotation directly?
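For illustration only (in Python rather than R, and not drawn from any answer to this question): equamax is the orthomax criterion with gamma = k/2, where k is the number of retained components, so one way to see its effect is to rotate the PCA loading matrix with a generic orthomax routine. A minimal sketch with placeholder data:

import numpy as np
from sklearn.decomposition import PCA

def orthomax(L, gamma, n_iter=100, tol=1e-8):
    # Generic orthomax rotation of a p x k loading matrix:
    # gamma = 0 is quartimax, gamma = 1 is varimax, gamma = k/2 is equamax.
    p, k = L.shape
    R = np.eye(k)
    obj = 0.0
    for _ in range(n_iter):
        Lr = L @ R
        u, s, vt = np.linalg.svd(
            L.T @ (Lr ** 3 - (gamma / p) * Lr @ np.diag((Lr ** 2).sum(axis=0))))
        R = u @ vt
        new_obj = s.sum()
        if obj != 0.0 and new_obj < obj * (1 + tol):
            break
        obj = new_obj
    return L @ R

X = np.random.randn(200, 8)                                    # placeholder data
pca = PCA(n_components=3).fit(X)
loadings = pca.components_.T * np.sqrt(pca.explained_variance_)
rotated = orthomax(loadings, gamma=loadings.shape[1] / 2.0)    # equamax: gamma = k/2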

C++ - framework for computing PCA (other than armadillo)

守給你的承諾、 submitted on 2019-12-06 12:11:47
I have a large dataset of around 200000 data points, where each data point contains 132 features, so my dataset is 200000 x 132. I have done all the computations using the armadillo framework. However, when I tried to run a PCA I received a memory error, and I don't know whether it is caused by my RAM (8 GB) or by a limitation of the framework itself. The error I receive is: requested size is too large. Can you recommend another framework for PCA computation that doesn't have such size/memory limitations? Or if you have previously used armadillo
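As a point of comparison only (in Python, not C++, so it does not answer the framework question directly): an out-of-core, batched PCA never holds more than one chunk of rows in memory while the components are estimated, which sidesteps this kind of allocation failure. A minimal sketch with scikit-learn's IncrementalPCA and random stand-in data:

import numpy as np
from sklearn.decomposition import IncrementalPCA

n_samples, n_features, batch = 200000, 132, 10000
ipca = IncrementalPCA(n_components=50)
for start in range(0, n_samples, batch):
    rows = np.random.rand(min(batch, n_samples - start), n_features)  # stand-in for rows read from disk
    ipca.partial_fit(rows)                   # update the components one batch at a time
reduced = ipca.transform(np.random.rand(5, n_features))  # project new rows, shape (5, 50)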

Dimensionality reduction in HOG feature vector

谁都会走 submitted on 2019-12-06 11:20:59
I computed the HOG feature vector of the following image in MATLAB.

[Input Image]

I used the following code:

I = imread('input.jpg');
I = rgb2gray(I);
[features, visualization] = extractHOGFeatures(I, 'CellSize', [16 16]);

features comes out to be a 1x1944 vector, and I need to reduce the dimensionality of this vector (say to 1x100). What method should I use for this? I thought of principal component analysis and ran the following in MATLAB:

prinvec = pca(features);

prinvec comes out to be an empty matrix (1944x0). Am I doing it wrong? If not PCA, what other methods can I use to reduce
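Some context on why pca(features) returns an empty matrix: PCA estimates directions of variance across observations, so a single 1x1944 row has no variance to decompose; the projection has to be learned from a collection of HOG vectors (one per training image) and then applied to each vector. A minimal sketch, in Python rather than MATLAB, with placeholder features:

import numpy as np
from sklearn.decomposition import PCA

hog_features = np.random.rand(500, 1944)   # placeholder: one 1944-dim HOG vector per training image
pca = PCA(n_components=100).fit(hog_features)        # learn the projection from many samples
reduced = pca.transform(hog_features)                 # shape (500, 100)
new_vector = pca.transform(np.random.rand(1, 1944))   # project a new image's HOG vector to 1x100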

scikits-learn pca dimension reduction issue

 ̄綄美尐妖づ submitted on 2019-12-06 08:45:24
Question: I have a problem with dimensionality reduction using scikit-learn's PCA. I have two numpy matrices: one has size (1050, 4096) and the other has size (50, 4096). I tried to reduce the dimensions of both to get (1050, 399) and (50, 399), but after doing the PCA I got matrices of shape (1050, 399) and (50, 50). One matrix is for kNN training and the other for kNN testing. What's wrong with my code below?

pca = decomposition.PCA()
pca.fit(train)
pca.n_components = 399
train_reduced = pca.fit_transform(train)
pca.n
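Some context on the (50, 50) result: calling fit_transform on the test matrix fits a new PCA on just 50 samples, and a PCA fit on 50 samples can yield at most 50 components. The usual pattern is to fit once on the training matrix and then only transform the test matrix. A minimal sketch with placeholder data (variable names are assumptions, not the poster's):

import numpy as np
from sklearn.decomposition import PCA

train = np.random.rand(1050, 4096)
test = np.random.rand(50, 4096)

pca = PCA(n_components=399)
train_reduced = pca.fit_transform(train)   # (1050, 399)
test_reduced = pca.transform(test)         # (50, 399): transform, not fit_transform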

How to Use PCA to Reduce Dimension

倖福魔咒の submitted on 2019-12-06 08:23:03
Input: an LBP feature extracted from an image, with dimension 75520, so the input LBP data contains 1 row and 75520 columns. Required output: apply PCA on the input to reduce its dimension. Currently my code looks like this:

void PCA_DimensionReduction(Mat &src, Mat &dst){
    int PCA_DIMENSON_VAL = 40;
    Mat tmp = src.reshape(1, 1); // 1 row X 75520 cols
    Mat projection_result;
    Mat input_feature_vector;
    Mat norm_tmp;
    normalize(tmp, input_feature_vector, 0, 1, NORM_MINMAX, CV_32FC1);
    PCA pca(input_feature_vector, Mat(), CV_PCA_DATA_AS_ROW, PCA_DIMENSON_VAL);
    pca.project(input_feature_vector, projection_result);
    dst =
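As with the HOG question above, a PCA computed from a single 1 x 75520 row cannot learn a meaningful 40-dimensional subspace: the basis has to be estimated from many LBP vectors (one per training image), and each new vector is then projected onto it. A minimal sketch using OpenCV's Python bindings rather than the C++ API, with placeholder data and names:

import numpy as np
import cv2

lbp_matrix = np.random.rand(300, 75520).astype(np.float32)  # placeholder: one LBP row per training image
mean, eigenvectors = cv2.PCACompute(lbp_matrix, mean=None, maxComponents=40)
projected = cv2.PCAProject(lbp_matrix, mean, eigenvectors)       # shape (300, 40)
one_vector = lbp_matrix[:1]                                      # a single 1 x 75520 feature
one_projected = cv2.PCAProject(one_vector, mean, eigenvectors)   # shape (1, 40)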

Bug in Scikit-Learn PCA or in Numpy Eigen Decomposition?

浪尽此生 submitted on 2019-12-06 08:06:11
I have a dataset with 400 features. What I did:

# approach 1
d_cov = np.cov(d_train.transpose())
eigens, mypca = LA.eig(d_cov)  # assume sorted by eigenvalue as well / LA = numpy linear algebra

# approach 2
pca = PCA(n_components=300)
d_fit = pca.fit_transform(d_train)
pc = pca.components_

Now, these two should be the same, right? PCA is just the eigendecomposition of the covariance matrix. But in my case they are very different. How can that be? Am I making a mistake somewhere above? Comparing variances:

import numpy as np
LA = np.linalg
d_train = np.random.randn(100, 10)
d_cov = np.cov(d_train
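For context, the two results can differ for mundane reasons even when nothing is broken: np.linalg.eig does not sort its eigenpairs, it returns eigenvectors as columns while pca.components_ stores them as rows, PCA centers the data before decomposing it, and each eigenvector is only determined up to sign. A minimal sketch (not the poster's full comparison) that lines the two up:

import numpy as np
from sklearn.decomposition import PCA

X = np.random.randn(100, 10)
pca = PCA().fit(X)

cov = np.cov(X, rowvar=False)              # 10 x 10 covariance of the columns
eigvals, eigvecs = np.linalg.eigh(cov)     # eigh: symmetric matrix, ascending eigenvalues
order = np.argsort(eigvals)[::-1]          # sort descending to match PCA's ordering
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

print(np.allclose(eigvals, pca.explained_variance_))
print(np.allclose(np.abs(eigvecs.T), np.abs(pca.components_)))  # equal up to a sign flip per component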

subset of prcomp object in R

廉价感情. submitted on 2019-12-06 07:42:23
I'm basically computing the PCA for a set of variables and everything works fine. Let's say I'm using the iris data as an example; my data is different, but the iris data should be sufficient to explain my question:

data(iris)
log.ir <- log(iris[, 1:4])
log.ir[mapply(is.infinite, log.ir)] <- 0
ir.groups <- iris[, 5]
ir.pca <- prcomp(log.ir, center = TRUE, scale. = TRUE)
library(ggbiplot)
g <- ggbiplot(ir.pca, obs.scale = 1, var.scale = 1, groups = ir.groups, var.axes = F)
g <- g + scale_color_discrete(name = '')
g <- g + theme(legend.direction = 'horizontal', legend.position = 'top') + theme(legend