pca

Invalid parameter clf for estimator Pipeline in sklearn

半城伤御伤魂 submitted on 2019-12-06 15:33:06
Could anyone check what is wrong with the following code? Am I making a mistake at any step of building my model? I have already added the 'clf__' prefix to the parameter names.

clf = RandomForestClassifier()
pca = PCA()
pca_clf = make_pipeline(pca, clf)
kfold = KFold(n_splits=10, random_state=22)
parameters = {'clf__n_estimators': [4, 6, 9],
              'clf__max_features': ['log2', 'sqrt', 'auto'],
              'clf__criterion': ['entropy', 'gini'],
              'clf__max_depth': [2, 3, 5, 10],
              'clf__min_samples_split': [2, 3, 5],
              'clf__min_samples_leaf': [1, 5, 8]}
grid_RF = GridSearchCV(pca_clf, param_grid=parameters, scoring='accuracy', cv=kfold)
grid_RF = grid_RF.fit(X
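For context, the "invalid parameter" error usually comes from the step names: make_pipeline() names each step after its lowercased class name (here 'pca' and 'randomforestclassifier'), so grid keys prefixed with 'clf__' do not match any step. A minimal sketch, assuming the same estimators, that builds the pipeline with an explicit 'clf' step name so the prefixes line up (pipe.get_params().keys() lists the valid names):

from sklearn.pipeline import Pipeline
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, KFold

# Naming the step 'clf' explicitly makes the 'clf__...' grid keys valid.
pipe = Pipeline([('pca', PCA()), ('clf', RandomForestClassifier())])
parameters = {'clf__n_estimators': [4, 6, 9],
              'clf__max_depth': [2, 3, 5, 10]}
kfold = KFold(n_splits=10, shuffle=True, random_state=22)
grid_RF = GridSearchCV(pipe, param_grid=parameters, scoring='accuracy', cv=kfold)
# grid_RF.fit(X_train, y_train)   # X_train / y_train are placeholders, not from the post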

Project 3: Creating Customer Segments

白昼怎懂夜的黑 submitted on 2019-12-06 15:05:54
Welcome to the third project of the Machine Learning Engineer Nanodegree! In this notebook, some template code has already been provided for you, but you will need to implement additional functionality to complete the project. You do not need to modify any of the provided code unless explicitly instructed to. Sections whose titles begin with 'Exercise' indicate that the code block that follows contains functionality you must implement. Each part comes with detailed guidance, and the pieces to implement are marked with 'TODO' in the comments. Please read all the hints carefully! Besides implementing code, you must also answer questions related to the project and your implementation. Each question you need to answer is titled 'Question X'. Read each question carefully and write a complete answer in the 'Answer' text box that follows it. Your submission will be graded on both your answers and the functionality of the code you implement. Tip: Code and Markdown cells can be run with the Shift + Enter shortcut, and Markdown cells can be put into edit mode by double-clicking. Getting started: in this project you will analyze the underlying structure of a dataset that records many customers' annual purchases (expressed as monetary amounts) of different product categories. One task of the project is to work out how best to describe the differences between the various kinds of customers of a wholesale distributor, which would let the distributor organize its delivery service to better meet each customer's needs. The dataset for this project can be found in the UCI Machine Learning Repository. For the purposes of this project, the analysis will not include the 'Channel' and 'Region' features; the focus is on the six recorded product categories that customers purchase.

Udacity customer_segments

我与影子孤独终老i submitted on 2019-12-06 15:04:50
GitHub address. Machine Learning Nanodegree, Unsupervised Learning, Project 3: Creating Customer Segments. Welcome to the third project of the Machine Learning Engineer Nanodegree! In this notebook, some template code has already been provided for you, but you will need to implement additional functionality to complete the project. You do not need to modify any of the provided code unless explicitly instructed to. Sections whose titles begin with 'Exercise' indicate that the code block that follows contains functionality you must implement. Each part comes with detailed guidance, and the pieces to implement are marked with 'TODO' in the comments. Please read all the hints carefully! Besides implementing code, you must also answer questions related to the project and your implementation. Each question you need to answer is titled 'Question X'. Read each question carefully and write a complete answer in the 'Answer' text box that follows it. Your submission will be graded on both your answers and the functionality of the code you implement. Tip: Code and Markdown cells can be run with the Shift + Enter shortcut, and Markdown cells can be put into edit mode by double-clicking. Getting started: in this project you will analyze the underlying structure of a dataset that records many customers' annual purchases (expressed as monetary amounts) of different product categories. One task of the project is to work out how best to describe the differences between the various kinds of customers of a wholesale distributor, which would let the distributor organize its delivery service to better meet each customer's needs. The dataset for this project can be found in the UCI Machine Learning Repository. For the purposes of this project, the analysis will not include 'Channel

Principal component analysis with EQUAMAX rotation in R

喜欢而已 submitted on 2019-12-06 13:15:06
Question: I need to do a principal component analysis (PCA) with EQUAMAX rotation in R. Unfortunately the function principal(), which I normally use for PCA, does not offer this kind of rotation. I found that it may be possible with the package GPArotation, but I have not yet figured out how to use it for the PCA. Could someone give an example of how to do an equamax-rotated PCA? Or is there a PCA function in another package that offers equamax rotation directly?
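For illustration only (in Python rather than R, and not drawn from any answer to this question): equamax is the orthomax criterion with gamma = k/2, where k is the number of retained components, so one way to see its effect is to rotate the PCA loading matrix with a generic orthomax routine. A minimal sketch with placeholder data:

import numpy as np
from sklearn.decomposition import PCA

def orthomax(L, gamma, n_iter=100, tol=1e-8):
    # Generic orthomax rotation of a p x k loading matrix:
    # gamma = 0 is quartimax, gamma = 1 is varimax, gamma = k/2 is equamax.
    p, k = L.shape
    R = np.eye(k)
    obj = 0.0
    for _ in range(n_iter):
        Lr = L @ R
        u, s, vt = np.linalg.svd(
            L.T @ (Lr ** 3 - (gamma / p) * Lr @ np.diag((Lr ** 2).sum(axis=0))))
        R = u @ vt
        new_obj = s.sum()
        if obj != 0.0 and new_obj < obj * (1 + tol):
            break
        obj = new_obj
    return L @ R

X = np.random.randn(200, 8)                                    # placeholder data
pca = PCA(n_components=3).fit(X)
loadings = pca.components_.T * np.sqrt(pca.explained_variance_)
rotated = orthomax(loadings, gamma=loadings.shape[1] / 2.0)    # equamax: gamma = k/2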

C++ - framework for computing PCA (other than armadillo)

守給你的承諾、 submitted on 2019-12-06 12:11:47
I have a large dataset of around 200000 data points, where each data point contains 132 features, so my dataset is 200000 x 132. I have done all the computations using the armadillo framework. However, when I tried to run a PCA I received a memory error, and I don't know whether it is caused by my RAM (8 GB) or by a limitation of the framework itself. The error I receive is: requested size is too large. Can you recommend another framework for PCA computation that doesn't have such size/memory limitations? Or if you have previously used armadillo
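As a point of comparison only (in Python, not C++, so it does not answer the framework question directly): an out-of-core, batched PCA never holds more than one chunk of rows in memory while the components are estimated, which sidesteps this kind of allocation failure. A minimal sketch with scikit-learn's IncrementalPCA and random stand-in data:

import numpy as np
from sklearn.decomposition import IncrementalPCA

n_samples, n_features, batch = 200000, 132, 10000
ipca = IncrementalPCA(n_components=50)
for start in range(0, n_samples, batch):
    rows = np.random.rand(min(batch, n_samples - start), n_features)  # stand-in for rows read from disk
    ipca.partial_fit(rows)                   # update the components one batch at a time
reduced = ipca.transform(np.random.rand(5, n_features))  # project new rows, shape (5, 50)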

Dimensionality reduction in HOG feature vector

谁都会走 submitted on 2019-12-06 11:20:59
I computed the HOG feature vector of the following image in MATLAB.

[Input Image]

I used the following code:

I = imread('input.jpg');
I = rgb2gray(I);
[features, visualization] = extractHOGFeatures(I, 'CellSize', [16 16]);

features comes out to be a 1x1944 vector, and I need to reduce the dimensionality of this vector (say to 1x100). What method should I use for this? I thought of principal component analysis and ran the following in MATLAB:

prinvec = pca(features);

prinvec comes out to be an empty matrix (1944x0). Am I doing it wrong? If not PCA, what other methods can I use to reduce
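Some context on why pca(features) returns an empty matrix: PCA estimates directions of variance across observations, so a single 1x1944 row has no variance to decompose; the projection has to be learned from a collection of HOG vectors (one per training image) and then applied to each vector. A minimal sketch, in Python rather than MATLAB, with placeholder features:

import numpy as np
from sklearn.decomposition import PCA

hog_features = np.random.rand(500, 1944)   # placeholder: one 1944-dim HOG vector per training image
pca = PCA(n_components=100).fit(hog_features)        # learn the projection from many samples
reduced = pca.transform(hog_features)                 # shape (500, 100)
new_vector = pca.transform(np.random.rand(1, 1944))   # project a new image's HOG vector to 1x100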

scikits-learn pca dimension reduction issue

 ̄綄美尐妖づ submitted on 2019-12-06 08:45:24
Question: I have a problem with dimensionality reduction using scikit-learn's PCA. I have two numpy matrices: one has size (1050, 4096) and the other has size (50, 4096). I tried to reduce the dimensions of both to get (1050, 399) and (50, 399), but after doing the PCA I got matrices of shape (1050, 399) and (50, 50). One matrix is for kNN training and the other for kNN testing. What's wrong with my code below?

pca = decomposition.PCA()
pca.fit(train)
pca.n_components = 399
train_reduced = pca.fit_transform(train)
pca.n
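Some context on the (50, 50) result: calling fit_transform on the test matrix fits a new PCA on just 50 samples, and a PCA fit on 50 samples can yield at most 50 components. The usual pattern is to fit once on the training matrix and then only transform the test matrix. A minimal sketch with placeholder data (variable names are assumptions, not the poster's):

import numpy as np
from sklearn.decomposition import PCA

train = np.random.rand(1050, 4096)
test = np.random.rand(50, 4096)

pca = PCA(n_components=399)
train_reduced = pca.fit_transform(train)   # (1050, 399)
test_reduced = pca.transform(test)         # (50, 399): transform, not fit_transform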

How to Use PCA to Reduce Dimension

倖福魔咒の submitted on 2019-12-06 08:23:03
Input: an LBP feature extracted from an image, with dimension 75520, so the input LBP data contains 1 row and 75520 columns. Required output: apply PCA on the input to reduce its dimension. Currently my code looks like this:

void PCA_DimensionReduction(Mat &src, Mat &dst){
    int PCA_DIMENSON_VAL = 40;
    Mat tmp = src.reshape(1, 1); // 1 row X 75520 cols
    Mat projection_result;
    Mat input_feature_vector;
    Mat norm_tmp;
    normalize(tmp, input_feature_vector, 0, 1, NORM_MINMAX, CV_32FC1);
    PCA pca(input_feature_vector, Mat(), CV_PCA_DATA_AS_ROW, PCA_DIMENSON_VAL);
    pca.project(input_feature_vector, projection_result);
    dst =
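As with the HOG question above, a PCA computed from a single 1 x 75520 row cannot learn a meaningful 40-dimensional subspace: the basis has to be estimated from many LBP vectors (one per training image), and each new vector is then projected onto it. A minimal sketch using OpenCV's Python bindings rather than the C++ API, with placeholder data and names:

import numpy as np
import cv2

lbp_matrix = np.random.rand(300, 75520).astype(np.float32)  # placeholder: one LBP row per training image
mean, eigenvectors = cv2.PCACompute(lbp_matrix, mean=None, maxComponents=40)
projected = cv2.PCAProject(lbp_matrix, mean, eigenvectors)       # shape (300, 40)
one_vector = lbp_matrix[:1]                                      # a single 1 x 75520 feature
one_projected = cv2.PCAProject(one_vector, mean, eigenvectors)   # shape (1, 40)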

Bug in Scikit-Learn PCA or in Numpy Eigen Decomposition?

浪尽此生 submitted on 2019-12-06 08:06:11
I have a dataset with 400 features. What I did:

# approach 1
d_cov = np.cov(d_train.transpose())
eigens, mypca = LA.eig(d_cov)  # assume sorted by eigenvalue as well / LA = numpy linear algebra

# approach 2
pca = PCA(n_components=300)
d_fit = pca.fit_transform(d_train)
pc = pca.components_

Now, these two should be the same, right? PCA is just the eigendecomposition of the covariance matrix. But in my case they are very different. How can that be? Am I making a mistake somewhere above? Comparing variances:

import numpy as np
LA = np.linalg
d_train = np.random.randn(100, 10)
d_cov = np.cov(d_train
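For context, the two results can differ for mundane reasons even when nothing is broken: np.linalg.eig does not sort its eigenpairs, it returns eigenvectors as columns while pca.components_ stores them as rows, PCA centers the data before decomposing it, and each eigenvector is only determined up to sign. A minimal sketch (not the poster's full comparison) that lines the two up:

import numpy as np
from sklearn.decomposition import PCA

X = np.random.randn(100, 10)
pca = PCA().fit(X)

cov = np.cov(X, rowvar=False)              # 10 x 10 covariance of the columns
eigvals, eigvecs = np.linalg.eigh(cov)     # eigh: symmetric matrix, ascending eigenvalues
order = np.argsort(eigvals)[::-1]          # sort descending to match PCA's ordering
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

print(np.allclose(eigvals, pca.explained_variance_))
print(np.allclose(np.abs(eigvecs.T), np.abs(pca.components_)))  # equal up to a sign flip per component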

subset of prcomp object in R

廉价感情. submitted on 2019-12-06 07:42:23
I'm basically computing the PCA for a set of variables and everything works fine. Let's say I'm using the iris data as an example; my data is different, but the iris data should be sufficient to explain my question:

data(iris)
log.ir <- log(iris[, 1:4])
log.ir[mapply(is.infinite, log.ir)] <- 0
ir.groups <- iris[, 5]
ir.pca <- prcomp(log.ir, center = TRUE, scale. = TRUE)
library(ggbiplot)
g <- ggbiplot(ir.pca, obs.scale = 1, var.scale = 1, groups = ir.groups, var.axes = F)
g <- g + scale_color_discrete(name = '')
g <- g + theme(legend.direction = 'horizontal', legend.position = 'top') + theme(legend