scikit-learn

Cannot understand sklearn's PolynomialFeatures

风流意气都作罢 submitted on 2020-12-27 18:49:22

Question: I need help with sklearn's PolynomialFeatures. It works well with one feature, but whenever I add multiple features, the output array also contains values other than the features raised to the powers of the degree. For example, for this array

X = np.array([[230.1, 37.8, 69.2]])

when I try

X_poly = poly.fit_transform(X)

it outputs

[[ 1.00000000e+00  2.30100000e+02  3.78000000e+01  6.92000000e+01
   5.29460100e+04  8.69778000e+03  1.59229200e+04  1.42884000e+03
   2.61576000e+03  4.78864000e+03]]

Here, what is 8

How to compute precision, recall and F1 score of an imbalanced dataset for k-fold cross-validation with 10 folds in Python

拟墨画扇 submitted on 2020-12-27 10:09:34

Question: I have an imbalanced dataset for a binary classification problem. I built a Random Forest classifier and used k-fold cross-validation with 10 folds.

kfold = model_selection.KFold(n_splits=10, random_state=42)
model = RandomForestClassifier(n_estimators=50)

I got the results of the 10 folds:

results = model_selection.cross_val_score(model, features, labels, cv=kfold)
print results

[ 0.60666667  0.60333333  0.52333333  0.73        0.75333333  0.72
  0.7         0.73        0.83666667  0.88666667]

I have calculated
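cross_val_score returns only one metric (accuracy by default). To get precision, recall, and F1 for every fold in a single run, cross_validate accepts several scorers at once. A sketch on synthetic imbalanced data (make_classification stands in here for the real dataset; note that recent scikit-learn versions require shuffle=True when KFold is given a random_state):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_validate

# Synthetic imbalanced binary data, standing in for the real dataset.
features, labels = make_classification(n_samples=500, weights=[0.9, 0.1],
                                       random_state=42)

# shuffle=True is mandatory alongside random_state in recent versions.
kfold = KFold(n_splits=10, shuffle=True, random_state=42)
model = RandomForestClassifier(n_estimators=50, random_state=42)

# cross_validate evaluates several scorers in one pass; each metric comes
# back as a "test_<name>" array with one entry per fold.
scores = cross_validate(model, features, labels, cv=kfold,
                        scoring=["precision", "recall", "f1"])
print(scores["test_precision"].mean())
print(scores["test_recall"].mean())
print(scores["test_f1"].mean())
```

For imbalanced data, StratifiedKFold is usually the safer choice, since it preserves the class ratio in every fold.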

GridSearch over MultiOutputRegressor?

 ̄綄美尐妖づ submitted on 2020-12-27 08:54:59

Question: Consider a multivariate regression problem (2 response variables: Latitude and Longitude). Some machine-learning model implementations, such as Support Vector Regression (sklearn.svm.SVR), do not currently provide native support for multivariate regression. For such cases, sklearn.multioutput.MultiOutputRegressor can be used. Example:

from sklearn.multioutput import MultiOutputRegressor
svr_multi = MultiOutputRegressor(SVR(), n_jobs=-1)

# Fit the algorithm on the data
svr_multi.fit(X
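GridSearchCV works on the wrapper directly; the trick is that the inner SVR's hyperparameters are addressed with the estimator__ prefix (the name MultiOutputRegressor gives its wrapped estimator). A minimal sketch on random stand-in data:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import SVR

rng = np.random.RandomState(0)
X = rng.rand(50, 3)
y = rng.rand(50, 2)          # two targets, e.g. Latitude and Longitude

# Parameters of the wrapped SVR are reached via the "estimator__" prefix.
param_grid = {"estimator__C": [0.1, 1, 10],
              "estimator__epsilon": [0.05, 0.1]}
search = GridSearchCV(MultiOutputRegressor(SVR()), param_grid, cv=3)
search.fit(X, y)
print(search.best_params_)
```

The same prefixing convention applies to any scikit-learn meta-estimator (Pipeline steps use their step name instead of "estimator"). Note this grid fits one shared hyperparameter setting for both outputs; tuning each output separately would require searching per target.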

When scaling the data, why does the train dataset use 'fit' and 'transform', but the test dataset only 'transform'?

元气小坏坏 submitted on 2020-12-27 08:48:21

Question: When scaling the data, why does the train dataset use 'fit' and 'transform', but the test dataset only 'transform'?

SAMPLE_COUNT = 5000
TEST_COUNT = 20000
seed(0)
sample = list()
test_sample = list()
for index, line in enumerate(open('covtype.data', 'rb')):
    if index < SAMPLE_COUNT:
        sample.append(line)
    else:
        r = randint(0, index)
        if r < SAMPLE_COUNT:
            sample[r] = line
        else:
            k = randint(0, index)
            if k < TEST_COUNT:
                if len(test_sample) < TEST_COUNT:
                    test_sample.append(line)
                else:
                    test_sample[k] = line
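The short answer is data leakage: fit learns the scaling statistics (for StandardScaler, the per-feature mean and standard deviation), and those must come from the training set only. The test set is then transformed with the same train-set statistics, which simulates how truly unseen data would be handled. A minimal sketch:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

train = np.array([[1.0], [2.0], [3.0]])
test = np.array([[2.0], [4.0]])

scaler = StandardScaler()
scaler.fit(train)                      # mean and std come from TRAIN only
train_scaled = scaler.transform(train)
test_scaled = scaler.transform(test)   # reuse the train statistics: no leakage

# The train mean (2.0) maps to 0 wherever it appears, even in the test set.
print(scaler.mean_)        # [2.]
print(test_scaled[0, 0])   # 0.0
```

Calling fit (or fit_transform) on the test set would instead compute fresh statistics from data the model is supposed never to have seen, making evaluation optimistically biased.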

Visualise word2vec generated from gensim

≯℡__Kan透↙ submitted on 2020-12-27 08:20:30

Question: I have trained a doc2vec model and a corresponding word2vec model on my own corpus using gensim. I want to visualise the word2vec embeddings with t-SNE, showing the words, so that each dot in the figure is labelled with its word. I looked at a similar question here: t-sne on word2vec. Following it, I have this code:

import gensim
import gensim.models as g
from sklearn.manifold import TSNE
import re
import matplotlib.pyplot as plt

modelPath="/Users/tarun/Desktop/PE/doc2vec/model3_100_newCorpus60_1min_6window
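A sketch of the labelling step: the key is matplotlib's annotate, called once per word on the 2-D t-SNE coordinates. Random vectors stand in here for the trained embeddings; with a real model the vectors would come from the model's word-vector store (e.g. model.wv in recent gensim versions, indexed by each vocabulary word):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")            # headless backend, safe without a display
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Stand-in embeddings: 30 "words" with 100-dimensional random vectors.
words = ["word%d" % i for i in range(30)]
vectors = np.random.RandomState(0).rand(30, 100)

# perplexity must be smaller than the number of points.
coords = TSNE(n_components=2, perplexity=5, random_state=0).fit_transform(vectors)

fig, ax = plt.subplots()
ax.scatter(coords[:, 0], coords[:, 1])
for word, (x, y) in zip(words, coords):
    ax.annotate(word, (x, y))    # place each word next to its dot
fig.savefig("tsne_words.png")
```

For a large vocabulary, annotating every point becomes unreadable; a common compromise is to label only the top-N most frequent words.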

Python scikit learn MLPClassifier “hidden_layer_sizes”

你说的曾经没有我的故事 submitted on 2020-12-27 08:07:33

Question: I am lost in the scikit-learn 0.18 user manual (http://scikit-learn.org/dev/modules/generated/sklearn.neural_network.MLPClassifier.html#sklearn.neural_network.MLPClassifier):

hidden_layer_sizes : tuple, length = n_layers - 2, default (100,)
    The ith element represents the number of neurons in the ith hidden layer.

If I want only 1 hidden layer with 7 hidden units in my model, should I put it like this? Thanks!

hidden_layer_sizes=(7, 1)

Answer 1: hidden_layer_sizes=(7,) if you want only 1
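To confirm the answer: hidden_layer_sizes takes one entry per hidden layer, so a single hidden layer of 7 units is the 1-tuple (7,), whereas (7, 1) would build two hidden layers with the second reduced to a single neuron. A quick check on the iris data:

```python
from sklearn.datasets import load_iris
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)

# One hidden layer with 7 units is the 1-tuple (7,).
# (7, 1) would instead create TWO hidden layers, the second with one neuron.
clf = MLPClassifier(hidden_layer_sizes=(7,), max_iter=2000, random_state=0)
clf.fit(X, y)

print(clf.coefs_[0].shape)   # (4, 7): 4 iris features feeding 7 hidden units
print(clf.n_layers_)         # 3: input layer + 1 hidden layer + output layer
```

This also explains the "length = n_layers - 2" wording in the docstring: n_layers_ counts the input and output layers as well, so one hidden layer gives n_layers_ = 3.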
