scikit-learn

sklearn increasing number of jobs leads to slow training

Submitted by 雨燕双飞 on 2021-01-27 08:23:31

Question: I've been trying to get sklearn to use more CPU cores during grid search (doing this on a Windows machine). The code is:

```python
parameters = {'n_estimators': numpy.arange(1, 10), 'max_depth': numpy.arange(1, 10)}
estimator = RandomForestClassifier(verbose=1)
clf = grid_search.GridSearchCV(estimator, parameters, n_jobs=-1)
clf.fit(features_train, labels_train)
```

I'm testing this on a small dataset of only 100 samples. When n_jobs is set to 1 (the default), everything proceeds as normal and finishes quickly.
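Two things usually explain this. First, with only 100 samples, the cost of spawning worker processes and pickling data to them can dwarf the training itself, so n_jobs=-1 ends up slower than n_jobs=1. Second, on Windows joblib starts workers by spawning fresh interpreters, so parallel code must sit behind an `if __name__ == '__main__'` guard or the script re-executes itself on import. A minimal sketch, assuming a modern scikit-learn (the old grid_search module is now sklearn.model_selection) and stand-in data:

```python
import numpy as np
from sklearn.datasets import make_classification   # stand-in for the asker's data
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

def main():
    X, y = make_classification(n_samples=100, random_state=0)
    parameters = {'n_estimators': np.arange(1, 10),
                  'max_depth': np.arange(1, 10)}
    clf = GridSearchCV(RandomForestClassifier(verbose=1), parameters, n_jobs=-1)
    clf.fit(X, y)
    print(clf.best_params_)

# On Windows, joblib spawns worker processes that re-import this file,
# so the parallel call must be guarded:
if __name__ == '__main__':
    main()
```

For a dataset this small, expect n_jobs=1 to stay the fastest option; parallelism pays off once each fit takes seconds rather than milliseconds.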

ValueError: Axes instance argument was not found in a figure

Submitted by 梦想的初衷 on 2021-01-27 07:22:37

Question: I am studying scikit-learn with 'Learning scikit-learn: Machine Learning in Python' by Raúl Garreta. In a Jupyter notebook, the code from In[1] to In[7] works, but the In[8] code does not. What is wrong?

```python
# In[1]:
from sklearn import datasets
iris = datasets.load_iris()
X_iris, y_iris = iris.data, iris.target
print X_iris.shape, y_iris.shape

# In[2]:
from sklearn.cross_validation import train_test_split
from sklearn import preprocessing
X, y = X_iris[:, :2], y_iris
X_train, X_test, y_train, y
```
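The In[8] cell is not quoted here, so the exact trigger is unknown, but this matplotlib error is raised when an Axes object is handed to pyplot (e.g. via plt.sca or plt.axes) and no open figure contains it — typically because the axes were created in an earlier cell whose figure has since been closed. A minimal sketch of the assumed failure mode and the usual fix; the exact exception text can vary across matplotlib versions:

```python
# Assumed reproduction: 'ax' outlives its figure, then pyplot is asked
# to make it current. On matplotlib versions contemporary with the book
# this raises "ValueError: Axes instance argument was not found in a
# figure"; newer releases may word the error differently.
import matplotlib
matplotlib.use('Agg')            # headless backend so the demo runs anywhere
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
plt.close(fig)                   # the figure is gone, but 'ax' survives
try:
    plt.sca(ax)                  # pyplot cannot find ax in any open figure
except ValueError as err:
    print(err)

# The usual fix: create the figure and its axes together in the same
# cell and draw on them directly, instead of reusing axes across cells.
fig, axes = plt.subplots(1, 3, figsize=(10, 3))
for ax in axes:
    ax.scatter([1, 2, 3], [3, 1, 2])
```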

Finding a corresponding leaf node for each data point in a decision tree (scikit-learn)

Submitted by 拜拜、爱过 on 2021-01-27 07:22:19

Question: I'm using the decision tree classifier from the scikit-learn package in Python 3.4, and I want to get the corresponding leaf node id for each of my input data points. For example, my input might look like this:

```python
array([[ 5.1,  3.5,  1.4,  0.2],
       [ 4.9,  3. ,  1.4,  0.2],
       [ 4.7,  3.2,  1.3,  0.2]])
```

Let's suppose the corresponding leaf nodes are 16, 5 and 45 respectively. I want my output to be:

```python
leaf_node_id = array([16, 5, 45])
```

I have read through the scikit-learn mailing list and related questions on SF
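This is what the estimator's `apply` method does: it returns the index of the leaf each sample lands in (a public method since scikit-learn 0.17; on older releases the equivalent is `clf.tree_.apply` on a float32 array). A short sketch — the iris data here is illustrative, not the asker's setup:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
clf = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

# One leaf node id per input row; the concrete ids depend on the fitted tree.
leaf_node_id = clf.apply(iris.data[:3])
print(leaf_node_id)
```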

How to compare if two sklearn estimators are equal?

Submitted by 社会主义新天地 on 2021-01-27 06:35:31

Question: I have two sklearn estimators and want to compare them:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

X, y = np.random.random((100, 2)), np.random.choice(2, 100)
dt1 = DecisionTreeClassifier()
dt1.fit(X, y)
dt2 = DecisionTreeClassifier()
dt3 = sklearn.base.copy.deepcopy(dt1)
```

How can I compare the classifiers so that dt1 != dt2 and dt1 == dt3?

Answer 1: You will want to compare the params assigned to the classifier instance and the .tree_.value of the trained classifiers:

```python
# the trees
```
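A sketch that completes the answer's idea under stated assumptions — the helper name eq_trees and the exact set of tree_ attributes compared are mine, not from the original answer:

```python
import numpy as np
from copy import deepcopy
from sklearn.tree import DecisionTreeClassifier

def eq_trees(a, b):
    """Equal hyper-parameters and, if fitted, identical learned trees."""
    if a.get_params() != b.get_params():
        return False
    a_fitted, b_fitted = hasattr(a, 'tree_'), hasattr(b, 'tree_')
    if a_fitted != b_fitted:        # one fitted, one not
        return False
    if not a_fitted:                # both unfitted: params already matched
        return True
    # Compare the learned structure: split features, thresholds, leaf values.
    return (np.array_equal(a.tree_.value, b.tree_.value)
            and np.array_equal(a.tree_.feature, b.tree_.feature)
            and np.array_equal(a.tree_.threshold, b.tree_.threshold))

X, y = np.random.random((100, 2)), np.random.choice(2, 100)
dt1 = DecisionTreeClassifier().fit(X, y)
dt2 = DecisionTreeClassifier()               # unfitted
dt3 = deepcopy(dt1)                          # exact copy of the fitted tree
print(eq_trees(dt1, dt2), eq_trees(dt1, dt3))   # False True
```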

Scikit Learn HMM training with set of observation sequences

Submitted by 强颜欢笑 on 2021-01-27 06:13:46

Question: I have a question about how I can use GaussianHMM in the scikit-learn package to train on several different observation sequences all at once. The example visualizing the stock market structure shows EM converging on one long observation sequence. But in many scenarios we want to break up the observations (like training on a set of sentences), with each observation sequence having a START and an END state. That is, I would like to train globally on multiple observation sequences. How can one
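The HMM code has since been split out of scikit-learn into the separate hmmlearn package, which supports this directly: concatenate all sequences into one array and pass a `lengths` argument to `fit`, marking where each sequence ends, so EM runs over the whole set jointly. A minimal sketch with toy data:

```python
import numpy as np
from hmmlearn import hmm

seq1 = np.random.randn(100, 2)   # toy 2-D observation sequences,
seq2 = np.random.randn(80, 2)    # stand-ins for real data
seq3 = np.random.randn(120, 2)

# hmmlearn expects all sequences stacked, plus the length of each one.
X = np.concatenate([seq1, seq2, seq3])
lengths = [len(seq1), len(seq2), len(seq3)]

model = hmm.GaussianHMM(n_components=4, n_iter=50)
model.fit(X, lengths)            # trains on all three sequences at once
```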

Pandas 'Passing list-likes to .loc or [] with any missing labels is no longer supported' on train_test_split returned data

Submitted by 只愿长相守 on 2021-01-27 02:26:15

Question: For some reason train_test_split triggers this error, despite the lengths being identical and the indexes looking the same.

```python
from sklearn.model_selection import KFold

data = {'col1': [30.5, 45, 1, 99, 6, 5, 4, 2, 5, 7, 7, 3],
        'col2': [99.5, 98, 95, 90, 1, 5, 6, 7, 4, 4, 3, 3],
        'col3': [23, 23.6, 3, 90, 1, 9, 60, 9, 7, 2, 2, 1]}
df = pd.DataFrame(data)
train, test = train_test_split(df, test_size=0.10)
X = train[['col1', 'col2']]
y2 = train['col3']
X = np.array(X)
kf = KFold(n_splits=3, shuffle=True)
for train_index, test_index in kf
```
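The loop body is cut off above, so the following is an assumption about the failure: train_test_split keeps the original DataFrame index on `train`, while KFold yields positional indices 0..n-1, so indexing y2 with `.loc[test_index]` asks for labels that no longer exist after the shuffle. A sketch of the two usual fixes, on stand-in data:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold, train_test_split

df = pd.DataFrame({'col1': range(12), 'col2': range(12), 'col3': range(12)})
train, test = train_test_split(df, test_size=0.10, random_state=0)

X = train[['col1', 'col2']].to_numpy()
y2 = train['col3'].reset_index(drop=True)   # fix 1: relabel rows 0..n-1

kf = KFold(n_splits=3, shuffle=True, random_state=0)
for train_index, test_index in kf.split(X):
    # fix 2: use positional indexing, which matches what KFold yields
    y_tr, y_te = y2.iloc[train_index], y2.iloc[test_index]
```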

Consistent ColumnTransformer for intersecting lists of columns

Submitted by ε祈祈猫儿з on 2021-01-24 08:17:31

Question: I want to use sklearn.compose.ColumnTransformer sequentially (not in parallel; the second transformer should be executed only after the first) for intersecting lists of columns, in this way:

```python
log_transformer = p.FunctionTransformer(lambda x: np.log(x))
df = pd.DataFrame({'a': [1, 2, np.NaN, 4],
                   'b': [1, np.NaN, 3, 4],
                   'c': [1, 2, 3, 4]})
compose.ColumnTransformer(n_jobs=1, transformers=[
    ('num', impute.SimpleImputer(), ['a', 'b']),
    ('log', log_transformer, ['b', 'c']),
    ('scale', p
```
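A single ColumnTransformer always applies its transformers side by side to the *input*, so a column like 'b' cannot be imputed and then log-transformed within one. A hedged sketch of one way to get sequential behavior: chain two ColumnTransformers in a Pipeline. After the first stage the data is a plain numpy array, so the second stage selects columns by position; the resulting column order assumed below follows ColumnTransformer's rule of transformed columns first, passthrough remainder last:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer

log_transformer = FunctionTransformer(np.log)
df = pd.DataFrame({'a': [1, 2, np.nan, 4],
                   'b': [1, np.nan, 3, 4],
                   'c': [1, 2, 3, 4]})

# Stage 1 imputes ['a', 'b'] and passes 'c' through -> columns [a, b, c].
stage1 = ColumnTransformer([('num', SimpleImputer(), ['a', 'b'])],
                           remainder='passthrough')
# Stage 2 sees a bare array, so 'b' and 'c' are positions 1 and 2;
# output order becomes [log(b), log(c), a].
stage2 = ColumnTransformer([('log', log_transformer, [1, 2])],
                           remainder='passthrough')

pipe = Pipeline([('impute', stage1), ('log', stage2)])
print(pipe.fit_transform(df))
```

The cost of this design is that column names are lost between stages; tracking positions by hand (or using get_feature_names_out on newer scikit-learn versions) is what keeps the intersecting lists consistent.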