sklearn-pandas | 易学教程

How to predict correctly in sklearn RandomForestRegressor?

Question: I'm working on a big data project for school. My dataset is at https://github.com/gindeleo/climate/blob/master/GlobalTemperatures.csv, and I'm trying to predict the next values of "LandAverageTemperature". First, I imported the CSV into pandas as a DataFrame named "df1". After getting errors on my first attempts in sklearn, I converted the "dt" column from string to datetime64, then added a column named "year" that holds only the year of each date. It's probably …
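The excerpt stops before the actual problem, so the following is only a minimal sketch of the pipeline it describes: parse "dt", derive numeric date features, fit RandomForestRegressor, and predict the months after the last observation. The data here is synthetic (a hypothetical stand-in for GlobalTemperatures.csv), and adding a "month" feature is an assumption of this sketch. One property worth knowing: a random forest cannot extrapolate, because every prediction is an average of training targets, so "next values" always stay inside the range already seen.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Hypothetical stand-in for GlobalTemperatures.csv (synthetic, not real data):
# monthly rows with a "dt" date string and a seasonal temperature value.
dates = pd.date_range("1900-01-01", periods=240, freq="MS")
months = np.asarray(dates.month)
rng = np.random.default_rng(0)
seasonal = 8 * np.sin(2 * np.pi * (months - 1) / 12)
df1 = pd.DataFrame({
    "dt": dates.strftime("%Y-%m-%d"),
    "LandAverageTemperature": 9 + seasonal + rng.normal(0, 0.5, len(dates)),
})

# Parse the strings, then derive numeric features the model can use.
df1["dt"] = pd.to_datetime(df1["dt"])
df1["year"] = df1["dt"].dt.year
df1["month"] = df1["dt"].dt.month  # assumption: month helps capture seasonality
df1 = df1.dropna(subset=["LandAverageTemperature"])  # the real file has gaps

features = ["year", "month"]
X = df1[features]
y = df1["LandAverageTemperature"]

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, y)

# Build the same feature columns for the 12 months after the last date.
future = pd.date_range(df1["dt"].max() + pd.offsets.MonthBegin(1), periods=12, freq="MS")
X_future = pd.DataFrame({"year": future.year, "month": future.month})
predictions = model.predict(X_future)
print(predictions.round(2))
```

If a genuine trend beyond the training years matters, a model that can extrapolate (for example a linear model on "year") is the usual alternative to consider.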

Found input variables with inconsistent numbers of samples

Question: It looks like I have some inconsistency, but I am not able to track it down. def train(classifier, X, y): X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=33) print(X_train.shape) print(y_train.shape) classifier.fit(X_train, y_train) print ("Accuracy: %s" % classifier.score(X_test, y_test)) return classifier def load_data_and_labels(filename): """Load sentences and labels""" df = pd.read_csv(filename, compression='zip') columns = df.columns.tolist() …
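This error means X and y reached sklearn with different row counts. The question's loader is cut off, so the cause there is unknown; a common one is dropping or filtering rows in X but not in y. The sketch below, on hypothetical arrays, reproduces the error and adds a small hypothetical helper, check_same_length, that fails earlier with a clearer message.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def check_same_length(X, y):
    """Hypothetical helper: raise a readable error if X and y differ in rows."""
    n_x, n_y = len(X), len(y)
    if n_x != n_y:
        raise ValueError(f"X has {n_x} rows but y has {n_y}; align them before splitting")
    return n_x

X = np.arange(20).reshape(10, 2)  # 10 samples, 2 features
y_bad = np.arange(9)              # one label short, e.g. lost during cleaning
y_good = np.arange(10)

# Reproduce the error: sklearn checks that all inputs have the same length.
try:
    train_test_split(X, y_bad, test_size=0.25, random_state=33)
    error_message = None
except ValueError as err:
    error_message = str(err)
    print("sklearn says:", error_message)

# With aligned inputs the split works and the shapes agree.
check_same_length(X, y_good)
X_train, X_test, y_train, y_test = train_test_split(X, y_good, test_size=0.25, random_state=33)
print(X_train.shape, y_train.shape)
```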

ImportError: cannot import name 'LatentDirichletAllocation'

Question: I'm trying to import the following: from sklearn.model_selection import train_test_split and got the following error. Here's the stack trace: ImportError Traceback (most recent call last) <ipython-input-1-bdd2a2f20673> in <module> 2 import pandas as pd 3 from sklearn.model_selection import train_test_split ----> 4 from sklearn.tree import DecisionTreeClassifier 5 from sklearn.metrics import accuracy_score 6 from sklearn import tree ~/.local/lib/python3.6/site-packages/sklearn/tree/__init__.py in …
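The traceback runs through a user-site copy of sklearn under ~/.local, which hints (without proving) that two scikit-learn installations may be mixed. The sketch below lists every sys.path entry that contains an sklearn package, so duplicates become visible, then confirms where LatentDirichletAllocation normally lives (sklearn.decomposition). The helper name find_sklearn_copies is hypothetical.

```python
import os
import sys

import sklearn

def find_sklearn_copies():
    """Hypothetical helper: every sys.path entry holding an 'sklearn' package.

    More than one hit (e.g. one under ~/.local and one in a system or conda
    site-packages) suggests mixed installations that can break internal imports.
    """
    copies = []
    for entry in sys.path:
        base = entry or os.getcwd()  # '' on sys.path means the current directory
        candidate = os.path.join(base, "sklearn", "__init__.py")
        if os.path.isfile(candidate):
            found = os.path.dirname(os.path.abspath(candidate))
            if found not in copies:  # sys.path may list a directory twice
                copies.append(found)
    return copies

copies = find_sklearn_copies()
for path in copies:
    print("sklearn found in:", path)
print("imported version:", sklearn.__version__, "from", os.path.dirname(sklearn.__file__))

# With a single, consistent installation this import should succeed.
from sklearn.decomposition import LatentDirichletAllocation
```

If more than one copy shows up, uninstalling the extra one and reinstalling scikit-learn once in the intended environment is the usual remedy.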

Reverse the Multi label binarizer in pandas

Question: I have a pandas DataFrame built as follows: import pandas as pd from sklearn.preprocessing import MultiLabelBinarizer mlb = MultiLabelBinarizer() # load sample data df = pd.DataFrame( {'user_id':['1','1','2','2','2','3'], 'fruits':['banana','orange','orange','apple','banana','mango']}) I collect all the fruits for each user using the code below: # collect fruits for each user transformed_df= df.groupby('user_id').agg({'fruits':lambda x: list(x)}).reset_index() print(transformed_df) user_id fruits 0 1 [banana, …
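A minimal sketch of the round trip on the question's sample data: MultiLabelBinarizer.fit_transform produces one 0/1 column per fruit, and inverse_transform turns each row back into a tuple of labels. Using groupby(...).apply(list) instead of the question's agg lambda, and the final explode step back to one row per (user, fruit), are choices of this sketch.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

df = pd.DataFrame({
    "user_id": ["1", "1", "2", "2", "2", "3"],
    "fruits": ["banana", "orange", "orange", "apple", "banana", "mango"],
})

# One list of fruits per user.
transformed_df = df.groupby("user_id")["fruits"].apply(list).reset_index()

# Forward: one 0/1 indicator column per fruit.
mlb = MultiLabelBinarizer()
encoded = pd.DataFrame(
    mlb.fit_transform(transformed_df["fruits"]),
    columns=mlb.classes_,
    index=transformed_df["user_id"],
)

# Reverse: each 0/1 row becomes a tuple of labels, ordered as in
# mlb.classes_ (alphabetical), not in the original row order.
decoded = mlb.inverse_transform(encoded.values)
restored = pd.DataFrame({"user_id": encoded.index, "fruits": [list(t) for t in decoded]})

# Optional: back to one row per (user, fruit), like the original df.
long_form = restored.explode("fruits").reset_index(drop=True)
print(restored)
print(long_form)
```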

Number of features of the model must match the input

Question: For some reason, the features of this dataset are being interpreted as rows: "Model n_features is 16 and input n_features is 18189", where 18189 is the number of rows and 16 is the correct number of features. The suspect code is here: for var in cat_cols: num = LabelEncoder() train[var] = num.fit_transform(train[var].astype('str')) train['output'] = num.fit_transform(train['output'].astype('str')) for var in cat_cols: num = LabelEncoder() test[var] = num.fit_transform(test[var].astype('str')) test[ …
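An input n_features equal to the row count suggests predict received data with the wrong orientation, for example a transposed array or a single 1-D column (some older scikit-learn versions treated a 1-D array as one sample). The excerpt also shows a separate issue: refitting LabelEncoder on test can assign different integer codes than on train. The sketch below, on hypothetical toy frames, reuses the train-fitted encoders on test and passes the same 2-D feature columns to fit and predict. It assumes test contains no categories unseen in train.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data standing in for the question's train/test frames.
train = pd.DataFrame({
    "color": ["red", "blue", "green", "blue", "red", "green"],
    "size": ["S", "M", "L", "M", "S", "L"],
    "output": ["yes", "no", "yes", "no", "yes", "no"],
})
test = pd.DataFrame({"color": ["blue", "red"], "size": ["L", "S"]})
cat_cols = ["color", "size"]

# Fit each encoder on train only and reuse it on test, so a category maps to
# the same integer in both frames (refitting on test can reorder the codes).
for var in cat_cols:
    enc = LabelEncoder()
    train[var] = enc.fit_transform(train[var].astype(str))
    test[var] = enc.transform(test[var].astype(str))  # assumes no unseen categories

# Encode the target with its own encoder, kept for decoding predictions.
target_enc = LabelEncoder()
y = target_enc.fit_transform(train["output"].astype(str))

# Select the same feature columns, as a 2-D frame, for both fit and predict.
X_train = train[cat_cols]
X_test = test[cat_cols]
model = RandomForestClassifier(n_estimators=10, random_state=0).fit(X_train, y)
pred = target_enc.inverse_transform(model.predict(X_test))
print(X_train.shape, X_test.shape, pred)
```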

I keep getting AttributeError in RandomizedSearchCV

Question: x_tu = data_cls_tu.iloc[:,1:].values y_tu = data_cls_tu.iloc[:,0].values classifier = DecisionTreeClassifier() parameters = [{"max_depth": [3,None], "min_samples_leaf": np.random.randint(1,9), "criterion": ["gini","entropy"]}] randomcv = RandomizedSearchCV(estimator=classifier, param_distributions=parameters, scoring='accuracy', cv=10, n_jobs=-1, random_state=0) randomcv.fit(x_tu, y_tu) --------------------------------------------------------------------------- AttributeError Traceback (most …
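Two things in the excerpt stand out. np.random.randint(1,9) returns a single integer, but each value in param_distributions should be a list or a distribution object with an rvs method. Also, some older scikit-learn releases accepted only a single dict here rather than a list of dicts. Since the traceback is cut off, the sketch below addresses both: a plain dict, with scipy.stats.randint for min_samples_leaf. The iris data and n_iter=10 are stand-ins for the question's data and settings.

```python
from scipy.stats import randint
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

# Stand-in data; the question uses its own x_tu / y_tu arrays.
X, y = load_iris(return_X_y=True)

param_distributions = {
    "max_depth": [3, None],
    "min_samples_leaf": randint(1, 9),  # a distribution: samples 1..8 per candidate
    "criterion": ["gini", "entropy"],
}

search = RandomizedSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=10,
    scoring="accuracy",
    cv=5,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```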

Can't find private function for sklearn (LocalOutlierFactor) in reticulate

Question: I tried to add part of a Python script to my R script. Unfortunately, it seems that I can't use a private function of LocalOutlierFactor from R: # Sample Data n <- 5000 n_outlier <- .05 * n set.seed(11212) inlier <- mvtnorm::rmvnorm(n, mean = c(0,0)) outlier <- mvtnorm::rmvnorm(n_outlier, mean = c(20, 20)) testdata <- rbind(inlier, outlier) smp_size <- floor(0.5 * nrow(testdata)) train_ind <- sample(seq_len(nrow(testdata)), size = smp_size) train_lof <-as.data.frame(testdata[train_ind, ]) …
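Assuming the private function was needed to score new, unseen points, scikit-learn's LocalOutlierFactor offers a public route: with novelty=True it exposes predict, decision_function and score_samples for new data. The Python sketch below mirrors the R sample data (standard-normal inliers around (0, 0), outliers around (20, 20)). Fitting only on inliers is a choice of this sketch, because novelty detection expects clean training data. From R, reticulate exposes the same methods through the $ accessor.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(11212)

# Mirrors the R sample data: inliers around (0, 0), outliers around (20, 20).
train_inliers = rng.normal(size=(2000, 2))
test_inliers = rng.normal(size=(500, 2))
test_outliers = rng.normal(loc=20.0, size=(50, 2))
X_test = np.vstack([test_inliers, test_outliers])

# novelty=True enables the public scoring methods on new data, so no private
# method is needed. contamination="auto" is set explicitly for older versions.
lof = LocalOutlierFactor(n_neighbors=20, novelty=True, contamination="auto")
lof.fit(train_inliers)

labels = lof.predict(X_test)        # +1 = inlier, -1 = outlier
scores = lof.score_samples(X_test)  # negated LOF; lower means more abnormal
print("flagged outliers:", int((labels == -1).sum()), "of", len(X_test))
```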

Extract DataFrame from a list of indices of another DataFrame

Question: I have a DataFrame "A" and a list of indices "I". I want to get a DataFrame "B" that contains only the rows of the original DataFrame "A" at those indices "I". How can I achieve this? Assuming I = [1, 3], I tried A.filter(items=I, axis=0). Is this the right way, or is there a better way to do it? Answer 1: I think you need DataFrame.loc: A = pd.DataFrame({ 'A': ['a','a','a','a','b','b','b','c','d'], 'B': list(range(9)) }) print (A) A B 0 a 0 1 a 1 2 a 2 3 a 3 4 b 4 5 b 5 6 b 6 7 …
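With the default 0..n-1 index, A.loc[I], A.iloc[I] and A.filter(items=I, axis=0) return the same rows. They differ once the index no longer matches positions: loc and filter select by label, iloc by position, and filter silently skips labels that do not exist. The sketch below uses the answer's sample frame; the shifted copy A2 is added only to show the difference.

```python
import pandas as pd

A = pd.DataFrame({
    "A": ["a", "a", "a", "a", "b", "b", "b", "c", "d"],
    "B": list(range(9)),
})
I = [1, 3]

by_label = A.loc[I]                    # rows whose index labels are 1 and 3
by_position = A.iloc[I]                # rows at positions 1 and 3
by_filter = A.filter(items=I, axis=0)  # also by label, skipping missing ones
print(by_label)

# Shift the labels to 10..18 so labels and positions no longer coincide.
A2 = A.copy()
A2.index = A2.index + 10
shifted_by_position = A2.iloc[I]                # still positions 1 and 3
shifted_by_filter = A2.filter(items=I, axis=0)  # empty: labels 1 and 3 are gone
# A2.loc[I] would raise KeyError in pandas 1.0 and later.
```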

scikit-learn : ValueError: not enough values to unpack (expected 2, got 1)

Question: I use sklearn's check_array function when calculating mean absolute percentage error (MAPE), but in the recent version of sklearn it doesn't seem to work the same way as in the previous version. import numpy as np from sklearn.utils import check_array def calculate_mape(y_true, y_pred): y_true, y_pred = check_array(y_true, y_pred) return np.mean(np.abs((y_true - y_pred) / y_true)) * 100 y_true = [3, -0.5, 2, 7]; y_pred = [2.5, -0.3, 2, 8] calculate_mape(y_true, y_pred) This returns an error:
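check_array validates one array and returns one array, so check_array(y_true, y_pred) passes y_pred into an option slot rather than validating it, and unpacking the single result into two names is a likely source of "expected 2, got 1". A minimal sketch of a fix: validate each input separately with ensure_2d=False (so 1-D targets are accepted) and check that their lengths match. MAPE is undefined when y_true contains zeros, which this sketch does not handle. Recent scikit-learn releases also ship sklearn.metrics.mean_absolute_percentage_error, which returns a fraction rather than a percentage.

```python
import numpy as np
from sklearn.utils import check_array, check_consistent_length

def calculate_mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (y_true must be nonzero)."""
    # Validate each input on its own; ensure_2d=False accepts 1-D targets.
    y_true = check_array(y_true, ensure_2d=False, dtype=float)
    y_pred = check_array(y_pred, ensure_2d=False, dtype=float)
    check_consistent_length(y_true, y_pred)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

y_true = [3, -0.5, 2, 7]
y_pred = [2.5, -0.3, 2, 8]
print(calculate_mape(y_true, y_pred))  # about 17.74
```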

iPython (python 2) - ImportError: No module named model_selection

Question: In an IPython Notebook (Python 2), this line fails: from sklearn.model_selection import train_test_split. Why isn't model_selection working? Answer 1: To remedy this issue, you first need to find out whether you are importing the actual sklearn package and not just some script named sklearn.py saved somewhere in your working directory. The way Python imports modules is somewhat similar to the way it finds variables in its namespace (Local, Enclosed, Global, Built-in). In this …
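A small sketch of the check the answer describes: print which file Python actually imported as sklearn, and flag a stray module (such as a local sklearn.py) that is not the real package. A second possible cause is an old release, since model_selection first appeared in scikit-learn 0.18; the fallback import covers releases that only have the older cross_validation module (deprecated in 0.18, removed in 0.20). The helper names are hypothetical, and the prints use single arguments so the code also runs under Python 2.

```python
import os

import sklearn

def sklearn_origin():
    """Hypothetical helper: the file Python actually imported as 'sklearn'."""
    return os.path.abspath(sklearn.__file__)

def looks_shadowed():
    """True if 'sklearn' looks like a stray module rather than the package."""
    is_package = os.path.basename(sklearn_origin()).startswith("__init__.")
    return not is_package or not hasattr(sklearn, "__version__")

print("imported from: %s" % sklearn_origin())
print("version: %s" % getattr(sklearn, "__version__", "unknown"))

try:
    # model_selection exists from scikit-learn 0.18 onward.
    from sklearn.model_selection import train_test_split
except ImportError:
    # Older releases only provide cross_validation.
    from sklearn.cross_validation import train_test_split
```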