sklearn-pandas

How to predict correctly in sklearn RandomForestRegressor?

Submitted by 天大地大妈咪最大 on 2020-01-06 04:54:06
Question: I'm working on a big data project for school. My dataset looks like this: https://github.com/gindeleo/climate/blob/master/GlobalTemperatures.csv I'm trying to predict the next values of "LandAverageTemperature". First, I imported the CSV into pandas as a DataFrame named "df1". After getting errors on my first tries in sklearn, I converted the "dt" column from string to datetime64, then added a column named "year" that holds only the year of each date value. It's probably
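The preprocessing described in the question can be sketched as follows. This is only an illustration: the three rows are made-up placeholder values, not data from the linked CSV, and only the column names "dt", "LandAverageTemperature", and "year" come from the question.

```python
import pandas as pd

# Placeholder rows (made-up values); the real data would come from pd.read_csv(...)
df1 = pd.DataFrame({
    "dt": ["1750-01-01", "1750-02-01", "1751-01-01"],
    "LandAverageTemperature": [3.0, 3.1, 2.8],
})

# Convert the string column to datetime64
df1["dt"] = pd.to_datetime(df1["dt"])

# Add a column holding only the year of each date
df1["year"] = df1["dt"].dt.year

print(df1)
```

A numeric column such as "year" can be passed to an estimator as a feature, whereas a raw datetime64 column generally cannot.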

Found input variables with inconsistent numbers of samples

Submitted by 橙三吉。 on 2020-01-05 04:23:11
Question: It looks like I have some inconsistency, but I am not able to troubleshoot it.

    def train(classifier, X, y):
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=33)
        print(X_train.shape)
        print(y_train.shape)
        classifier.fit(X_train, y_train)
        print("Accuracy: %s" % classifier.score(X_test, y_test))
        return classifier

    def load_data_and_labels(filename):
        """Load sentences and labels"""
        df = pd.read_csv(filename, compression='zip')
        columns = df.columns.tolist()
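One way to catch this error before calling train_test_split is to check that X and y have the same number of samples. The sketch below uses sklearn.utils.check_consistent_length; the helper name same_number_of_samples is hypothetical and not part of the question's code.

```python
from sklearn.utils import check_consistent_length


def same_number_of_samples(X, y):
    """Return True if X and y have the same length along the first axis."""
    try:
        # Raises ValueError when the inputs have different numbers of samples
        check_consistent_length(X, y)
    except ValueError:
        return False
    return True


print(same_number_of_samples([[1], [2], [3]], [0, 1, 0]))
print(same_number_of_samples([[1], [2], [3]], [0, 1]))
```

If the check fails, printing len(X) and len(y) right after loading the data usually shows where the two went out of step.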

ImportError: cannot import name 'LatentDirichletAllocation'

Submitted by ↘锁芯ラ on 2020-01-05 04:13:06
Question: I'm trying to import the following: from sklearn.model_selection import train_test_split and got the following error. Here's the stack trace:

    ImportError Traceback (most recent call last)
    <ipython-input-1-bdd2a2f20673> in <module>
          2 import pandas as pd
          3 from sklearn.model_selection import train_test_split
    ----> 4 from sklearn.tree import DecisionTreeClassifier
          5 from sklearn.metrics import accuracy_score
          6 from sklearn import tree
    ~/.local/lib/python3.6/site-packages/sklearn/tree/__init__.py in

Reverse the Multi label binarizer in pandas

Submitted by 眉间皱痕 on 2020-01-01 22:28:10
Question: I have a pandas DataFrame:

    import pandas as pd
    from sklearn.preprocessing import MultiLabelBinarizer
    mlb = MultiLabelBinarizer()
    # load sample data
    df = pd.DataFrame(
        {'user_id': ['1','1','2','2','2','3'],
         'fruits': ['banana','orange','orange','apple','banana','mango']})

I collect all the fruits for each user using the code below:

    # collect fruits for each user
    transformed_df = df.groupby('user_id').agg({'fruits': lambda x: list(x)}).reset_index()
    print(transformed_df)

      user_id fruits
    0 1 [banana,
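As a minimal sketch of reversing the binarizer, MultiLabelBinarizer provides inverse_transform, which maps an indicator matrix back to tuples of labels. The per-user lists below are written out by hand from the question's sample data; the variable names are illustrative.

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Fruits per user, taken from the sample data in the question
fruits_per_user = [
    ["banana", "orange"],           # user 1
    ["orange", "apple", "banana"],  # user 2
    ["mango"],                      # user 3
]

mlb = MultiLabelBinarizer()

# Encode to a 0/1 indicator matrix with one column per class
encoded = mlb.fit_transform(fruits_per_user)

# Decode back to one tuple of labels per row
decoded = mlb.inverse_transform(encoded)

print(mlb.classes_)
print(encoded)
print(decoded)
```

Labels come back in the order of mlb.classes_, not in the order they had in the original lists.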

Number of features of the model must match the input

Submitted by 旧时模样 on 2019-12-25 07:16:36
Question: For some reason the features of this dataset are being interpreted as rows: "Model n_features is 16 and input n_features is 18189", where 18189 is the number of rows and 16 is the correct number of features. The suspect code is here:

    for var in cat_cols:
        num = LabelEncoder()
        train[var] = num.fit_transform(train[var].astype('str'))
    train['output'] = num.fit_transform(train['output'].astype('str'))
    for var in cat_cols:
        num = LabelEncoder()
        test[var] = num.fit_transform(test[var].astype('str'))
    test[

I keep getting AttributeError in RandomSearchCV

Submitted by 痴心易碎 on 2019-12-25 02:13:48
Question:

    x_tu = data_cls_tu.iloc[:,1:].values
    y_tu = data_cls_tu.iloc[:,0].values
    classifier = DecisionTreeClassifier()
    parameters = [{"max_depth": [3,None], "min_samples_leaf": np.random.randint(1,9), "criterion": ["gini","entropy"]}]
    randomcv = RandomizedSearchCV(estimator=classifier, param_distributions=parameters, scoring='accuracy', cv=10, n_jobs=-1, random_state=0)
    randomcv.fit(x_tu, y_tu)

    ---------------------------------------------------------------------------
    AttributeError Traceback (most
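The traceback is cut off, so the exact cause is not shown here. Two things in the snippet are worth checking. First, np.random.randint(1,9) returns a single integer, while each value in param_distributions is expected to be a list or a distribution with an rvs method. Second, older scikit-learn releases expected a plain dict rather than a list of dicts. The sketch below assumes those are the issues and uses the iris dataset as a stand-in for data_cls_tu.

```python
from scipy.stats import randint
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

# Stand-in data (assumption): the question's data_cls_tu is not available
X, y = load_iris(return_X_y=True)

# A plain dict; every value is a list or a scipy.stats distribution
parameters = {
    "max_depth": [3, None],
    "min_samples_leaf": randint(1, 9),  # samples integers from 1 to 8
    "criterion": ["gini", "entropy"],
}

randomcv = RandomizedSearchCV(
    estimator=DecisionTreeClassifier(random_state=0),
    param_distributions=parameters,
    n_iter=5,
    scoring="accuracy",
    cv=5,
    random_state=0,
)
randomcv.fit(X, y)

print(randomcv.best_params_)
```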

Can't find private function for sklearn (LocalOutlierFactor) in reticulate

Submitted by ぃ、小莉子 on 2019-12-24 12:03:17
Question: I tried to add part of some Python code to my R script. Unfortunately, it seems that I can't use a private function for the LocalOutlierFactor in R:

    # Sample Data
    n <- 5000
    n_outlier <- .05 * n
    set.seed(11212)
    inlier <- mvtnorm::rmvnorm(n, mean = c(0,0))
    outlier <- mvtnorm::rmvnorm(n_outlier, mean = c(20, 20))
    testdata <- rbind(inlier, outlier)
    smp_size <- floor(0.5 * nrow(testdata))
    train_ind <- sample(seq_len(nrow(testdata)), size = smp_size)
    train_lof <- as.data.frame(testdata[train_ind, ])

Extract DataFrame from a list of indices of another DataFrame

Submitted by 假装没事ソ on 2019-12-24 09:17:23
Question: I have a DataFrame "A" and a list of indices "I". I want to get a DataFrame "B" that contains only the rows of "A" at the indices in "I". How can I achieve this? Assuming I = [1, 3], I tried A.filter(items=I, axis=0). Is this the right way, or is there a better way to do it?

Answer 1: I think you need DataFrame.loc:

    A = pd.DataFrame({
        'A': ['a','a','a','a','b','b','b','c','d'],
        'B': list(range(9))
    })
    print(A)

       A  B
    0  a  0
    1  a  1
    2  a  2
    3  a  3
    4  b  4
    5  b  5
    6  b  6
    7
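The answer's suggestion can be sketched with the same sample frame and the I = [1, 3] from the question. Note that DataFrame.loc selects by index label; with the default RangeIndex used here, labels and positions coincide.

```python
import pandas as pd

A = pd.DataFrame({
    'A': ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'c', 'd'],
    'B': list(range(9)),
})

I = [1, 3]

# Select the rows whose index labels are listed in I
B = A.loc[I]

print(B)
```

If "I" holds row positions rather than labels, A.iloc[I] is the positional counterpart.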

scikit-learn : ValueError: not enough values to unpack (expected 2, got 1)

Submitted by 心已入冬 on 2019-12-23 17:55:44
Question: There is a check_array function for calculating mean absolute percentage error (MAPE) in recent versions of sklearn, but it doesn't seem to work the same way as in the previous version.

    import numpy as np
    from sklearn.utils import check_array

    def calculate_mape(y_true, y_pred):
        y_true, y_pred = check_array(y_true, y_pred)
        return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

    y_true = [3, -0.5, 2, 7]; y_pred = [2.5, -0.3, 2, 8]
    calculate_mape(y_true, y_pred)

This is returning an error:
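check_array validates a single array and returns a single array, so it cannot be unpacked into two names. A minimal sketch of one workaround is to validate each input separately, passing ensure_2d=False because the inputs are 1-D. This is an illustration, not necessarily the fix given in the original thread.

```python
import numpy as np
from sklearn.utils import check_array


def calculate_mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    # Validate each input on its own; 1-D input needs ensure_2d=False
    y_true = check_array(y_true, ensure_2d=False)
    y_pred = check_array(y_pred, ensure_2d=False)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100


y_true = [3, -0.5, 2, 7]
y_pred = [2.5, -0.3, 2, 8]
mape = calculate_mape(y_true, y_pred)
print(mape)
```

The formula divides by y_true, so it is undefined when any true value is zero.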

iPython (python 2) - ImportError: No module named model_selection

Submitted by 无人久伴 on 2019-12-23 15:38:21
Question: iPython Notebook, Python 2, complaining about this line: from sklearn.model_selection import train_test_split. Why isn't model_selection working?

Answer 1: To remedy this issue, you first need to find out whether you are importing the actual sklearn package, and not just some script named sklearn.py saved somewhere in your working directory. The way Python imports modules is somewhat similar to the way it finds variables in its namespace (Local, Enclosed, Global, Built-in). In this
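To see which file Python would actually load for a given module name, one option is to look up its import spec. The sketch below assumes Python 3 (importlib.util.find_spec is not available in Python 2, which the question uses), and the helper name module_origin is hypothetical.

```python
import importlib.util


def module_origin(name):
    """Return the path a top-level module would be loaded from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# A path ending in sklearn.py inside the working directory would indicate
# a local script shadowing the real package.
print(module_origin("sklearn"))
```

Once the real package is confirmed, printing sklearn.__version__ shows which release is installed.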