sklearn-pandas

sklearn TimeSeriesSplit Error: KeyError: '[ 0 1 2 …] not in index'

点点圈 submitted on 2019-12-08 09:50:55
Question: I want to use TimeSeriesSplit from sklearn on the following dataframe to predict sum: So to prepare X and y I do the following: X = df.drop(['sum'],axis=1) y = df['sum'] and then feed these two to: for train_index, test_index in tscv.split(X): X_train01, X_test01 = X[train_index], X[test_index] y_train01, y_test01 = y[train_index], y[test_index] By doing so, I get the following error: KeyError: '[ 0 1 2 ...] not in index' Here X is a dataframe, and apparently this causes the error, because if
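A minimal sketch of the usual way around this KeyError, assuming a small made-up dataframe (the columns "a" and "b" are hypothetical, since the original dataframe is not shown). Indexing a DataFrame with X[train_index] looks the integers up as column labels, whereas the arrays returned by tscv.split(X) are row positions, so positional selection with .iloc is what is wanted:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import TimeSeriesSplit

# Hypothetical stand-in for the dataframe in the question.
df = pd.DataFrame({
    "a": np.arange(10),
    "b": np.arange(10) * 2,
    "sum": np.arange(10) * 3,
})

X = df.drop(["sum"], axis=1)
y = df["sum"]

tscv = TimeSeriesSplit(n_splits=3)
folds = []
for train_index, test_index in tscv.split(X):
    # .iloc selects rows by position, which matches what split() yields.
    X_train01, X_test01 = X.iloc[train_index], X.iloc[test_index]
    y_train01, y_test01 = y.iloc[train_index], y.iloc[test_index]
    folds.append((train_index, test_index))
```

Converting X and y to NumPy arrays first (for example with X.values) would also make plain X[train_index] work, at the cost of losing the column names.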

how to convert Cassandra Map to Pandas Dataframe

╄→尐↘猪︶ㄣ submitted on 2019-12-08 03:22:05
Question: I want to read data from a Cassandra column family of type map<string, int> and convert it to a Pandas dataframe, which I then want to use to train a model in Python, as described here in the classification of iris species. If I had used a CSV to train the model, it would have looked like this: label, f1, f2, f3, f4, f5 0 , 11 , 1, 6 , 1, 2 1 , 5, 5, 1 , 2, 6 0 , 12, 9, 3 , 6, 8 0 , 9, 3, 8, 1, 0 Cassandra column family : FeatureSet | label {'f1': 11, 'f2': 1, 'f3': 6,
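A rough sketch of the dataframe step only, assuming the Cassandra rows have already been fetched and that each row arrives as a (FeatureSet dict, label) pair. The `rows` list below is a hypothetical stand-in for the query result, filled with the first two rows of the CSV sample above; no Cassandra driver call is shown:

```python
import pandas as pd

# Hypothetical query result: one (map column, label) pair per row.
rows = [
    ({"f1": 11, "f2": 1, "f3": 6, "f4": 1, "f5": 2}, 0),
    ({"f1": 5, "f2": 5, "f3": 1, "f4": 2, "f5": 6}, 1),
]

# A list of dicts becomes one column per map key.
features = pd.DataFrame([feature_set for feature_set, _ in rows])

# Put the label first so the frame mirrors the CSV layout.
features.insert(0, "label", [label for _, label in rows])
```

Keys missing from a given map would show up as NaN in that row, so a fillna step may be needed before training.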

Extract rule path of data point through decision tree with sklearn python

こ雲淡風輕ζ submitted on 2019-12-08 00:38:56
Question: I'm using a decision tree model and I want to extract the decision path for each data point, in order to understand what caused the Y rather than to predict it. How can I do that? I couldn't find any documentation. Answer 1: Here is an example using the iris dataset. from sklearn.datasets import load_iris from sklearn import tree import graphviz iris = load_iris() clf = tree.DecisionTreeClassifier() clf = clf.fit(iris.data, iris.target) dot_data = tree.export_graphviz(clf, out_file=None, feature_names
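As a complement to the graphviz export, here is a sketch of one way to list the rules a single sample passes through, using the estimator's decision_path and apply methods together with the tree_ arrays. The helper name `rule_path` is made up for illustration and is not part of scikit-learn:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
clf = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

# Sparse indicator matrix: entry (i, j) is nonzero if sample i visits node j.
node_indicator = clf.decision_path(iris.data)
# Leaf node reached by each sample.
leaf_ids = clf.apply(iris.data)


def rule_path(sample_id):
    """Return the split conditions met by one sample (hypothetical helper)."""
    start = node_indicator.indptr[sample_id]
    end = node_indicator.indptr[sample_id + 1]
    rules = []
    for node_id in node_indicator.indices[start:end]:
        if node_id == leaf_ids[sample_id]:
            continue  # the leaf itself has no split condition
        feature = clf.tree_.feature[node_id]
        threshold = clf.tree_.threshold[node_id]
        op = "<=" if iris.data[sample_id, feature] <= threshold else ">"
        rules.append(f"{iris.feature_names[feature]} {op} {threshold:.2f}")
    return rules


rules_for_first_sample = rule_path(0)
```

Calling rule_path for every row gives one rule list per data point, which can then be joined into a readable string.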

unorderable types error when importing sklearn

扶醉桌前 submitted on 2019-12-06 06:23:06
Question: I installed numpy (1.12.0b1) and SciPy (0.18) on Windows. I also installed scikit-learn. When I wrote "import sklearn" in the Python console, it gave an error like this: if np_version < (1, 12, 0): TypeError: unorderable types: str() < int() What could the issue be? Answer 1: The problem is with the version number, so you could try editing fixes.py in the sklearn folder. Add this line after the try on line 32: if not (x.isdigit()): x='0' so your code will be: def _parse_version(version
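To illustrate why the comparison fails and what the suggested patch does, here is a standalone sketch. It is not scikit-learn's actual source; the function name `parse_version` is hypothetical. A beta version string such as "1.12.0b1" contains the component "0b1", which cannot be turned into an int, so the tuple ends up mixing str and int and Python 3 refuses to order it:

```python
def parse_version(version_string):
    """Turn '1.12.0b1' into a tuple of ints, treating non-numeric parts as 0."""
    parts = []
    for x in version_string.split("."):
        # Same idea as the answer's patch: replace parts like '0b1' with '0'.
        if not x.isdigit():
            x = "0"
        parts.append(int(x))
    return tuple(parts)


np_version = parse_version("1.12.0b1")
is_older = np_version < (1, 12, 0)
```

Installing a final (non-beta) numpy release, or a scikit-learn version that copes with pre-release version strings, avoids having to edit installed files at all.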

difference between LinearRegression and svm.SVR(kernel="linear")

不打扰是莪最后的温柔 submitted on 2019-12-06 03:45:49
Question: First, there are questions on this forum very similar to this one, but trust me, none of them matches, so please do not mark it as a duplicate. I have encountered two methods of linear regression using scikit-learn and I fail to understand the difference between the two, especially since the first code calls train_test_split() while the other calls the fit method directly. I am studying with multiple resources and this single issue is very confusing to me. First, the one which uses SVR: X = np
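A small sketch that may help separate the two concerns, using synthetic data invented for illustration. train_test_split only holds out rows for evaluation and can be combined with either estimator. The models differ in their objective: LinearRegression minimises squared error, while SVR with a linear kernel minimises an epsilon-insensitive loss with a regularisation term controlled by C:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Synthetic data (assumption): y = 3x + 2 plus a little noise.
rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3.0 * X.ravel() + 2.0 + rng.normal(scale=0.5, size=200)

# Splitting is independent of which model is fitted afterwards.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

ols = LinearRegression().fit(X_train, y_train)
svr = SVR(kernel="linear", C=1.0, epsilon=0.1).fit(X_train, y_train)

ols_slope = ols.coef_[0]
svr_slope = svr.coef_.ravel()[0]
ols_r2 = ols.score(X_test, y_test)
svr_r2 = svr.score(X_test, y_test)
```

On clean, nearly linear data like this the two slopes end up close; they tend to diverge more when there are outliers or when C and epsilon are changed.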

Reverse Label Encoding giving error

99封情书 submitted on 2019-12-06 02:47:51
I label-encoded my categorical data into numerical data using a label encoder: data['Resi'] = LabelEncoder().fit_transform(data['Resi']) But when I try to find how the values are mapped internally using list(LabelEncoder.inverse_transform(data['Resi'])) I get the error below: TypeError Traceback (most recent call last) <ipython-input-67-419ab6db89e2> in <module>() ----> 1 list(LabelEncoder.inverse_transform(data['Resi'])) TypeError: inverse_transform() missing 1 required positional argument: 'y' How do I fix this? Sample data: Resi IP IP IP IP IP IE IP IP IP IP IP IPD IE IE IP IE IP IP IP You can check
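The traceback comes from calling inverse_transform on the LabelEncoder class itself rather than on the fitted instance, so the column is bound to self and y is reported as missing. A minimal sketch of the fix, using a shortened version of the sample column (the variable names `encoder`, `decoded` and `mapping` are chosen here for illustration):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Shortened sample of the 'Resi' column from the question.
data = pd.DataFrame({"Resi": ["IP", "IP", "IE", "IPD", "IE", "IP"]})

# Keep a reference to the fitted encoder instead of discarding it.
encoder = LabelEncoder()
data["Resi"] = encoder.fit_transform(data["Resi"])

# Decode with the same fitted instance.
decoded = [str(v) for v in encoder.inverse_transform(data["Resi"])]

# classes_ holds the original labels in encoded order.
mapping = {str(label): code for code, label in enumerate(encoder.classes_)}
```

The `mapping` dict shows directly which integer each category was assigned.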

How to run non-linear regression in python

拜拜、爱过 submitted on 2019-12-05 23:34:12
Question: I have the following information (dataframe) in Python: product baskets scaling_factor 12345 475 95.5 12345 108 57.7 12345 2 1.4 12345 38 21.9 12345 320 88.8 and I want to run the following non-linear regression and estimate the parameters a, b and c. The equation that I want to fit: scaling_factor = a - (b*np.exp(c*baskets)) In SAS we usually run the following model (which uses the Gauss-Newton method): proc nlin data=scaling_factors; parms a=100 b=100 c=-0.09; model scaling_factor = a - (b * (exp(c
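One possible approach is scipy.optimize.curve_fit, sketched below on the five rows from the question. Note two assumptions: curve_fit defaults to a Levenberg-Marquardt solver for unbounded problems rather than plain Gauss-Newton, and the starting value for c is set to -0.01 here instead of the SAS value of -0.09, purely as a guess that suits the scale of baskets better:

```python
import numpy as np
from scipy.optimize import curve_fit

# Data taken from the question.
baskets = np.array([475, 108, 2, 38, 320], dtype=float)
scaling_factor = np.array([95.5, 57.7, 1.4, 21.9, 88.8])


def model(x, a, b, c):
    # scaling_factor = a - b * exp(c * baskets)
    return a - b * np.exp(c * x)


# Initial guess: a and b as in the SAS call, c is an assumption (see above).
p0 = (100.0, 100.0, -0.01)
params, covariance = curve_fit(model, baskets, scaling_factor, p0=p0)
a, b, c = params

fitted = model(baskets, a, b, c)
```

If the fit fails to converge from a different starting point, passing bounds or rescaling baskets are the usual things to try.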

read_table in pandas, how to get input from text to a dataframe [duplicate]

六眼飞鱼酱① submitted on 2019-12-04 20:36:03
This question already has an answer here: Create Pandas DataFrame from txt file with specific pattern (5 answers) Alabama[edit] Auburn (Auburn University)[1] Florence (University of North Alabama) Jacksonville (Jacksonville State University)[2] Alaska[edit] Fairbanks (University of Alaska Fairbanks)[2] Arizona[edit] Flagstaff (Northern Arizona University)[6] Tempe (Arizona State University) Tucson (University of Arizona) This is my text. I need to create a data frame with one column for the state name and another column for the town name. I know how to remove the university names, but how do I
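A sketch of one way to build the two columns, assuming the text has one entry per line, that state lines end in "[edit]", and that every other line is a town followed by " (University ...)". The column names "State" and "RegionName" are chosen here for illustration:

```python
import pandas as pd

text = """Alabama[edit]
Auburn (Auburn University)[1]
Florence (University of North Alabama)
Jacksonville (Jacksonville State University)[2]
Alaska[edit]
Fairbanks (University of Alaska Fairbanks)[2]
Arizona[edit]
Flagstaff (Northern Arizona University)[6]
Tempe (Arizona State University)
Tucson (University of Arizona)"""

rows = []
state = None
for line in text.splitlines():
    line = line.strip()
    if not line:
        continue
    if line.endswith("[edit]"):
        # A state header: remember it for the town lines that follow.
        state = line[: -len("[edit]")]
    else:
        # A town line: keep only the part before " (".
        town = line.split(" (")[0]
        rows.append((state, town))

df = pd.DataFrame(rows, columns=["State", "RegionName"])
```

When reading from a file, the same loop works over the open file handle instead of text.splitlines().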

Loading sklearn model in Java. Model created with DNNClassifier in python

99封情书 submitted on 2019-12-04 12:26:14
Question: The goal is to open in Java a model created/trained in Python with tensorflow.contrib.learn.learn.DNNClassifier. At the moment the main issue is finding the name of the "tensor" to pass in Java to the session runner method. I have this test code in Python: from __future__ import division, print_function, absolute_import import tensorflow as tf import pandas as pd import tensorflow.contrib.learn as learn import numpy as np from sklearn import metrics from sklearn.cross_validation import

How to do Onehotencoding in Sklearn Pipeline

99封情书 submitted on 2019-12-04 08:35:14
Question: I am trying to one-hot encode the categorical variables of my Pandas dataframe, which includes both categorical and continuous variables. I realise this can be done easily with the pandas .get_dummies() function, but I need to use a pipeline so I can generate a PMML file later on. This is the code to create a mapper. The categorical variables I would like to encode are stored in a list called 'dummies'. from sklearn_pandas import DataFrameMapper from sklearn.preprocessing import OneHotEncoder
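For comparison, here is a sketch that uses scikit-learn's own ColumnTransformer inside a Pipeline instead of sklearn_pandas' DataFrameMapper. This is a swapped-in technique, not the mapper from the question, and whether it suits the PMML export step is not verified here. The dataframe, the column names and the target are made up for illustration:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data: one categorical and one continuous column.
df = pd.DataFrame({
    "color": ["red", "green", "blue", "green", "red", "blue"],
    "price": [1.0, 2.5, 3.0, 2.0, 1.5, 3.5],
})
target = [0, 1, 1, 1, 0, 1]

dummies = ["color"]  # categorical columns to one-hot encode

preprocess = ColumnTransformer(
    [("onehot", OneHotEncoder(handle_unknown="ignore"), dummies)],
    remainder="passthrough",  # continuous columns pass through unchanged
)

pipeline = Pipeline([
    ("preprocess", preprocess),
    ("classifier", LogisticRegression()),
])
pipeline.fit(df, target)

encoded = preprocess.fit_transform(df)
if hasattr(encoded, "toarray"):
    encoded = encoded.toarray()  # densify if a sparse matrix came back
predictions = pipeline.predict(df)
```

Here the three colour categories become three indicator columns and price is appended as a fourth column.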