sklearn-pandas

Unexpected StandardScaler fit_transform output

假装没事ソ posted on 2019-12-12 01:23:45
Question: I am trying to scale a pandas Series with StandardScaler().fit_transform(), but the output is always an array of zeros. The input Series has a length of 201. When I run print values[:5], I get the following floats: 0 1943.0 1 508.0 2 1657.0 3 872.0 4 693.0 When I apply the scaler: X = preprocessing.StandardScaler().fit_transform(values) print X Output: [[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0
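The doubled brackets in the output suggest one plausible cause: an older scikit-learn release may have treated the 1-D Series as a single sample with 201 features, so every feature had only one value and scaled to zero. Recent releases raise an error for 1-D input instead. A minimal sketch of the usual fix, assuming the Series holds one feature and using only the five values shown in the question:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# First five values from the question; the rest of the Series is not shown.
values = pd.Series([1943.0, 508.0, 1657.0, 872.0, 693.0])

# StandardScaler expects shape (n_samples, n_features), so turn the
# 1-D Series into a single column before scaling.
column = values.to_numpy().reshape(-1, 1)
scaled = StandardScaler().fit_transform(column)

# Flatten back to 1-D for display; the values now have mean 0 and unit variance.
print(scaled.ravel())
```

Selecting the data as a one-column DataFrame (double brackets on a DataFrame) gives the same 2-D shape without an explicit reshape.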

get_dummies not working properly in python

只愿长相守 posted on 2019-12-11 16:44:57
Question: I don't know why, but get_dummies is removing one column for an unknown reason. I want both the 'train' and 'test' data to have the same number of columns. data = pd.read_csv('data/trainData.csv') train , test = train_test_split(data , test_size= 0.20 ) train = pd.get_dummies(train , columns =['job','marital','education','default','housing','loan','contact','month','day_of_week','poutcome'] , drop_first = True) c = DecisionTreeClassifier(min_samples_split=550) test = pd.get_dummies(test
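Encoding the two splits separately produces different columns whenever a category is missing from one split, and drop_first can then drop a different baseline in each. One possible fix, sketched on hypothetical toy data that reuses the 'job' and 'marital' column names from the question, is to encode the test split without drop_first and then reindex it to the training columns:

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the question.
train = pd.DataFrame({"job": ["admin", "technician", "services"],
                      "marital": ["single", "married", "divorced"]})
test = pd.DataFrame({"job": ["admin", "technician"],
                     "marital": ["married", "married"]})

cols = ["job", "marital"]
train_d = pd.get_dummies(train, columns=cols, drop_first=True, dtype=int)

# Encode test WITHOUT drop_first so no category is lost, then keep exactly
# the training columns: missing dummies become 0, extra ones are discarded.
test_d = pd.get_dummies(test, columns=cols, dtype=int)
test_d = test_d.reindex(columns=train_d.columns, fill_value=0)

print(test_d)
```

Calling get_dummies on the full data before train_test_split avoids the mismatch as well.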

TypeError: can only perform ops with scalar values

一曲冷凌霜 posted on 2019-12-11 11:59:15
Question: I would appreciate it if you could let me know how to plot some informative charts for the table provided here. For example, I need a bar chart for the column named "Domestic unlisted companies:Use of IFRSs by unlisted companies" that shows in how many jurisdictions IFRSs are permitted, not permitted, and so on. Or a bar chart for the "Audit report states compliance with IFRS" column. In addition, I need a pie chart for the column titled "Domestic listed companies" that shows in how many

Making Random Forest outputs like Logistic Regression

拥有回忆 posted on 2019-12-11 10:13:26
Question: I am asking about dimensions, etc. I am trying to implement this amazing work with a random forest: https://www.kaggle.com/allunia/how-to-attack-a-machine-learning-model/notebook Both logistic regression and random forest are from sklearn, but when I get the weights from the random forest model their shape is (784,), while logistic regression returns (10,784). My main problems are dimension mismatches and "NaN, infinity or a value too large for dtype" errors with the attack methods. The weights using logistic regression is
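The shape difference can be reproduced on a small dataset. The sketch below assumes the (784,) array came from feature_importances_, which the truncated question does not confirm. It uses scikit-learn's bundled digits data (64 features, 10 classes) in place of the 784-pixel images from the notebook:

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# Bundled 8x8 digits: 64 features and 10 classes, standing in for 784-pixel images.
X, y = load_digits(return_X_y=True)

logreg = LogisticRegression(max_iter=1000).fit(X, y)
forest = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

# Logistic regression learns one weight vector per class.
coef_shape = logreg.coef_.shape

# A random forest has no per-class weights; it exposes one importance per feature.
importance_shape = forest.feature_importances_.shape

print(coef_shape, importance_shape)
```

The two arrays are not interchangeable. Feature importances are non-negative, sum to one, and carry no per-class direction, so an attack that relies on per-class gradients of a linear model would need a different formulation for a forest.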

Array inside list

本秂侑毒 posted on 2019-12-11 08:02:51
Question: I'm really confused trying to solve this problem. I'm trying to use sklearn's MinMaxScaler, but I'm getting an error because it seems that I'm setting an array element with a sequence. The code is: raw_values = series.values # transform data to be stationary diff_series = difference(raw_values, 1); diff_values = diff_series.values; diff_values = diff_values.reshape(len(diff_values), 1) # rescale values to 0,1 scaler = MinMaxScaler(feature_range=(0, 1)) scaled_values = scaler

Is numerical encoding necessary for the target variable in classification?

核能气质少年 posted on 2019-12-11 06:22:44
Question: I am using sklearn for text classification. All my features are numerical, but my target variable labels are text. I can understand the rationale behind encoding features as numbers, but I don't think this applies to the target variable?
Answer 1: If your target variable is in textual form, you can transform it into numeric form (or you can leave it alone; please see my note below) so that any scikit-learn algorithm can handle it in an OvA (one-versus-all) scheme: your learning algorithm will
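Both options mentioned in the answer can be sketched briefly. The data below is a hypothetical toy set, and LogisticRegression is an arbitrary choice of classifier. scikit-learn classifiers accept string labels directly, while LabelEncoder does the numeric conversion explicitly:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data: numeric features with text labels.
X = [[0.0, 0.1], [0.2, 0.0], [0.9, 1.0], [1.0, 0.8]]
y = ["spam", "spam", "ham", "ham"]

# Option 1: fit directly on the text labels; predictions come back as text.
clf = LogisticRegression().fit(X, y)
pred = clf.predict([[0.1, 0.1]])

# Option 2: encode the labels explicitly and decode them afterwards.
encoder = LabelEncoder()
y_num = encoder.fit_transform(y)            # classes are numbered in sorted order
decoded = encoder.inverse_transform(y_num)  # back to the original strings

print(clf.classes_, pred, y_num)
```

The fitted classifier stores the original labels in classes_, so no manual mapping is needed when the labels are left as text.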

How to encode a pandas.DataFrame column containing lists using Sklearn.preprocessing

本小妞迷上赌 posted on 2019-12-11 05:55:52
Question: I have a pandas DataFrame in which some of the columns contain lists, and I would like to encode the labels within the lists. I get this error: ValueError: Expected 2D array, got 1D array instead: from sklearn.preprocessing import OneHotEncoder mins = pd.read_csv('recipes.csv') enc = OneHotEncoder(handle_unknown='ignore') X = mins['Ingredients'] ''' [[lettuce, tomatoes, ginger, vodka, tomatoes] [lettuce, tomatoes, flour, vodka, tomatoes] ... [flour, tomatoes, vodka, vodka, mustard]] ''' enc
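OneHotEncoder expects a 2-D table with one categorical value per cell, so it is a poor fit for a column of lists. A different tool, MultiLabelBinarizer, is designed for that shape. The sketch below is not the asker's method and assumes the column already holds Python lists (values read from a CSV would first need parsing from strings). It uses the three rows visible in the question:

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# The three rows visible in the question; the real data comes from recipes.csv.
mins = pd.DataFrame({"Ingredients": [
    ["lettuce", "tomatoes", "ginger", "vodka", "tomatoes"],
    ["lettuce", "tomatoes", "flour", "vodka", "tomatoes"],
    ["flour", "tomatoes", "vodka", "vodka", "mustard"],
]})

# One indicator column per distinct ingredient; repeats within a row count once.
mlb = MultiLabelBinarizer()
encoded = pd.DataFrame(mlb.fit_transform(mins["Ingredients"]),
                       columns=mlb.classes_, index=mins.index)

print(encoded)
```

The result records presence, not counts, so the doubled "tomatoes" and "vodka" entries are not distinguished from single ones.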

read_table in pandas, how to get input from text to a dataframe [duplicate]

好久不见. posted on 2019-12-10 00:40:44
Question: This question already has answers here: Create Pandas DataFrame from txt file with specific pattern (5 answers). Closed 2 years ago. Alabama[edit] Auburn (Auburn University)[1] Florence (University of North Alabama) Jacksonville (Jacksonville State University)[2] Alaska[edit] Fairbanks (University of Alaska Fairbanks)[2] Arizona[edit] Flagstaff (Northern Arizona University)[6] Tempe (Arizona State University) Tucson (University of Arizona) This is my text; I need to create a data frame with
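The desired columns are cut off in the snippet, so the following is only a sketch under assumptions: each entry sits on its own line in the real file, lines ending in "[edit]" name a state, and every other line is a town followed by its university in parentheses. The column names State and RegionName are hypothetical.

```python
import pandas as pd

# Sample text from the question, with one entry per line (assumed layout).
text = """Alabama[edit]
Auburn (Auburn University)[1]
Florence (University of North Alabama)
Jacksonville (Jacksonville State University)[2]
Alaska[edit]
Fairbanks (University of Alaska Fairbanks)[2]
Arizona[edit]
Flagstaff (Northern Arizona University)[6]
Tempe (Arizona State University)
Tucson (University of Arizona)"""

rows = []
state = None
for line in text.splitlines():
    line = line.strip()
    if line.endswith("[edit]"):
        # A state header: remember it for the town lines that follow.
        state = line[:-len("[edit]")]
    elif line:
        # A town line: keep the part before the parenthesised university.
        town = line.split(" (", 1)[0]
        rows.append((state, town))

# Column names are hypothetical; the question's wording is truncated.
df = pd.DataFrame(rows, columns=["State", "RegionName"])
print(df)
```

For a file on disk, iterating over the open file object in place of text.splitlines() gives the same result.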

TypeError: unhashable type

China☆狼群 posted on 2019-12-08 15:57:48
Question: I wrote a small piece of code to do linear regression using sklearn. I created a two-column CSV file (column names X and Y, with some numbers), and when I read the file I see that the content is read properly, as shown below. However, I get an "unhashable type" error when I try to refer to a column using datafile[:,:] or datafile[:,-1], etc. And when I try to use X as the response and Y as the predictor in sklearn's linear regression, I get a ValueError as shown below. I looked online
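NumPy-style indexing such as datafile[:, -1] is not supported on a DataFrame, because the tuple of slices is treated as a column key. Positional access goes through .iloc instead. A minimal sketch, using a hypothetical frame in place of the CSV and assuming the ValueError came from passing a 1-D predictor:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical stand-in for the two-column CSV (columns X and Y).
datafile = pd.DataFrame({"X": [1.0, 2.0, 3.0, 4.0],
                         "Y": [2.0, 4.0, 6.0, 8.0]})

# Positional slicing works through .iloc rather than datafile[:, -1].
last_col = datafile.iloc[:, -1]

# scikit-learn wants a 2-D predictor, so select Y with double brackets.
predictor = datafile[["Y"]]   # shape (4, 1)
response = datafile["X"]      # shape (4,)

model = LinearRegression().fit(predictor, response)
print(model.coef_, model.intercept_)
```

Single brackets return a 1-D Series, which is fine for the response but not for the predictor.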

Sklearn SVM: SVR and SVC, getting the same prediction for every input

余生长醉 posted on 2019-12-08 15:26:14
Question: Here is a paste of the code: SVM sample code. I checked out a couple of the other answers to this problem, and it seems like this specific iteration of the problem is a bit different. First off, my inputs are normalized, and I have five inputs per point. The values are all reasonably sized (healthy 0.5s and 0.7s, etc.; few numbers near zero or near 1). I have about 70 x inputs corresponding to their 70 y inputs. The y inputs are also normalized (they are percentage changes of my function after