sklearn-pandas | 易学教程

Unexpected StandardScaler fit_transform output

Question: I am trying to scale a pandas Series with StandardScaler().fit_transform(). However, the output is always an array of zeros. The input Series has a length of 201. When I run print values[:5], I get the following floats: 0 1943.0 1 508.0 2 1657.0 3 872.0 4 693.0 When I apply the scaler: X = preprocessing.StandardScaler().fit_transform(values) print X Output: [[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0
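The excerpt is cut off, but the double brackets in the output suggest the Series was treated as one sample with 201 features, so every feature had zero variance and scaled to 0. That reading assumes an older scikit-learn that accepted 1-D input; recent versions raise an error instead. A minimal sketch of the usual remedy, reshaping to a single column, with a short stand-in for the asker's `values`:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the asker's Series (only the five values shown in the question).
values = pd.Series([1943.0, 508.0, 1657.0, 872.0, 693.0])

# Reshape to one column: each value is a sample, and there is a single feature.
column = values.to_numpy().reshape(-1, 1)

X = StandardScaler().fit_transform(column)

print(X.shape)    # (5, 1)
print(X.ravel())  # standardized values with mean 0 and unit variance
```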

get_dummies not working properly in python

Question: I don't know why, but get_dummies is removing one column for an unknown reason. I want both the 'train' and 'test' data to have the same number of columns. data = pd.read_csv('data/trainData.csv') train , test = train_test_split(data , test_size= 0.20 ) train = pd.get_dummies(train , columns =['job','marital','education','default','housing','loan','contact','month','day_of_week','poutcome'] , drop_first = True) c = DecisionTreeClassifier(min_samples_split=550) test = pd.get_dummies(test
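A plausible cause, given that the two splits are encoded separately, is that a category appearing in only one split yields a different set of dummy columns. One way to keep the columns identical is to give each column a fixed categorical dtype before encoding. The sketch below uses a tiny hypothetical frame with a single `job` column rather than the asker's data:

```python
import pandas as pd

# Hypothetical data: "technician" appears in train but not in test.
train = pd.DataFrame({"job": ["admin", "services", "technician"], "age": [30, 41, 52]})
test = pd.DataFrame({"job": ["admin", "services"], "age": [25, 38]})

# Fix the category list once (here taken from the training split).
job_dtype = pd.CategoricalDtype(categories=sorted(train["job"].unique()))
train["job"] = train["job"].astype(job_dtype)
test["job"] = test["job"].astype(job_dtype)

# get_dummies now emits a column for every declared category,
# and drop_first drops the same first category in both frames.
train_d = pd.get_dummies(train, columns=["job"], drop_first=True)
test_d = pd.get_dummies(test, columns=["job"], drop_first=True)

print(list(train_d.columns))
print(list(test_d.columns))
```

Values in the test split that are not in the declared category list become missing under this approach, so the list should cover every category expected at prediction time.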

TypeError: can only perform ops with scalar values

Question: I would appreciate it if you could let me know how to plot some informative charts for the table provided here. For example, I need a bar chart for the column named "Domestic unlisted companies:Use of IFRSs by unlisted companies" which shows in how many jurisdictions IFRSs are permitted, not permitted, and so on. Or a bar chart for the "Audit report states compliance with IFRS" column. Besides, I need a pie chart for the column with the title "Domestic listed companies" which shows in how many

Making Random Forest outputs like Logistic Regression

Question: I am asking about dimensions in particular. I am trying to implement this amazing work with a random forest: https://www.kaggle.com/allunia/how-to-attack-a-machine-learning-model/notebook Both logistic regression and random forest are from sklearn, but the weights I get from the random forest model have shape (784,) while logistic regression returns (10,784). Most of my problems are dimension mismatches and "NaN, infinity or a value too large for dtype" errors with the attack methods. The weights using logistic regression is
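The shape difference can be reproduced on a small scale. The sketch below assumes the random forest "weights" are `feature_importances_` (the excerpt does not say) and uses random stand-in data with 8 features and 3 classes instead of 784 pixels and 10 digits:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.random((60, 8))   # 60 samples, 8 features (stand-in for 784 pixels)
y = np.arange(60) % 3     # 3 classes (stand-in for the 10 digits)

forest = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)
logreg = LogisticRegression(max_iter=1000).fit(X, y)

# One importance score per feature, shared across all classes.
print(forest.feature_importances_.shape)  # (8,)

# One coefficient per (class, feature) pair.
print(logreg.coef_.shape)                 # (3, 8)
```

Feature importances are non-negative scores rather than per-class coefficients, so they are not a drop-in replacement for `coef_` in code that expects a (classes, features) matrix.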

Array inside list

Question: I'm really confused trying to solve this problem. I'm trying to use the sklearn function MinMaxScaler, but I'm getting an error because it seems that I'm setting an array element with a sequence. The code is: raw_values = series.values # transform data to be stationary diff_series = difference(raw_values, 1); diff_values = diff_series.values; diff_values = diff_values.reshape(len(diff_values), 1) # rescale values to 0,1 scaler = MinMaxScaler(feature_range=(0, 1)) scaled_values = scaler

Is numerical encoding necessary for the target variable in classification?

Question: I am using sklearn for text classification. All my features are numerical, but my target variable labels are text. I can understand the rationale behind encoding features as numbers, but I don't think this applies to the target variable? Answer 1: If your target variable is in textual form, you can transform it into numeric form (or you can leave it alone; please see my note below) in order for any Scikit-learn algorithm to pick it up in an OVA (One Versus All) scheme: your learning algorithm will
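Both options mentioned in the answer can be illustrated briefly. This is a minimal sketch with made-up features and labels, not the asker's data: scikit-learn classifiers accept string targets directly, and `LabelEncoder` is available when an explicit numeric target is wanted.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder

# Hypothetical numeric features with text labels.
X = [[0.0, 1.0], [1.0, 0.0], [0.1, 0.9], [0.9, 0.1]]
y = ["spam", "ham", "spam", "ham"]

# Option 1: leave the labels as text; the classifier stores them in classes_.
clf = LogisticRegression().fit(X, y)
print(clf.classes_)               # ['ham' 'spam']
print(clf.predict([[0.0, 1.0]]))  # predictions come back as the original strings

# Option 2: encode explicitly; labels are numbered in sorted order.
encoder = LabelEncoder()
y_num = encoder.fit_transform(y)
print(y_num)                      # [1 0 1 0]
```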

How to encode a pandas.DataFrame column containing lists using Sklearn.preprocessing

Question: I have a pandas df in which some of the columns contain lists, and I would like to encode the labels within the lists. I get this error: ValueError: Expected 2D array, got 1D array instead. The code: from sklearn.preprocessing import OneHotEncoder mins = pd.read_csv('recipes.csv') enc = OneHotEncoder(handle_unknown='ignore') X = mins['Ingredients'] ''' [[lettuce, tomatoes, ginger, vodka, tomatoes] [lettuce, tomatoes, flour, vodka, tomatoes] ... [flour, tomatoes, vodka, vodka, mustard]] ''' enc
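For a column whose cells are lists of labels, `MultiLabelBinarizer` is one commonly used alternative to `OneHotEncoder`. The sketch below is not necessarily the answer given in the original thread. It builds the three rows shown in the question by hand and assumes each cell is already a Python list (a column read from CSV may hold strings that need parsing first):

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# The rows shown in the question, entered by hand as real Python lists.
df = pd.DataFrame({"Ingredients": [
    ["lettuce", "tomatoes", "ginger", "vodka", "tomatoes"],
    ["lettuce", "tomatoes", "flour", "vodka", "tomatoes"],
    ["flour", "tomatoes", "vodka", "vodka", "mustard"],
]})

mlb = MultiLabelBinarizer()
encoded = mlb.fit_transform(df["Ingredients"])  # one indicator column per distinct label

encoded_df = pd.DataFrame(encoded, columns=mlb.classes_, index=df.index)
print(encoded_df)
```

The result records only whether a label is present, so repeated labels within a row (such as the second "tomatoes") collapse to a single 1.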

read_table in pandas, how to get input from text to a dataframe [duplicate]

Question: This question already has answers here: Create Pandas DataFrame from txt file with specific pattern (5 answers). Closed 2 years ago. Alabama[edit] Auburn (Auburn University)[1] Florence (University of North Alabama) Jacksonville (Jacksonville State University)[2] Alaska[edit] Fairbanks (University of Alaska Fairbanks)[2] Arizona[edit] Flagstaff (Northern Arizona University)[6] Tempe (Arizona State University) Tucson (University of Arizona) This is my text; I need to create a data frame with

TypeError: unhashable type

Question: I wrote a small piece of code to do linear regression using sklearn. I created a two-column CSV file (column names X, Y with some numbers), and when I read the file I see that the content is read properly, as shown below. However, I get an "unhashable type" error when I try to refer to a column using commands such as datafile[:,:] or datafile[:,-1]. And when I try to use X as the response and Y as the predictor in sklearn's linear regression, I get a ValueError as shown below. I looked online
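Indexing a DataFrame with `datafile[:, -1]` is NumPy-style syntax that pandas does not accept on the frame itself; positional selection goes through `.iloc`. The ValueError is cut off in the excerpt, so its cause is not known, but one common cause is passing a 1-D predictor to `fit`. A minimal sketch with a hypothetical two-column frame in place of the asker's CSV:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical stand-in for the asker's two-column CSV.
datafile = pd.DataFrame({"X": [1.0, 2.0, 3.0, 4.0], "Y": [2.0, 4.0, 6.0, 8.0]})

# Positional selection uses .iloc rather than datafile[:, -1].
last_col = datafile.iloc[:, -1]

# The predictor must be 2-D (double brackets keep it a DataFrame);
# the response can stay a 1-D Series.
predictor = datafile[["Y"]]
response = datafile["X"]

model = LinearRegression().fit(predictor, response)
print(model.coef_, model.intercept_)
```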

Sklearn SVM: SVR and SVC, getting the same prediction for every input

Question: Here is a paste of the code: SVM sample code. I checked out a couple of the other answers to this problem, and it seems like this specific iteration of the problem is a bit different. First off, my inputs are normalized, and I have five inputs per point. The values are all reasonably sized (healthy 0.5s and 0.7s etc., with few numbers near zero or near 1). I have about 70 x inputs corresponding to their 70 y inputs. The y inputs are also normalized (they are percentage changes of my function after