scikit-learn

Python scikit learn MLPClassifier “hidden_layer_sizes”

做~自己de王妃 posted on 2020-12-27 08:05:24
Question: I am lost in the scikit-learn 0.18 user manual (http://scikit-learn.org/dev/modules/generated/sklearn.neural_network.MLPClassifier.html#sklearn.neural_network.MLPClassifier):
    hidden_layer_sizes : tuple, length = n_layers - 2, default (100,)
    The ith element represents the number of neurons in the ith hidden layer.
If I want only 1 hidden layer with 7 hidden units in my model, should I write it like this? Thanks!
    hidden_layer_sizes=(7, 1)
Answer 1: hidden_layer_sizes=(7,) if you want only 1
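The difference between the two tuples can be checked directly on a fitted model. The sketch below is illustrative only: the synthetic dataset, `max_iter`, and `random_state` values are assumptions, not part of the question. A 1-tuple `(7,)` gives one hidden layer of 7 units, whereas `(7, 1)` would give two hidden layers with 7 and 1 units.

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Small synthetic binary dataset (illustrative only): 200 samples, 5 features.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# One hidden layer with 7 units: a 1-tuple, note the trailing comma.
clf = MLPClassifier(hidden_layer_sizes=(7,), max_iter=500, random_state=0)
clf.fit(X, y)

# coefs_ holds one weight matrix per layer transition:
# input -> hidden and hidden -> output.
layer_shapes = [w.shape for w in clf.coefs_]
print(layer_shapes)
print(clf.n_layers_)  # counts the input, hidden, and output layers
```

Here `n_layers_` counts the input and output layers as well, which is why the documentation gives the tuple length as `n_layers - 2`.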

Sklearn Pipeline: Get feature names after OneHotEncode In ColumnTransformer

↘锁芯ラ posted on 2020-12-27 07:39:15
Question: I want to get the feature names after I fit the pipeline.
    categorical_features = ['brand', 'category_name', 'sub_category']
    categorical_transformer = Pipeline(steps=[
        ('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
        ('onehot', OneHotEncoder(handle_unknown='ignore'))])
    numeric_features = ['num1', 'num2', 'num3', 'num4']
    numeric_transformer = Pipeline(steps=[
        ('imputer', SimpleImputer(strategy='median')),
        ('scaler', StandardScaler())])
    preprocessor = ColumnTransformer(

Encoding text in ML classifier

不羁的心 posted on 2020-12-25 10:54:45
Question: I am trying to build an ML model, but I am having difficulty understanding where to apply the encoding. Below are the steps and functions to replicate the process I have been following. First I split the dataset into train and test:
    # Import the resampling package
    from sklearn.naive_bayes import MultinomialNB
    import string
    from nltk.corpus import stopwords
    import re
    from sklearn.model_selection import train_test_split
    from sklearn.feature_extraction.text import CountVectorizer
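The excerpt stops before the asker's own steps, so the following is only a sketch of one common ordering, not the asker's code: split first, fit the `CountVectorizer` on the training texts only, then reuse the fitted vocabulary to transform the test texts. The corpus and labels are made up for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Tiny hypothetical corpus; label 1 = spam-like, 0 = other.
texts = ["win money now", "meeting at noon", "win a prize now",
         "lunch at noon", "claim your prize money", "project meeting notes"]
labels = [1, 0, 1, 0, 1, 0]

# Split the raw texts before any encoding is learned.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=2, random_state=0, stratify=labels)

# Learn the vocabulary from the training split only...
vectorizer = CountVectorizer()
X_train_counts = vectorizer.fit_transform(X_train)
# ...and apply that same vocabulary to the test split.
X_test_counts = vectorizer.transform(X_test)

clf = MultinomialNB().fit(X_train_counts, y_train)
predictions = clf.predict(X_test_counts)
print(X_train_counts.shape, X_test_counts.shape)
```

Fitting on the training split alone keeps test-set words out of the vocabulary and guarantees both matrices have the same number of columns.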

Can we make the ML model (pickle file) more robust, by accepting (or ignoring) new features?

ⅰ亾dé卋堺 posted on 2020-12-25 10:26:58
Question: I have trained an ML model and stored it in a pickle file. In my new script, I read new "real world" data on which I want to make predictions. However, I am struggling. I have a column containing string values, like:
    Sex
    Male
    Female
    # This is just an example; the real data has many more unique values
Now comes the issue: I received a new (unique) value (e.g. 'Neutral' was added), and now I cannot make predictions anymore. Since I am transforming the 'Sex' column into dummies, I
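One possible approach, sketched here with a swapped-in technique rather than the asker's dummy-variable code, is to replace the dummies with a `OneHotEncoder(handle_unknown="ignore")` and pickle it together with the model in a single pipeline. The training data, labels, and the choice of `LogisticRegression` are hypothetical.

```python
import pickle

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical training data: a single string column and binary labels.
X_train = np.array([["Male"], ["Female"], ["Male"], ["Female"]], dtype=object)
y_train = [0, 1, 0, 1]

# The encoder remembers the categories seen during fit; unknown
# categories are encoded as an all-zero row instead of raising an error.
model = make_pipeline(OneHotEncoder(handle_unknown="ignore"),
                      LogisticRegression())
model.fit(X_train, y_train)

# Round-trip through pickle, as a stand-in for saving and loading a file.
restored = pickle.loads(pickle.dumps(model))

# A category that was never seen during training.
X_new = np.array([["Neutral"]], dtype=object)
encoded = restored.named_steps["onehotencoder"].transform(X_new).toarray()
prediction = restored.predict(X_new)
print(encoded)
```

Because the encoder travels inside the pickled pipeline, the prediction script never has to rebuild the column layout by hand.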

Testing text classification ML model with new data fails

旧时模样 posted on 2020-12-23 18:06:03
Question: I have built a machine learning model to classify emails as spam or not. Now I want to test it on my own email and see the result, so I wrote the following code to classify the new email:
    message = """Subject: Hello this is from google security team we want to recover your password. Please contact us as soon as possible"""
    message = pd.Series([message,])
    transformed_message = CountVectorizer(analyzer=process_text).fit_transform(message)
    proba = model.predict_proba(transformed_message)[0]
Knowing
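A likely cause of the failure is that a brand-new `CountVectorizer` is fitted on the single message, so its vocabulary and column count differ from what the model was trained on. The sketch below shows one way to avoid that by keeping the vectorizer and classifier in one pipeline. The training messages, labels, and the use of `MultinomialNB` are assumptions, and the asker's custom `process_text` analyzer is omitted because it is not shown.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical training set; label 1 = spam, 0 = not spam.
train_messages = [
    "recover your password now contact us",
    "team meeting moved to friday",
    "urgent security alert verify your account",
    "see you at lunch tomorrow",
]
train_labels = [1, 0, 1, 0]

# The pipeline fits the vectorizer once and reuses its vocabulary
# (via transform, not fit_transform) whenever it predicts.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_messages, train_labels)

message = ("Subject: Hello this is from google security team we want to "
           "recover your password. Please contact us as soon as possible")

# Pass raw text; the pipeline encodes it with the training vocabulary.
proba = model.predict_proba([message])[0]
print(proba)
```

If the vectorizer and model were trained separately, the equivalent fix is to keep the fitted vectorizer object and call its `transform` method on the new message.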
