scikit-learn | 易学教程

Python scikit learn MLPClassifier “hidden_layer_sizes”

Question: I am lost in the scikit-learn 0.18 user manual (http://scikit-learn.org/dev/modules/generated/sklearn.neural_network.MLPClassifier.html#sklearn.neural_network.MLPClassifier):

    hidden_layer_sizes : tuple, length = n_layers - 2, default (100,)
        The ith element represents the number of neurons in the ith hidden layer.

If I want only 1 hidden layer with 7 hidden units in my model, should I write it like this? Thanks!

    hidden_layer_sizes=(7, 1)

Answer 1: hidden_layer_sizes=(7,) if you want only 1 …
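To see the difference between the two tuples, one option is to fit both models on a small dataset and inspect the learned weight matrices. The following is a minimal sketch; the toy data, `max_iter=50`, and the variable names are assumptions made for illustration and are not part of the original question.

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

# Hypothetical toy data: 40 samples, 4 features, two balanced classes.
rng = np.random.RandomState(0)
X = rng.rand(40, 4)
y = np.arange(40) % 2

# (7,)   -> one hidden layer with 7 units
# (7, 1) -> two hidden layers: 7 units, then 1 unit
one_layer = MLPClassifier(hidden_layer_sizes=(7,), max_iter=50, random_state=0)
two_layers = MLPClassifier(hidden_layer_sizes=(7, 1), max_iter=50, random_state=0)

with warnings.catch_warnings():
    # The short training run is only meant to expose the shapes.
    warnings.simplefilter("ignore", ConvergenceWarning)
    one_layer.fit(X, y)
    two_layers.fit(X, y)

# Each entry of coefs_ is the weight matrix between two consecutive layers.
print([w.shape for w in one_layer.coefs_])   # [(4, 7), (7, 1)]
print([w.shape for w in two_layers.coefs_])  # [(4, 7), (7, 1), (1, 1)]

# n_layers_ counts the input and output layers as well.
print(one_layer.n_layers_, two_layers.n_layers_)  # 3 4
```

The trailing comma matters: `(7,)` is a one-element tuple, whereas `(7)` is just the integer 7.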

Sklearn Pipeline: Get feature names after OneHotEncode In ColumnTransformer

Question: I want to get the feature names after I fit the pipeline.

    categorical_features = ['brand', 'category_name', 'sub_category']
    categorical_transformer = Pipeline(steps=[
        ('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
        ('onehot', OneHotEncoder(handle_unknown='ignore'))])

    numeric_features = ['num1', 'num2', 'num3', 'num4']
    numeric_transformer = Pipeline(steps=[
        ('imputer', SimpleImputer(strategy='median')),
        ('scaler', StandardScaler())])

    preprocessor = ColumnTransformer(

Encoding text in ML classifier

Question: I am trying to build an ML model, but I am having difficulty understanding where to apply the encoding. Below are the steps and functions to replicate the process I have been following. First, I split the dataset into train and test:

    # Import the resampling package
    from sklearn.naive_bayes import MultinomialNB
    import string
    from nltk.corpus import stopwords
    import re
    from sklearn.model_selection import train_test_split
    from sklearn.feature_extraction.text import CountVectorizer
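A common arrangement, sketched below, is to split the raw text first and let a `Pipeline` fit the `CountVectorizer` on the training split only, so the vocabulary never sees the test data. The question is truncated before its own code, so the toy corpus, the labels, and the step names `'vect'` and `'clf'` are assumptions for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical corpus: 1 = spam-like, 0 = ordinary message.
texts = [
    "win a free prize now", "meeting moved to monday",
    "claim your free reward", "lunch at noon tomorrow",
    "free cash offer inside", "project report attached",
    "urgent prize waiting", "see you at the meeting",
]
labels = [1, 0, 1, 0, 1, 0, 1, 0]

# Split the raw text before any encoding takes place.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=0, stratify=labels)

# The pipeline fits the vectorizer on X_train only; at predict time it
# applies the already learned vocabulary to X_test.
pipeline = Pipeline(steps=[
    ('vect', CountVectorizer()),
    ('clf', MultinomialNB()),
])
pipeline.fit(X_train, y_train)
predictions = pipeline.predict(X_test)
print(list(zip(X_test, predictions)))
```

Keeping the encoder inside the pipeline also means a single object can be saved and reused on new text later.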

Can we make the ML model (pickle file) more robust, by accepting (or ignoring) new features?

Question: I have trained an ML model and stored it in a pickle file. In a new script, I read new "real-world" data on which I want to make predictions. However, I am struggling. I have a column containing string values, like:

    Sex
    Male
    Female
    # This is just an example; the real column has many more unique values

Now comes the issue. I received a new (unique) value ('Neutral' was added, for example), and now I cannot make predictions anymore. Since I am transforming the 'Sex' column into dummies, I …
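One way to tolerate unseen categories, sketched here, is to swap the dummy-variable step for a `OneHotEncoder(handle_unknown='ignore')` and pickle it together with the model. This replaces the asker's approach rather than reproducing it: the excerpt does not show the original model, so `LogisticRegression`, the toy target, and the variable names are assumptions.

```python
import pickle

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical training data with the two categories seen so far.
train = pd.DataFrame({"Sex": ["Male", "Female", "Male", "Female"]})
target = [1, 0, 1, 0]

# Encoder and estimator travel together, so the pickle remembers
# which categories existed at training time.
model = make_pipeline(OneHotEncoder(handle_unknown="ignore"), LogisticRegression())
model.fit(train, target)

# Round trip through pickle, as a stand-in for saving to and loading from a file.
restored = pickle.loads(pickle.dumps(model))

# 'Neutral' was never seen during training.
new_data = pd.DataFrame({"Sex": ["Male", "Neutral"]})

# make_pipeline names each step after its lowercased class name.
encoded = restored.named_steps["onehotencoder"].transform(new_data).toarray()
print(encoded)  # unknown category becomes an all-zero row

predictions = restored.predict(new_data)
print(predictions)
```

An all-zero row means the model treats the new value as "none of the known categories", which keeps prediction working but carries no information about the new category; retraining is still needed if it matters.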

Testing text classification ML model with new data fails

Question: I have built a machine learning model to classify emails as spam or not. Now I want to test it on my own email and see the result, so I wrote the following code to classify the new email:

    message = """Subject: Hello this is from google security team we want to recover your password. Please contact us as soon as possible"""
    message = pd.Series([message,])
    transformed_message = CountVectorizer(analyzer=process_text).fit_transform(message)
    proba = model.predict_proba(transformed_message)[0]

Knowing …
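The excerpt ends before the error is shown, so the cause is an assumption: calling `fit_transform` on a brand-new `CountVectorizer` learns a vocabulary from the single message, which usually does not match the feature count the model was trained on. A minimal sketch of the usual remedy is to keep the vectorizer that was fitted on the training data and call only `transform` on new text. The training messages and labels below are hypothetical, and the default analyzer stands in for the asker's `process_text`, which is not shown.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical training set: 1 = spam, 0 = not spam.
train_messages = [
    "recover your password now contact us",
    "minutes from the team meeting",
    "security alert confirm your password",
    "are we still on for lunch",
]
train_labels = [1, 0, 1, 0]

# Fit the vectorizer once, on the training data, and keep the object.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_messages)
model = MultinomialNB().fit(X_train, train_labels)

message = """Subject: Hello this is from google security team we want to recover your password. Please contact us as soon as possible"""

# transform (not fit_transform) maps the new text onto the training vocabulary;
# words the vectorizer never saw are simply dropped.
transformed_message = vectorizer.transform([message])
proba = model.predict_proba(transformed_message)[0]
print(proba)
```

If the model is stored on disk, the fitted vectorizer needs to be stored alongside it (or wrapped with the model in a `Pipeline`) so the same vocabulary is available at prediction time.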