scikit-learn

Predicting a new value through a model trained on one-hot encoded data

人盡茶涼 submitted on 2021-02-17 04:41:48
Question: This might look like a trivial problem, but I am stuck on predicting results from a model. My problem is this: I have a dataset of shape 1000 x 19 (excluding the target feature), but after one-hot encoding it becomes 1000 x 141. Since I trained the model on data of shape 1000 x 141, I need data of shape 1 x 141 (at least) for prediction. I also know that in Python I can make a prediction using model.predict(data). But since I am getting data from an end user through a …
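
A common way out of this shape mismatch, sketched below on a toy frame, is to one-hot encode the single user record with pandas.get_dummies and then reindex it to the training columns, so it expands to the same encoded width the model saw. The column names and data here are hypothetical stand-ins for the asker's 19 raw features.

    import pandas as pd
    from sklearn.linear_model import LinearRegression

    # toy stand-in for the 1000 x 19 raw training data (hypothetical columns)
    train = pd.DataFrame({"age": [21, 35, 47, 52],
                          "city": ["ny", "la", "ny", "sf"]})
    y = [0, 1, 0, 1]

    X_train = pd.get_dummies(train)        # categorical columns expand into dummies
    model = LinearRegression().fit(X_train, y)

    # one raw record from the end user, with the same raw columns as training
    new = pd.DataFrame({"age": [30], "city": ["la"]})

    # reindex aligns the encoded row to the training columns; categories the
    # user record does not contain are filled with 0
    X_new = pd.get_dummies(new).reindex(columns=X_train.columns, fill_value=0)
    print(model.predict(X_new))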

Object of type 'ndarray' is not JSON serializable

不羁岁月 submitted on 2021-02-16 15:50:28
Question: I am new to Python and machine learning. I have a linear regression model that can predict output from input, and I have dumped it to be used with a web service. See the code below:

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
    regression_model = LinearRegression()
    regression_model.fit(X_train, y_train)
    print(regression_model.predict(np.array([[21, 0, 0, 0, 1, 0, 0, 1, 1, 1]])))  # this returns my expected output
    joblib.dump(regression_model, '..
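
The error in the title comes from the json module not knowing how to encode NumPy arrays. A minimal sketch of the usual fix, converting the prediction to a plain Python list before serializing (the payload shape is illustrative):

    import json
    import numpy as np

    prediction = np.array([42.7])     # stand-in for regression_model.predict(...)

    # ndarray -> list makes the value JSON serializable
    payload = json.dumps({"prediction": prediction.tolist()})
    print(payload)                    # {"prediction": [42.7]}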

Why does StandardScaler have different effects with different numbers of features

冷暖自知 submitted on 2021-02-16 15:16:38
Question: I experimented with the breast cancer data from scikit-learn. Using all features and no StandardScaler:

    cancer = datasets.load_breast_cancer()
    x = cancer.data
    y = cancer.target
    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)
    pla = Perceptron().fit(x_train, y_train)
    y_pred = pla.predict(x_test)
    print(accuracy_score(y_test, y_pred))

Result 1: 0.9473684210526315

Using all features and StandardScaler:

    cancer = datasets.load_breast_cancer()
    x = cancer ..
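
The question is cut off, but for reference, here is a minimal sketch of the scaled variant it is presumably building toward; note the scaler is fitted on the training split only, so no test-set information leaks into the transform:

    from sklearn import datasets
    from sklearn.linear_model import Perceptron
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler

    cancer = datasets.load_breast_cancer()
    x_train, x_test, y_train, y_test = train_test_split(
        cancer.data, cancer.target, test_size=0.2, random_state=42)

    scaler = StandardScaler().fit(x_train)   # fit on training data only
    pla = Perceptron().fit(scaler.transform(x_train), y_train)
    y_pred = pla.predict(scaler.transform(x_test))
    print(accuracy_score(y_test, y_pred))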

random_state parameter in classification models

北城以北 submitted on 2021-02-16 14:25:07
Question: Can someone explain why the random_state parameter affects the model so much? I have a RandomForestClassifier model and want to set the random_state (for reproducibility purposes), but depending on the value I use, I get very different values for my overall evaluation metric (F1 score). For example, I fitted the same model with 100 different random_state values, and after training and testing, the smallest F1 was 0.64516129 and the largest was 0.808823529. That is a huge difference.
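
Much of that spread usually comes from the particular train/test split rather than from the forest itself. One way to check is to average the metric over several partitions with cross-validation; a minimal sketch on synthetic data (make_classification stands in for the asker's dataset):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=300, random_state=0)  # stand-in data
    clf = RandomForestClassifier(random_state=42)

    # 5-fold CV averages F1 over several train/test partitions, so a single
    # lucky or unlucky split matters far less than in a one-shot evaluation
    scores = cross_val_score(clf, X, y, cv=5, scoring="f1")
    print(scores.mean(), scores.std())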

Decision Tree in sklearn: depth of tree and accuracy

天大地大妈咪最大 submitted on 2021-02-16 09:22:12
Question: I am applying a decision tree to a data set using sklearn. In sklearn there is a parameter to select the depth of the tree: dtree = DecisionTreeClassifier(max_depth=10). My question is how the max_depth parameter helps the model. How does a high/low max_depth help in predicting the test data more accurately?

Answer 1: max_depth is what the name suggests: the maximum depth that you allow the tree to grow to. The deeper you allow it, the more complex your model becomes. For training error, it is …
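
A quick way to see the trade-off the answer describes is to sweep max_depth and compare train and test accuracy: training accuracy keeps climbing with depth, while test accuracy flattens or drops once the tree overfits. A minimal sketch, using the built-in iris data purely as a convenient example:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    for depth in range(1, 8):
        dtree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
        # train score rises with depth; test score levels off or falls (overfitting)
        print(depth, dtree.score(X_tr, y_tr), dtree.score(X_te, y_te))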

How to remove stop phrases/stop ngrams (multi-word strings) using pandas/sklearn?

独自空忆成欢 submitted on 2021-02-16 09:14:31
Question: I want to prevent certain phrases from creeping into my models. For example, I want to prevent 'red roses' from entering my analysis. I understand how to add individual stop words, as described in "Adding words to scikit-learn's CountVectorizer's stop list", by doing this:

    from sklearn.feature_extraction import text
    additional_stop_words = ['red', 'roses']

However, this also results in other ngrams like 'red tulips' or 'blue roses' not being detected. I am building a TfidfVectorizer as part of my …
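
One way to drop a whole phrase without losing its individual words elsewhere is a custom preprocessor that deletes the exact phrase from each document before tokenization. A minimal sketch (the stop-phrase list and documents are illustrative; note that passing preprocessor= replaces the vectorizer's default lowercasing, so the function lowercases itself):

    import re
    from sklearn.feature_extraction.text import TfidfVectorizer

    stop_phrases = ["red roses"]          # hypothetical phrase list

    def drop_phrases(doc):
        # remove each stop phrase as a unit, so 'red tulips' and
        # 'blue roses' still produce ngrams afterwards
        doc = doc.lower()
        for phrase in stop_phrases:
            doc = re.sub(r"\b" + re.escape(phrase) + r"\b", " ", doc)
        return doc

    docs = ["red roses and red tulips", "blue roses and red roses"]
    vec = TfidfVectorizer(ngram_range=(1, 2), preprocessor=drop_phrases)
    vec.fit_transform(docs)
    print(vec.get_feature_names_out())    # no 'red roses'; 'red tulips' and 'blue roses' remain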
