scikit-learn

Predicting a new value through a model trained on one-hot encoded data

人盡茶涼 submitted on 2021-02-17 04:41:48
Question: This might look like a trivial problem, but I am stuck on predicting results from a model. My problem is this: I have a dataset of shape 1000 x 19 (excluding the target feature), but after one-hot encoding it becomes 1000 x 141. Since I trained the model on data of shape 1000 x 141, I need data of shape 1 x 141 (at least) for prediction. I also know that in Python I can make a prediction using model.predict(data). But since I am getting data from an end user through a …
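
A common way out of this shape mismatch, sketched below on a toy frame, is to one-hot encode the single user record with pandas.get_dummies and then reindex it to the training columns, so it expands to the same encoded width the model saw. The column names and data here are hypothetical stand-ins for the asker's 19 raw features.

    import pandas as pd
    from sklearn.linear_model import LinearRegression

    # toy stand-in for the 1000 x 19 raw training data (hypothetical columns)
    train = pd.DataFrame({"age": [21, 35, 47, 52],
                          "city": ["ny", "la", "ny", "sf"]})
    y = [0, 1, 0, 1]

    X_train = pd.get_dummies(train)        # categorical columns expand into dummies
    model = LinearRegression().fit(X_train, y)

    # one raw record from the end user, with the same raw columns as training
    new = pd.DataFrame({"age": [30], "city": ["la"]})

    # reindex aligns the encoded row to the training columns; categories the
    # user record does not contain are filled with 0
    X_new = pd.get_dummies(new).reindex(columns=X_train.columns, fill_value=0)
    print(model.predict(X_new))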

Object of type 'ndarray' is not JSON serializable

不羁岁月 submitted on 2021-02-16 15:50:28
Question: I am new to Python and machine learning. I have a linear regression model that can predict output from input, and I have dumped it to be used with a web service. See the code below:

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
    regression_model = LinearRegression()
    regression_model.fit(X_train, y_train)
    print(regression_model.predict(np.array([[21, 0, 0, 0, 1, 0, 0, 1, 1, 1]])))  # this returns my expected output
    joblib.dump(regression_model, '..
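
The error in the title comes from the json module not knowing how to encode NumPy arrays. A minimal sketch of the usual fix, converting the prediction to a plain Python list before serializing (the payload shape is illustrative):

    import json
    import numpy as np

    prediction = np.array([42.7])     # stand-in for regression_model.predict(...)

    # ndarray -> list makes the value JSON serializable
    payload = json.dumps({"prediction": prediction.tolist()})
    print(payload)                    # {"prediction": [42.7]}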

Why does StandardScaler have different effects with different numbers of features

冷暖自知 submitted on 2021-02-16 15:16:38
Question: I experimented with the breast cancer data from scikit-learn. Using all features and no StandardScaler:

    cancer = datasets.load_breast_cancer()
    x = cancer.data
    y = cancer.target
    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)
    pla = Perceptron().fit(x_train, y_train)
    y_pred = pla.predict(x_test)
    print(accuracy_score(y_test, y_pred))

Result 1: 0.9473684210526315

Using all features and StandardScaler:

    cancer = datasets.load_breast_cancer()
    x = cancer ..
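
The question is cut off, but for reference, here is a minimal sketch of the scaled variant it is presumably building toward; note the scaler is fitted on the training split only, so no test-set information leaks into the transform:

    from sklearn import datasets
    from sklearn.linear_model import Perceptron
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler

    cancer = datasets.load_breast_cancer()
    x_train, x_test, y_train, y_test = train_test_split(
        cancer.data, cancer.target, test_size=0.2, random_state=42)

    scaler = StandardScaler().fit(x_train)   # fit on training data only
    pla = Perceptron().fit(scaler.transform(x_train), y_train)
    y_pred = pla.predict(scaler.transform(x_test))
    print(accuracy_score(y_test, y_pred))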

random_state parameter in classification models

北城以北 submitted on 2021-02-16 14:25:07
Question: Can someone explain why the random_state parameter affects the model so much? I have a RandomForestClassifier model and want to set the random_state (for reproducibility purposes), but depending on the value I use, I get very different values for my overall evaluation metric (F1 score). For example, I fitted the same model with 100 different random_state values, and after training and testing, the smallest F1 was 0.64516129 and the largest was 0.808823529. That is a huge difference.
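
Much of that spread usually comes from the particular train/test split rather than from the forest itself. One way to check is to average the metric over several partitions with cross-validation; a minimal sketch on synthetic data (make_classification stands in for the asker's dataset):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=300, random_state=0)  # stand-in data
    clf = RandomForestClassifier(random_state=42)

    # 5-fold CV averages F1 over several train/test partitions, so a single
    # lucky or unlucky split matters far less than in a one-shot evaluation
    scores = cross_val_score(clf, X, y, cv=5, scoring="f1")
    print(scores.mean(), scores.std())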

Decision Tree in sklearn: depth of tree and accuracy

天大地大妈咪最大 submitted on 2021-02-16 09:22:12
Question: I am applying a decision tree to a data set using sklearn. In sklearn there is a parameter to select the depth of the tree: dtree = DecisionTreeClassifier(max_depth=10). My question is how the max_depth parameter helps the model. How does a high/low max_depth help in predicting the test data more accurately?

Answer 1: max_depth is what the name suggests: the maximum depth that you allow the tree to grow to. The deeper you allow it, the more complex your model becomes. For training error, it is …
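
A quick way to see the trade-off the answer describes is to sweep max_depth and compare train and test accuracy: training accuracy keeps climbing with depth, while test accuracy flattens or drops once the tree overfits. A minimal sketch, using the built-in iris data purely as a convenient example:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    for depth in range(1, 8):
        dtree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
        # train score rises with depth; test score levels off or falls (overfitting)
        print(depth, dtree.score(X_tr, y_tr), dtree.score(X_te, y_te))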

How to remove stop phrases/stop ngrams (multi-word strings) using pandas/sklearn?

独自空忆成欢 submitted on 2021-02-16 09:14:31
Question: I want to prevent certain phrases from creeping into my models. For example, I want to prevent 'red roses' from entering my analysis. I understand how to add individual stop words, as described in "Adding words to scikit-learn's CountVectorizer's stop list", by doing this:

    from sklearn.feature_extraction import text
    additional_stop_words = ['red', 'roses']

However, this also results in other ngrams like 'red tulips' or 'blue roses' not being detected. I am building a TfidfVectorizer as part of my …
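
One way to drop a whole phrase without losing its individual words elsewhere is a custom preprocessor that deletes the exact phrase from each document before tokenization. A minimal sketch (the stop-phrase list and documents are illustrative; note that passing preprocessor= replaces the vectorizer's default lowercasing, so the function lowercases itself):

    import re
    from sklearn.feature_extraction.text import TfidfVectorizer

    stop_phrases = ["red roses"]          # hypothetical phrase list

    def drop_phrases(doc):
        # remove each stop phrase as a unit, so 'red tulips' and
        # 'blue roses' still produce ngrams afterwards
        doc = doc.lower()
        for phrase in stop_phrases:
            doc = re.sub(r"\b" + re.escape(phrase) + r"\b", " ", doc)
        return doc

    docs = ["red roses and red tulips", "blue roses and red roses"]
    vec = TfidfVectorizer(ngram_range=(1, 2), preprocessor=drop_phrases)
    vec.fit_transform(docs)
    print(vec.get_feature_names_out())    # no 'red roses'; 'red tulips' and 'blue roses' remain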
