scikit-learn

Row wise outlier detection in python

99封情书 submitted on 2021-02-07 10:16:40
Question: I have the CSV data as follows:

A_ID      P_ID      1429982904  1430370002  1430974801  1431579602  1432184403  1432789202  1435208402  1435308653
11Jgipc   qjMakF    364         365         363         363         364         364         364         367
11Jgipc   qxL8FJ    18          18          18          18          18          18          18          18
11Jgipc   r0Bpnt    40          40          41          41          41          42          42          42
11Jgipc   roLk4N    140         140         143         143         146         147         147         149
11Jgipc   tOudhM    12          13          13          13          13          13          14          14
11Jgipc   u-x6o8    678         678         688         688         689         690         692         695
11Jgipc   u5HHmV    1778        1785        1811        1811        1819        1826        1834        1836
11Jgipc   ufrVoP    67          67          67          67          67          67          67          67
11Jgipc   vRqMK4
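
The excerpt is cut off before the asker says which outlier rule they want, so the following is only a sketch of one common row-wise approach: the modified z-score based on the median and the median absolute deviation (MAD). The function name `row_outliers`, the 0.6745 scaling constant, and the 3.5 cutoff are assumptions taken from common practice, not from the question.

```python
import statistics


def row_outliers(values, cutoff=3.5):
    """Return the positions in one row whose modified z-score exceeds cutoff.

    Hypothetical helper: the question does not specify a detection rule.
    """
    median = statistics.median(values)
    # Median absolute deviation: a spread estimate that resists outliers.
    mad = statistics.median(abs(v - median) for v in values)
    if mad == 0:
        # At least half of the values equal the median (e.g. a constant row),
        # so the score is undefined; report nothing.
        return []
    return [
        i for i, v in enumerate(values)
        if abs(0.6745 * (v - median) / mad) > cutoff
    ]


# Two rows copied from the table in the question.
first_row = [364, 365, 363, 363, 364, 364, 364, 367]
constant_row = [18, 18, 18, 18, 18, 18, 18, 18]

flagged_first = row_outliers(first_row)
flagged_constant = row_outliers(constant_row)
```

Each row is scored against its own median, so rows on very different scales (18 versus 1778) do not influence one another.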

sklearn grid.fit(X,y) - error: “positional indexers are out-of-bounds” for X_train,y_train

人走茶凉 submitted on 2021-02-07 10:11:26
Question: This is a question about scikit-learn (version 0.17.0) in Python 2.7 along with Pandas 0.17.1. To split raw data (with no missing entries) I used the approach detailed here, and I have found that calling .fit() on the split data raises an error. Here is the code, taken largely unchanged from the other Stack Overflow question, with the variables renamed. I have then instantiated a grid and tried to fit the split data with the aim of determining optimal

Drawing boundary lines based on kmeans cluster centres

 ̄綄美尐妖づ submitted on 2021-02-07 09:47:35
Question: I'm quite new to scikit-learn, but wanted to try an interesting project. I have longitudes and latitudes for points in the UK, which I used to create cluster centers using scikit-learn's KMeans class. To visualise this data, rather than having the points as clusters, I wanted to instead draw boundaries around each cluster. For example, if one cluster was London and the other Oxford, I currently have a point at the center of each city, but I was wondering if there's a way to use this data to
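
Because KMeans assigns each point to its nearest center, the region belonging to a cluster is the Voronoi cell of that center, and the boundary between two neighbouring clusters lies on the perpendicular bisector of the segment joining their centers. The sketch below shows that idea for the two-city case mentioned in the question. The coordinates are approximate illustrative values, the helper names are hypothetical, and longitude/latitude are treated as flat x/y, which is only a rough approximation over an area the size of the UK.

```python
import math

# Approximate (longitude, latitude) pairs, used purely for illustration.
london = (-0.1278, 51.5074)
oxford = (-1.2577, 51.7520)
centers = [london, oxford]


def distance(p, q):
    """Planar Euclidean distance between two (x, y) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def nearest_center(point, centers):
    """Index of the closest center, mirroring how KMeans labels a point."""
    return min(range(len(centers)), key=lambda i: distance(point, centers[i]))


def bisector(a, b):
    """Return (midpoint, direction) of the line separating centers a and b."""
    midpoint = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)
    dx, dy = b[0] - a[0], b[1] - a[1]
    # Rotating the center-to-center vector by 90 degrees gives the boundary direction.
    return midpoint, (-dy, dx)


midpoint, direction = bisector(london, oxford)
# Any point midpoint + t * direction lies on the boundary between the two clusters.
boundary_point = (midpoint[0] + 2 * direction[0], midpoint[1] + 2 * direction[1])
```

With more than two centers, the same construction applied to every pair of neighbours yields the full Voronoi diagram; `scipy.spatial.Voronoi` computes that directly from an array of centers.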

Scikit-learn: How do we define a distance metric's parameter for grid search

╄→гoц情女王★ submitted on 2021-02-07 09:45:48
Question: I have the following code snippet that attempts to do a grid search in which one of the grid parameters is the distance metric to be used for the KNN algorithm. The example below fails if I use the "wminkowski", "seuclidean" or "mahalanobis" distance metrics.

# Define the parameter values that should be searched
k_range = range(1,31)
weights = ['uniform' , 'distance']
algos = ['auto', 'ball_tree', 'kd_tree', 'brute']
leaf_sizes = range(10, 60, 10)
metrics = ["euclidean", "manhattan", "chebyshev",
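
The metrics named as failing each need an extra argument (a variance vector `V` for "seuclidean", a covariance matrix or its inverse for "mahalanobis"), which `KNeighborsClassifier` accepts through `metric_params`. One way to search over them is to pass `GridSearchCV` a list of grids so that each metric is paired only with its own `metric_params`. The sketch below assumes a recent scikit-learn with `sklearn.model_selection`, uses synthetic data, fixes `algorithm='brute'` to sidestep tree/metric compatibility, and leaves out "wminkowski", which newer releases no longer provide.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the asker's data (an assumption).
X, y = make_classification(
    n_samples=120, n_features=4, n_informative=3, n_redundant=0, random_state=0
)

param_grid = [
    # Metrics that need no extra arguments.
    {"metric": ["euclidean", "manhattan", "chebyshev"], "n_neighbors": [3, 5]},
    # Standardised Euclidean needs the per-feature variances V.
    {
        "metric": ["seuclidean"],
        "metric_params": [{"V": X.var(axis=0)}],
        "n_neighbors": [3, 5],
    },
    # Mahalanobis needs the inverse covariance matrix VI.
    {
        "metric": ["mahalanobis"],
        "metric_params": [{"VI": np.linalg.inv(np.cov(X, rowvar=False))}],
        "n_neighbors": [3, 5],
    },
]

search = GridSearchCV(KNeighborsClassifier(algorithm="brute"), param_grid, cv=3)
search.fit(X, y)
n_candidates = len(search.cv_results_["params"])
```

Here `V` and `VI` are estimated once from the full data set for brevity; estimating them inside each training fold would be stricter.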

Plotting ROC Curve with Multiple Classes

♀尐吖头ヾ submitted on 2021-02-07 09:26:26
Question: I am following the documentation for plotting ROC curves for multiple classes at this link: http://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html I am confused about this line in particular:

y_score = classifier.fit(X_train, y_train).decision_function(X_test)

I've seen that in other examples, y_score holds probabilities, and they are all positive values, as we would expect. However, the y_score (each column for classes A-C) in this example has mostly negative values.
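
A ROC curve depends only on how the scores rank the samples, not on their scale or sign, so signed decision scores and probabilities give the same curve whenever one is a strictly increasing function of the other. The sketch below illustrates this with made-up labels and scores (not taken from the linked example): it computes the area under the curve as the fraction of positive/negative pairs ranked correctly, once on raw scores and once after squashing them through a sigmoid.

```python
import math


def pairwise_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Hypothetical data: mostly negative, margin-style scores for one class.
labels = [0, 0, 1, 1, 0, 1]
decision_scores = [-2.1, -0.4, -0.9, 0.3, -1.5, 1.2]

# A strictly increasing transform maps the scores into (0, 1).
probability_like = [1.0 / (1.0 + math.exp(-s)) for s in decision_scores]

auc_raw = pairwise_auc(labels, decision_scores)
auc_squashed = pairwise_auc(labels, probability_like)
```

Both calls return the same value because the ordering of the samples is unchanged by the transform.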

How do I (safely) send a Python object to my Flask API?

前提是你 submitted on 2021-02-07 09:16:04
Question: I am currently trying to build a Flask Web API that is able to receive a Python object in a POST request. I am using Python 3.7.1 for creating the request and Python 2.7 for running the API. The API is set up to run on my local machine. The object I am trying to send to my API is a RandomForestClassifier object from sklearn.ensemble, but this could be any of a wide variety of object types. So far I have tried to json.dumps() my object, but this object is not JSON serializable. I have also
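
Unpickling data can execute arbitrary code, so a receiver should only unpickle bytes it can prove came from a trusted sender. One possible sketch, using only the standard library, is to sign the pickled payload with an HMAC and verify the signature before loading. The names `SECRET_KEY`, `pack`, and `unpack` are hypothetical, the key must be shared out of band, and `protocol=2` is chosen because it is the highest pickle protocol Python 2.7 can read. This only authenticates the sender; it does not make pickle safe against a sender who is itself malicious.

```python
import hashlib
import hmac
import pickle

# Hypothetical shared secret; in practice load it from configuration.
SECRET_KEY = b"replace-with-a-shared-secret"


def pack(obj, key=SECRET_KEY):
    """Pickle obj and prepend a hex HMAC-SHA256 signature."""
    payload = pickle.dumps(obj, protocol=2)  # protocol 2 is readable by Python 2.7
    signature = hmac.new(key, payload, hashlib.sha256).hexdigest().encode("ascii")
    return signature + b":" + payload


def unpack(blob, key=SECRET_KEY):
    """Verify the signature, then unpickle; raise ValueError on mismatch."""
    signature, payload = blob.split(b":", 1)
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest().encode("ascii")
    if not hmac.compare_digest(signature, expected):
        raise ValueError("signature mismatch: refusing to unpickle")
    return pickle.loads(payload)


# Round trip with a plain dictionary standing in for the model object.
blob = pack({"n_estimators": 10, "max_depth": 3})
restored = unpack(blob)
```

The resulting bytes can be sent as the raw body of the POST request. Whether a fitted scikit-learn estimator pickled under Python 3 loads correctly under Python 2.7 also depends on matching library versions on both sides, which this sketch does not address.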

Combining Recursive Feature Elimination and Grid Search in scikit-learn

醉酒当歌 submitted on 2021-02-07 07:09:14
Question: I am trying to combine recursive feature elimination and grid search in scikit-learn. As you can see from the code below (which works), I am able to get the best estimator from a grid search and then pass that estimator to RFECV. However, I would rather do the RFECV first, then the grid search. The problem is that when I pass the selector from RFECV to the grid search, it does not accept it:

ValueError: Invalid parameter bootstrap for estimator RFECV

Is it possible to get the selector from
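
The error message is consistent with the grid naming a parameter of the wrapped estimator (`bootstrap`) as if it belonged to `RFECV` itself. scikit-learn addresses parameters of a nested estimator with a double-underscore prefix, so for an `RFECV` the wrapped model's parameters appear as `estimator__<name>`. The asker's code is not shown in the excerpt, so the sketch below assumes a `RandomForestClassifier` inside the selector, synthetic data, and a recent scikit-learn with `sklearn.model_selection`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the asker's data (an assumption).
X, y = make_classification(
    n_samples=100, n_features=6, n_informative=3, random_state=0
)

selector = RFECV(RandomForestClassifier(n_estimators=10, random_state=0), cv=3)

# The forest's parameters are listed under the "estimator__" prefix;
# RFECV has no parameter called plain "bootstrap".
nested_names = sorted(selector.get_params())

# Grid keys therefore carry the prefix as well.
param_grid = {"estimator__bootstrap": [True, False]}
search = GridSearchCV(selector, param_grid, cv=3)
search.fit(X, y)
best_bootstrap = search.best_params_["estimator__bootstrap"]
```

After fitting, `search.best_estimator_` is the refitted `RFECV`, so the selected feature mask is available as `search.best_estimator_.support_`.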