scikit-learn | 易学教程

Row wise outlier detection in python

Question: I have the CSV data as follows:

A_ID     P_ID    1429982904  1430370002  1430974801  1431579602  1432184403  1432789202  1435208402  1435308653
11Jgipc  qjMakF         364         365         363         363         364         364         364         367
11Jgipc  qxL8FJ          18          18          18          18          18          18          18          18
11Jgipc  r0Bpnt          40          40          41          41          41          42          42          42
11Jgipc  roLk4N         140         140         143         143         146         147         147         149
11Jgipc  tOudhM          12          13          13          13          13          13          14          14
11Jgipc  u-x6o8         678         678         688         688         689         690         692         695
11Jgipc  u5HHmV        1778        1785        1811        1811        1819        1826        1834        1836
11Jgipc  ufrVoP          67          67          67          67          67          67          67          67
11Jgipc  vRqMK4
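The excerpt stops before saying what counts as an outlier, so the following is only a sketch under stated assumptions: each row is treated as its own series, and a value is flagged when its modified z-score (based on the row median and the median absolute deviation) exceeds 3.5. The function name `row_outliers`, the threshold, and the choice of the modified z-score are illustrative assumptions, not something the question specifies. Only the complete rows from the excerpt are used.

```python
import io

import numpy as np
import pandas as pd

# Complete rows copied from the question; the truncated last row is left out.
raw = """A_ID P_ID 1429982904 1430370002 1430974801 1431579602 1432184403 1432789202 1435208402 1435308653
11Jgipc qjMakF 364 365 363 363 364 364 364 367
11Jgipc qxL8FJ 18 18 18 18 18 18 18 18
11Jgipc r0Bpnt 40 40 41 41 41 42 42 42
11Jgipc roLk4N 140 140 143 143 146 147 147 149
11Jgipc tOudhM 12 13 13 13 13 13 14 14
11Jgipc u-x6o8 678 678 688 688 689 690 692 695
11Jgipc u5HHmV 1778 1785 1811 1811 1819 1826 1834 1836
11Jgipc ufrVoP 67 67 67 67 67 67 67 67
"""

df = pd.read_csv(io.StringIO(raw), sep=r"\s+").set_index(["A_ID", "P_ID"])


def row_outliers(frame, threshold=3.5):
    """Return a boolean frame marking values that are outliers within their row.

    Hypothetical helper: uses the modified z-score 0.6745 * |x - median| / MAD.
    Rows whose MAD is zero get no flags, because the score is undefined there.
    """
    median = frame.median(axis=1)
    abs_dev = frame.sub(median, axis=0).abs()
    mad = abs_dev.median(axis=1).replace(0, np.nan)
    score = 0.6745 * abs_dev.div(mad, axis=0)
    return score > threshold


mask = row_outliers(df)

# Collect (P_ID, column, value) for every flagged cell.
rows, cols = np.where(mask.to_numpy())
flagged = [(df.index[r][1], df.columns[c], int(df.iat[r, c])) for r, c in zip(rows, cols)]
print(flagged)
```

All arithmetic runs along `axis=1`, so each row is compared only with itself and rows on very different scales do not affect one another.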

sklearn grid.fit(X,y) - error: “positional indexers are out-of-bounds” for X_train,y_train

Question: This is a question about scikit-learn (version 0.17.0) in Python 2.7 along with pandas 0.17.1. I split raw data (with no missing entries) using the approach detailed here, and found that calling .fit() on the split data raises an error. Here is the code, taken largely unchanged from the other Stack Overflow question, with the variables renamed. I then instantiated a grid and tried to fit the split data with the aim of determining optimal
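The excerpt does not show the code or the traceback, so the cause can only be guessed. One plausible reading is that the split pandas objects keep their original, non-contiguous index labels, and positional indexing inside the cross-validation loop then goes out of bounds. Under that assumption, a minimal sketch of a workaround is to hand plain NumPy arrays to the grid search. The data, column names, and estimator below are hypothetical, and the sketch uses `sklearn.model_selection` from newer scikit-learn releases rather than the 0.17 module layout.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical data standing in for the question's raw data set.
rng = np.random.RandomState(0)
data = pd.DataFrame(rng.rand(100, 3), columns=["f1", "f2", "f3"])
data["target"] = (data["f1"] + data["f2"] > 1).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    data[["f1", "f2", "f3"]], data["target"], test_size=0.3, random_state=0
)

# After the split, the index labels are a shuffled subset of 0..99, so a label
# such as 97 is not a valid position in a 70-row frame. Converting to arrays
# drops the labels entirely (on older pandas, use .values instead).
X_train_arr = X_train.to_numpy()
y_train_arr = y_train.to_numpy()

grid = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [1, 2, 3]},
    cv=3,
)
grid.fit(X_train_arr, y_train_arr)
print(grid.best_params_)
```

Calling `reset_index(drop=True)` on both pieces would be an alternative way to remove the stale labels while keeping the pandas types.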

Drawing boundary lines based on kmeans cluster centres

Question: I'm quite new to scikit-learn, but wanted to try an interesting project. I have longitudes and latitudes for points in the UK, which I used to create cluster centres using scikit-learn's KMeans class. To visualise this data, rather than showing the points as clusters, I want to draw boundaries around each cluster. For example, if one cluster was London and the other Oxford, I currently have a point at the centre of each city, but I was wondering if there's a way to use this data to
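Because KMeans assigns each point to its nearest centre, the cluster boundaries coincide with the Voronoi diagram of the centres. The sketch below assumes that reading of the question and uses `scipy.spatial.Voronoi` on the fitted `cluster_centers_`. The coordinates are random, hypothetical points in a box roughly covering the UK, and longitude/latitude are treated as planar coordinates, which is only an approximation.

```python
import numpy as np
from scipy.spatial import Voronoi
from sklearn.cluster import KMeans

# Hypothetical (longitude, latitude) points; real data would replace these.
rng = np.random.RandomState(42)
points = np.column_stack(
    [rng.uniform(-5.0, 1.5, 500), rng.uniform(50.5, 55.5, 500)]
)

kmeans = KMeans(n_clusters=6, n_init=10, random_state=42).fit(points)
centres = kmeans.cluster_centers_

# Each Voronoi region contains exactly the locations closest to one centre.
vor = Voronoi(centres)

# ridge_vertices holds pairs of vertex indices; -1 marks a ridge that runs
# off to infinity, so only the finite boundary segments are kept here.
finite_segments = [
    vor.vertices[list(pair)] for pair in vor.ridge_vertices if -1 not in pair
]
print(len(finite_segments), "finite boundary segments")
```

For a quick picture, `scipy.spatial.voronoi_plot_2d(vor)` draws the same diagram with matplotlib, including the unbounded ridges.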

Scikit-learn: How do we define a distance metric's parameter for grid search

Question: I have the following code snippet that attempts to do a grid search in which one of the grid parameters is the distance metric to be used for the KNN algorithm. The example below fails if I use the "wminkowski", "seuclidean" or "mahalanobis" distance metrics.

# Define the parameter values that should be searched
k_range = range(1,31)
weights = ['uniform' , 'distance']
algos = ['auto', 'ball_tree', 'kd_tree', 'brute']
leaf_sizes = range(10, 60, 10)
metrics = ["euclidean", "manhattan", "chebyshev",
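The metrics named in the question need extra arguments: "seuclidean" takes a variance vector `V` and "mahalanobis" takes an inverse covariance matrix `VI`. One way to express that, sketched here under the assumption that these missing arguments are what makes the search fail, is to pass a list of grids so that each metric is paired with its own `metric_params`. The iris data, the reduced parameter ranges, and the choice of `algorithm="brute"` are illustrative; "wminkowski" is left out of the sketch.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Extra arguments for the metrics that cannot run without them.
variances = X.var(axis=0, ddof=1)                   # V for "seuclidean"
inv_cov = np.linalg.inv(np.cov(X, rowvar=False))    # VI for "mahalanobis"

k_values = [3, 5, 7]
param_grid = [
    # Metrics that need no additional parameters.
    {"metric": ["euclidean", "manhattan", "chebyshev"], "n_neighbors": k_values},
    # Each remaining metric gets a grid of its own with matching metric_params.
    {"metric": ["seuclidean"], "metric_params": [{"V": variances}],
     "n_neighbors": k_values},
    {"metric": ["mahalanobis"], "metric_params": [{"VI": inv_cov}],
     "n_neighbors": k_values},
]

search = GridSearchCV(KNeighborsClassifier(algorithm="brute"), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_["metric"], search.best_score_)
```

Here `V` and `VI` are estimated once on the full data set for brevity; estimating them inside each training fold would avoid leaking information from the validation folds.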

Plotting ROC Curve with Multiple Classes

Question: I am following the documentation for plotting ROC curves for multiple classes at this link: http://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html

I am confused about this line in particular:

y_score = classifier.fit(X_train, y_train).decision_function(X_test)

I've seen that in other examples y_score holds probabilities, which are all positive values, as we would expect. However, the y_score in this example (one column for each of classes A-C) has mostly negative values.
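A ROC curve depends only on how the scores rank the samples, not on their sign or scale, so decision-function values (signed distances from the separating boundary) serve just as well as probabilities. The sketch below illustrates this on the iris data, with a one-vs-rest linear SVC standing in for the classifier of the linked example: squashing the scores into (0, 1) with a logistic function leaves every per-class AUC unchanged. The data set, the split, and the use of `scipy.special.expit` are assumptions made for the illustration.

```python
import numpy as np
from scipy.special import expit
from sklearn.datasets import load_iris
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import label_binarize
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
Y = label_binarize(y, classes=[0, 1, 2])  # one indicator column per class
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.5, random_state=0
)

classifier = OneVsRestClassifier(SVC(kernel="linear", random_state=0))
y_score = classifier.fit(X_train, Y_train).decision_function(X_test)

# Negative scores simply mean "on the negative side of that class's boundary".
print("any negative scores:", bool((y_score < 0).any()))

aucs_raw, aucs_squashed = [], []
for k in range(Y.shape[1]):
    # roc_curve accepts any real-valued score where larger means more positive.
    fpr, tpr, thresholds = roc_curve(Y_test[:, k], y_score[:, k])
    aucs_raw.append(roc_auc_score(Y_test[:, k], y_score[:, k]))
    # A strictly increasing transform keeps the ranking, hence the same AUC.
    aucs_squashed.append(roc_auc_score(Y_test[:, k], expit(y_score[:, k])))

print(np.round(aucs_raw, 3), np.round(aucs_squashed, 3))
```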

How do I (safely) send a Python object to my Flask API?

Question: I am currently trying to build a Flask web API that can receive a Python object in a POST request. I am using Python 3.7.1 to create the request and Python 2.7 to run the API. The API is set up to run on my local machine. The object I am trying to send to my API is a RandomForestClassifier object from sklearn.ensemble, but it could be any of a wide variety of object types. So far I have tried to json.dumps() my object, but this object is not JSON serializable. I have also
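Objects that are not JSON serializable can be turned into bytes with `pickle`, but unpickling data from an untrusted sender can execute arbitrary code, so the receiver should verify where the bytes came from first. The sketch below shows one possible approach under stated assumptions: the payload is pickled with protocol 2 (the highest protocol Python 2.7 can read) and signed with an HMAC over a shared secret. The names `pack`, `unpack`, and `SECRET_KEY` are hypothetical, the Flask and HTTP parts are omitted, and a model pickled under one Python or scikit-learn version may still fail to load under another.

```python
import hashlib
import hmac
import pickle

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Hypothetical shared secret known to both the client and the API.
SECRET_KEY = b"hypothetical-shared-secret"


def pack(obj):
    """Client side: pickle the object and sign the resulting bytes."""
    payload = pickle.dumps(obj, protocol=2)
    signature = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return payload, signature


def unpack(payload, signature):
    """Server side: check the signature before unpickling anything."""
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, signature):
        raise ValueError("signature mismatch; refusing to unpickle")
    return pickle.loads(payload)


X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

payload, signature = pack(model)
restored = unpack(payload, signature)
print((restored.predict(X) == model.predict(X)).all())
```

In a request, the payload bytes would travel as the POST body and the signature in a header; the signature only proves that the sender knows the secret, so the secret itself has to stay private.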

Combining Recursive Feature Elimination and Grid Search in scikit-learn

Question: I am trying to combine recursive feature elimination and grid search in scikit-learn. As you can see from the code below (which works), I am able to get the best estimator from a grid search and then pass that estimator to RFECV. However, I would rather do the RFECV first, then the grid search. The problem is that when I pass the selector from RFECV to the grid search, it rejects the parameter grid:

ValueError: Invalid parameter bootstrap for estimator RFECV

Is it possible to get the selector from
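The error message suggests that the grid keys are being applied to the RFECV object itself rather than to the estimator it wraps. scikit-learn addresses nested parameters with a double underscore, so one possible fix, sketched here on hypothetical synthetic data, is to prefix the keys with `estimator__`. The data set, the small forest, and the one-parameter grid are assumptions chosen to keep the example quick.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import GridSearchCV

# Hypothetical data set: 6 features, 3 of them informative.
X, y = make_classification(
    n_samples=120, n_features=6, n_informative=3, random_state=0
)

# RFECV wraps the forest; the forest is reachable as the "estimator" parameter.
selector = RFECV(
    RandomForestClassifier(n_estimators=10, random_state=0), step=2, cv=3
)

# "bootstrap" alone would be looked up on RFECV and rejected;
# "estimator__bootstrap" is routed to the wrapped RandomForestClassifier.
param_grid = {"estimator__bootstrap": [True, False]}

search = GridSearchCV(selector, param_grid, cv=3)
search.fit(X, y)

print(search.best_params_)
print(search.best_estimator_.support_)  # mask of the features that were kept
```

With this arrangement, feature elimination is rerun for every parameter combination, so the cost grows quickly as the grid gets larger.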
