Plotting decision boundary for High Dimension Data

Submitted by 限于喜欢 on 2019-12-02 22:56:43

One way is to impose a Voronoi tessellation on your 2D plot (e.g. a t-SNE embedding of the data), i.e. color every location of the plot according to its nearest 2D data point, using a different color for each predicted class label. See the paper by Migut et al., 2015.

This is a lot easier than it sounds with a meshgrid and scikit-learn's KNeighborsClassifier (below is an end-to-end example with the Iris dataset; replace the first few lines with your own model/code):

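import numpy as np
import matplotlib.pyplot as plt
# public import paths; the private modules used originally (sklearn.neighbors.classification,
# sklearn.datasets.base, sklearn.manifold.t_sne, ...) no longer exist in recent scikit-learn versions
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import load_iris
from sklearn.manifold import TSNE
from sklearn.linear_model import LogisticRegression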

# replace the below by your data and model
iris = load_iris()
X,y = iris.data, iris.target
X_Train_embedded = TSNE(n_components=2).fit_transform(X)
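print(X_Train_embedded.shape)  # (n_samples, 2)
# max_iter raised because the default may stop early with a ConvergenceWarning on unscaled data
model = LogisticRegression(max_iter=1000).fit(X, y)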
y_predicted = model.predict(X)
# replace the above by your data and model

# create meshgrid
resolution = 100 # 100x100 background pixels
X2d_xmin, X2d_xmax = np.min(X_Train_embedded[:,0]), np.max(X_Train_embedded[:,0])
X2d_ymin, X2d_ymax = np.min(X_Train_embedded[:,1]), np.max(X_Train_embedded[:,1])
xx, yy = np.meshgrid(np.linspace(X2d_xmin, X2d_xmax, resolution), np.linspace(X2d_ymin, X2d_ymax, resolution))

# approximate Voronoi tesselation on resolution x resolution grid using 1-NN
background_model = KNeighborsClassifier(n_neighbors=1).fit(X_Train_embedded, y_predicted) 
voronoiBackground = background_model.predict(np.c_[xx.ravel(), yy.ravel()])
voronoiBackground = voronoiBackground.reshape((resolution, resolution))

# plot: colored background = approximate decision regions, dots = true labels
plt.contourf(xx, yy, voronoiBackground, alpha=0.4)
plt.scatter(X_Train_embedded[:,0], X_Train_embedded[:,1], c=y, edgecolors='k')
plt.show()
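The 1-NN step is what produces the Voronoi coloring: each grid cell simply takes the label of the closest embedded point. As a minimal sketch of that idea without scikit-learn (assuming NumPy only), the helper below does the same nearest-neighbor lookup by brute force. The name `nearest_label_grid` and the toy points are hypothetical, made up for illustration.

```python
import numpy as np


def nearest_label_grid(points_2d, labels, resolution=100):
    """Hypothetical helper: label every cell of a resolution x resolution grid
    with the label of its nearest 2D point (brute-force 1-NN), i.e. a
    rasterized Voronoi tessellation of the points."""
    points_2d = np.asarray(points_2d, dtype=float)
    labels = np.asarray(labels)
    # same grid construction as in the example above: min..max on each axis
    xs = np.linspace(points_2d[:, 0].min(), points_2d[:, 0].max(), resolution)
    ys = np.linspace(points_2d[:, 1].min(), points_2d[:, 1].max(), resolution)
    xx, yy = np.meshgrid(xs, ys)
    grid = np.c_[xx.ravel(), yy.ravel()]  # shape (resolution**2, 2)
    # squared Euclidean distance from every grid cell to every data point
    d2 = ((grid[:, None, :] - points_2d[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)  # index of the closest point (first one on ties)
    return xx, yy, labels[nearest].reshape(resolution, resolution)


# Toy usage with hypothetical data: three points, two labels
pts = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]])
lab = np.array([0, 1, 1])
xx, yy, bg = nearest_label_grid(pts, lab, resolution=5)
print(bg)
```

With the variables from the example above, it could be called as `xx, yy, voronoiBackground = nearest_label_grid(X_Train_embedded, y_predicted)`. The brute-force version builds an intermediate array of size resolution² × n_points × 2, so for larger data sets the neighbor search inside KNeighborsClassifier is the better choice; the two can also differ on cells exactly equidistant from two points, since `argmin` just picks the first one.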

Note that rather than plotting your decision boundary precisely, this only gives you an estimate of roughly where the boundary lies (especially in regions with few data points, the true boundary can deviate from it). Between two data points with different predicted class labels, it draws the boundary exactly halfway. A decision boundary is indeed guaranteed to lie somewhere between such points, but it does not have to be in the middle.
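The midpoint effect can be seen in a one-dimensional toy along the line joining two points: 1-NN labeling (which is what the Voronoi background amounts to) switches class halfway between them, regardless of where the real boundary is. The points, labels, and the threshold classifier `true_model` (class 1 for x > 0.5) are made-up assumptions for illustration only; NumPy is the only dependency.

```python
import numpy as np

# Hypothetical toy data: two training points on a line with different labels
points = np.array([0.0, 3.0])
labels = np.array([0, 1])


def true_model(x):
    """Hypothetical 'true' classifier: class 1 whenever x > 0.5."""
    return (x > 0.5).astype(int)


xs = np.linspace(-1.0, 4.0, 500)  # dense 1D grid covering both points

# 1-NN labeling, as in the Voronoi background: the nearest training point wins
nearest = np.abs(xs[:, None] - points[None, :]).argmin(axis=1)
nn_labels = labels[nearest]
true_labels = true_model(xs)

# First grid position where each labeling is 1 (argmax returns the first True)
nn_switch = xs[np.argmax(nn_labels == 1)]
true_switch = xs[np.argmax(true_labels == 1)]
print(f"1-NN boundary ~ {nn_switch:.2f}, true boundary ~ {true_switch:.2f}")
```

Both labelings agree on the two training points, yet the 1-NN boundary sits near 1.5 (the midpoint) while the assumed true boundary sits near 0.5.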

There are also some experimental approaches that better approximate the true decision boundary, e.g. this one on GitHub.
