Question
I am trying to plot a decision region (based on the output of a logistic regression) with matplotlib's contourf function. The code I am using:
subplot.contourf(x2, y2, P, cmap=cmap_light, alpha = 0.8)
where x2 and y2 are two 2D matrices generated via numpy meshgrids. P is computed using
P = clf.predict(numpy.c_[x2.ravel(), y2.ravel()])
P = P.reshape(x2.shape)
Each element of P is a boolean value based on the output of the logistic regression. The rendered plot looks like this
My question is how does the contourf function know where to draw the contour based on a 2D matrix of boolean values? (x2, y2 are just numpy meshgrids) I looked up the docs several times but could not understand.
Answer 1:
To illustrate what's happening, here is an example using the first two features (sepal length and width) of the iris dataset.
First, the regression is calculated from the given data (dots with black outline). Then, for each point of a grid covering the data, a prediction is calculated (small dots in a grid). Note that the given and predicted values are just the numbers 0, 1 and 2. (In the question, only 0 and 1 are used.)
The last step is using these grid points as input to search for the contours of regions with an equal predicted value. So, a contour line is drawn between the grid points that have value 0 and the ones with value 1, and another between values 1 and 2. contourf then fills the area between the lines with a uniform color.
As the grid points and their predictions aren't visualized in the question's plot, the abrupt contours are harder to understand.
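To make the "line between the 0 points and the 1 points" idea concrete, here is a minimal sketch on a toy 4x4 grid. The grid, the explicit levels list, and the variable names are assumptions chosen for illustration; they are not taken from the question's code, which lets matplotlib pick its own levels.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Toy grid (assumed for illustration): the two left columns hold value 0,
# the two right columns hold value 1.
xs, ys = np.meshgrid(np.arange(4.0), np.arange(4.0))
Z = (xs >= 2).astype(int)

fig, ax = plt.subplots()
# Filled bands: one band between -0.5 and 0.5, one between 0.5 and 1.5.
filled = ax.contourf(xs, ys, Z, levels=[-0.5, 0.5, 1.5], alpha=0.8)
# The single boundary line at level 0.5, drawn on top of the filled bands.
lines = ax.contour(xs, ys, Z, levels=[0.5], colors="black")

# Collect the vertices of the 0.5 contour line and look at their x coordinates.
boundary = np.vstack(lines.allsegs[0])
print(np.unique(np.round(boundary[:, 0], 6)))
```

The grid only has columns at x = 1 (value 0) and x = 2 (value 1), so the contour routine has to interpolate between them, and the level-0.5 line ends up halfway between the two columns. With a finer grid, the same interpolation happens over a shorter distance, which is why the boundary in a dense decision-region plot looks sharp.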
from matplotlib import pyplot as plt
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
X, y = load_iris(return_X_y=True)
X = X[:, :2]
clf = LogisticRegression(random_state=0).fit(X, y)
x2, y2 = np.meshgrid(np.linspace(X[:, 0].min()-.5, X[:, 0].max()+.5, 20),
                     np.linspace(X[:, 1].min()-.5, X[:, 1].max()+.5, 20))
pred = clf.predict(np.c_[x2.ravel(), y2.ravel()])
cmap = plt.get_cmap('Set1', 3)
plt.scatter(x2.ravel(), y2.ravel(), c=pred, s=10, cmap=cmap, label='Prediction on grid')
plt.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap=cmap, ec='black', label='Given values')
plt.contourf(x2, y2, pred.reshape(x2.shape), cmap=cmap, alpha=0.4, levels=2, zorder=0)
plt.legend(ncol=2, loc="lower center", bbox_to_anchor=(0.5,1.01))
plt.show()
PS: About pred.reshape(x2.shape):
- x2 and y2 are arrays giving the x and y coordinates of each grid point. They are organized as 2D arrays similar to the grid they represent (20x20 in the example).
- However, the function clf.predict expects one row per point, so the coordinate arrays need to be 1D. To that end, .ravel() is used: it just makes one long 1D array out of the 2D array. In the example, ravel converts the 20x20 arrays to 1D arrays of 400 elements, which np.c_ then stacks as two columns.
- The result of pred = clf.predict is a corresponding 1D array (400 elements). pred.reshape(x2.shape) converts pred to the same 2D format as x2 and y2 (again 20x20).
- Note that scatter wants its parameters in 1D format, as it only looks at each point individually. contourf, on the other hand, wants its parameters in 2D format, as it needs to know how the grid is organized.
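The shape bookkeeping above can be checked with a small NumPy-only sketch. The 3x4 grid and the toy_predict function are hypothetical stand-ins for the 20x20 grid and clf.predict; they are used here only so the example runs without scikit-learn.

```python
import numpy as np

# A small grid: 4 x-positions and 3 y-positions give 2D arrays of shape (3, 4).
x2, y2 = np.meshgrid(np.linspace(0, 1, 4), np.linspace(0, 1, 3))

# Flatten both coordinate arrays and stack them as columns: one row per grid point.
points = np.c_[x2.ravel(), y2.ravel()]   # shape (12, 2)

def toy_predict(pts):
    """Hypothetical stand-in for clf.predict: class 1 when x > 0.5, else class 0."""
    return (pts[:, 0] > 0.5).astype(int)

pred = toy_predict(points)               # 1D array, one label per grid point
P = pred.reshape(x2.shape)               # back to the grid layout, shape (3, 4)

print(x2.shape, points.shape, pred.shape, P.shape)
print(P)
```

Because ravel and reshape use the same element order, P[i, j] is the prediction for the point (x2[i, j], y2[i, j]), which is the correspondence contourf relies on.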
Source: https://stackoverflow.com/questions/63234019/explain-matplotlib-contourf-function