plotting decision boundary of logistic regression

Question

I'm implementing logistic regression. I managed to get probabilities out of it, and I can make predictions on a two-class classification task.

My question is:

For my final model, I have weights and the training data. There are 2 features, so my weight is a vector with 2 rows.

How do I plot this? I saw this post, but I don't quite understand the answer. Do I need a contour plot?

Answer 1:

An advantage of the logistic regression classifier is that once you fit it, you can get probabilities for any sample vector. That may be more interesting to plot. Here's an example using scikit-learn:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(style="white")

First, generate the data and fit the classifier to the training set:

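# n_informative and n_redundant are keyword-only in recent scikit-learn releases
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, weights=[.5, .5], random_state=15)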
clf = LogisticRegression().fit(X[:100], y[:100])

Next, make a continuous grid of values and evaluate the probability of each (x, y) point in the grid:

xx, yy = np.mgrid[-5:5:.01, -5:5:.01]
grid = np.c_[xx.ravel(), yy.ravel()]
probs = clf.predict_proba(grid)[:, 1].reshape(xx.shape)
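If the reshaping is unclear, here is a minimal sketch of the same grid trick on a tiny 3x4 grid, so the shapes are easy to inspect. The weights `w` and intercept `b` are made up, and the hand-written sigmoid only stands in for `clf.predict_proba(grid)[:, 1]`:

```python
import numpy as np

# A coarse 3x4 grid (step sizes chosen only to keep the arrays small).
xx, yy = np.mgrid[0:3:1, 0:4:1]          # xx and yy both have shape (3, 4)
grid = np.c_[xx.ravel(), yy.ravel()]     # shape (12, 2): one (x1, x2) row per grid point

# Stand-in for clf.predict_proba(grid)[:, 1]: a sigmoid with hypothetical parameters.
w = np.array([1.0, -1.0])   # hypothetical weights
b = 0.5                     # hypothetical intercept
probs_flat = 1.0 / (1.0 + np.exp(-(grid @ w + b)))   # shape (12,)

# Reshape back to the grid layout so contourf can pair each value with (xx, yy).
probs = probs_flat.reshape(xx.shape)                 # shape (3, 4)

print(xx.shape, grid.shape, probs.shape)
```

The point is that `ravel` and `reshape` are inverses here, so `probs[i, j]` is the probability at the point `(xx[i, j], yy[i, j])`.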

Now, plot the probability grid as a contour map and additionally show the test set samples on top of it:

f, ax = plt.subplots(figsize=(8, 6))
contour = ax.contourf(xx, yy, probs, 25, cmap="RdBu",
                      vmin=0, vmax=1)
ax_c = f.colorbar(contour)
ax_c.set_label("$P(y = 1)$")
ax_c.set_ticks([0, .25, .5, .75, 1])

ax.scatter(X[100:,0], X[100:, 1], c=y[100:], s=50,
           cmap="RdBu", vmin=-.2, vmax=1.2,
           edgecolor="white", linewidth=1)

ax.set(aspect="equal",
       xlim=(-5, 5), ylim=(-5, 5),
       xlabel="$X_1$", ylabel="$X_2$")

Logistic regression lets you classify new samples using any threshold you want, so it doesn't inherently have one "decision boundary." But, of course, a common decision rule is p = .5. We can draw just that contour level by reusing the code above:

f, ax = plt.subplots(figsize=(8, 6))
ax.contour(xx, yy, probs, levels=[.5], cmap="Greys", vmin=0, vmax=.6)

ax.scatter(X[100:,0], X[100:, 1], c=y[100:], s=50,
           cmap="RdBu", vmin=-.2, vmax=1.2,
           edgecolor="white", linewidth=1)

ax.set(aspect="equal",
       xlim=(-5, 5), ylim=(-5, 5),
       xlabel="$X_1$", ylabel="$X_2$")
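To make the threshold point concrete: since P(y = 1) is the sigmoid of the logit w.x + b, the contour for a threshold t is the set of points where the logit equals log(t / (1 - t)). Here is a small sketch of that relationship; the helper names `boundary_offset` and `sigmoid` are just for illustration:

```python
import numpy as np

def boundary_offset(threshold):
    """Value of the logit w.x + b on the P(y = 1) = threshold contour."""
    return np.log(threshold / (1.0 - threshold))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for t in (0.25, 0.5, 0.75):
    z = boundary_offset(t)
    # Applying the sigmoid to the offset should give back the threshold.
    print(f"threshold={t:.2f}  logit on contour={z:+.4f}  sigmoid(logit)={sigmoid(z):.2f}")
```

For t = .5 the offset is zero, so the boundary is w.x + b = 0. Any other threshold gives a parallel line shifted by that offset, which is why passing several values to `levels` in `ax.contour` draws a family of parallel lines for this model.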

Answer 2:

The accepted answer is nice for this, but it can also be useful, especially when trying to understand what the weights mean, to convert the weights into slope/intercept form and just draw the decision boundary.

The logits have the form wx + b, and with two features both x and w are two-dimensional. One of the two components of x is what you plot on the vertical axis, so call the two features x and y. The decision boundary for p = .5 is where the logit equals zero, which means the equation of the line looks like:

w[0] * x + w[1] * y + b = 0
# solve for y
y = -(w[0] * x + b) / w[1]
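A quick numerical sanity check of the slope/intercept form, matching the sign convention used in the plotting code in this answer. The values of `w` and `b` are hypothetical, chosen only for illustration:

```python
import numpy as np

# Hypothetical learned parameters (not from any real model).
w = np.array([2.0, -1.5])
b = 0.7

# Boundary: w[0]*x + w[1]*y + b = 0  =>  y = -(w[0]/w[1]) * x - b/w[1]
slope = -w[0] / w[1]
intercept = -b / w[1]

x = np.linspace(-5, 5, 11)
y = slope * x + intercept

# Every point on the line should have logit 0 and therefore probability 0.5.
logits = w[0] * x + w[1] * y + b
probs = 1.0 / (1.0 + np.exp(-logits))
print(np.allclose(probs, 0.5))
```

Note that this only works when w[1] is nonzero. If w[1] is zero the boundary is a vertical line at x = -b / w[0] and has to be drawn separately.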

Plotting this line, where x_np is your data and w_guess and b_guess are your learned parameters, can be as simple as:

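import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
# colors is assumed to be a list with one color per class, e.g. ["blue", "orange"]
plt.scatter(x_np[:, 0], x_np[:, 1], c=y_np.reshape(-1), cmap=mpl.colors.ListedColormap(colors))
ax = plt.gca()
ax.autoscale(False)  # keep the axis limits set by the scatter plot
x_vals = np.array(ax.get_xlim())
# boundary: w[0]*x + w[1]*y + b = 0, solved for y
y_vals = -(x_vals * w_guess[0] + b_guess[0]) / w_guess[1]
plt.plot(x_vals, y_vals, '--', c="red")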
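Since the question is about a hand-written logistic regression, here is a rough end-to-end sketch of the same idea that runs on its own. Everything in it is an assumption made for illustration: the synthetic two-blob data, the learning rate, the iteration count, and the plain batch gradient descent loop are not taken from the question or the answers.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Synthetic two-class data: two Gaussian blobs (made up for this sketch).
n = 100
x_np = np.vstack([rng.normal(-1.5, 1.0, size=(n, 2)),
                  rng.normal(+1.5, 1.0, size=(n, 2))])
y_np = np.concatenate([np.zeros(n), np.ones(n)])

# Plain batch gradient descent on the mean log-loss.
w = np.zeros(2)
b = 0.0
lr = 0.1
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(x_np @ w + b)))    # predicted P(y = 1)
    w -= lr * (x_np.T @ (p - y_np)) / len(y_np)  # gradient w.r.t. w
    b -= lr * np.mean(p - y_np)                  # gradient w.r.t. b

# Classify with the p = .5 rule, i.e. logit > 0.
accuracy = np.mean(((x_np @ w + b) > 0) == (y_np == 1))

fig, ax = plt.subplots(figsize=(8, 6))
ax.scatter(x_np[:, 0], x_np[:, 1], c=y_np, cmap="RdBu", edgecolor="white")
ax.autoscale(False)                       # freeze limits before drawing the line
x_vals = np.array(ax.get_xlim())
y_vals = -(w[0] * x_vals + b) / w[1]      # boundary: w[0]*x + w[1]*y + b = 0
ax.plot(x_vals, y_vals, "--", c="black")
ax.set(xlabel="$X_1$", ylabel="$X_2$", title=f"training accuracy = {accuracy:.2f}")
```

Call `plt.show()` or `fig.savefig(...)` afterwards to see the figure. To use your own model, replace the training loop with your fitted `w` and `b`.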

Source: https://stackoverflow.com/questions/28256058/plotting-decision-boundary-of-logistic-regression

Tags

matplotlib

scikit-learn

logistic-regression