Question
I have this in Matplotlib:
plt.scatter(X[:, 0:1][Y == 0], X[:, 2:3][Y == 0])
plt.scatter(X[:, 0:1][Y == 1], X[:, 2:3][Y == 1])
plt.scatter(X[:, 0:1][Y == 2], X[:, 2:3][Y == 2])
I'd like to know if there's a better way to loop instead of this:
for i in range(3):
    plt.scatter(X[:, 0:1][Y == i], X[:, 2:3][Y == i])
MVCE:
import numpy as np
import matplotlib.pyplot as plt

# CSV: https://gist.githubusercontent.com/netj/8836201/raw/6f9306ad21398ea43cba4f7d537619d0e07d5ae3/iris.csv
data = np.loadtxt('/content/drive/My Drive/Colab Notebooks/Machine Learning/iris.csv', skiprows=1, delimiter=',')
X = data[:, 0:4]
Y = data[:, 4:5]
# Scatter
for i in range(len(np.intersect1d(Y, Y))):
    plt.scatter(X[:, 0:1][Y == i], X[:, 3:4][Y == i])
# map(lambda i: plt.scatter(X[: , 0:1][Y == i], X[: , 2:3][Y==i]), range(3))
plt.title("Scatter Sepal Length / Petal Width ")
plt.legend(('Setosa', 'Versicolor', 'Virginica'))
plt.show()
Answer 1:
Probably the simplest way to display your data is with a single plot containing multiple colors.
The key is to label the data more efficiently. You have the right idea with np.intersect1d(Y, Y), but though clever, this is not the best way to get the unique values. Instead, I recommend using np.unique. Not only will that remove the need to hard-code the argument to plt.legend, but the return_inverse argument will give you an integer index for every point, which you can pass directly as its color value.
A minor point is that you can index single columns with a single index, rather than a slice.
For example,
X = np.loadtxt('iris.csv', skiprows=1, delimiter=',', usecols=[0, 1, 2, 3])
Y = np.loadtxt('iris.csv', skiprows=1, delimiter=',', usecols=[4], dtype=str)
labels, indices = np.unique(Y, return_inverse=True)
scatter = plt.scatter(X[:, 0], X[:, 2], c=indices)
The array indices indexes into the three unique values in labels to get the original array back, i.e. labels[indices] reproduces Y. You can therefore supply the index as a numerical label for each element.
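To make the relationship between labels and indices concrete, here is a minimal sketch on a made-up array of species names (the five-element Y below is illustrative, not the iris file):

```python
import numpy as np

# Hypothetical label column, deliberately unsorted
Y = np.array(["Versicolor", "Setosa", "Virginica", "Setosa", "Versicolor"])

# labels: sorted unique values; indices: position in `labels` of each element of Y
labels, indices = np.unique(Y, return_inverse=True)

# Indexing labels with indices rebuilds the original array
reconstructed = labels[indices]

print(labels)
print(indices)
```

Because indices is an integer array with one entry per row, it is suitable as the numeric color argument of a scatter plot.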
Constructing a legend for such a labeled dataset is something that matplotlib supports out of the box, as I learned from "matplotlib add legend with multiple entries for a single scatter plot", which was inspired by this solution. The gist of it is that the object that plt.scatter returns has a method, legend_elements, which does all the work for you:
plt.legend(scatter.legend_elements()[0], labels)
legend_elements returns a tuple with two items. The first is a list of handles, one per distinct label, that can be used as the first argument to legend. The second is a list of default text labels based on the numerical labels you supplied. We discard these in favor of our actual text labels.
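A self-contained sketch of the single-scatter approach, assuming synthetic points in place of the iris file (the names and points arrays, the random seed, and the Agg backend are my own choices for illustration). Note that the indices go through c=, the parameter that maps numbers through a colormap, which is what legend_elements relies on:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in for the iris data: 3 classes, 10 points each
rng = np.random.default_rng(0)
names = np.repeat(["Setosa", "Versicolor", "Virginica"], 10)
points = rng.normal(size=(30, 2)) + np.repeat(np.arange(3), 10)[:, None]

# Integer class index per point, plus the sorted unique class names
labels, indices = np.unique(names, return_inverse=True)

fig, ax = plt.subplots()
scatter = ax.scatter(points[:, 0], points[:, 1], c=indices)

# One handle per distinct value in `indices`; the default text is discarded
handles, default_text = scatter.legend_elements()
legend = ax.legend(handles, labels)
```

In an interactive session you would drop the matplotlib.use line and finish with plt.show().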
Answer 2:
You can do a much better job with the indexing by splitting the data properly.
The indexing expression X[:, 0:1][Y == n] extracts a view of the first column of X. It then applies the boolean mask Y == n to the view. Both steps can be done more concisely as a single step: X[Y == n, 0] (with Y as a 1-D array). Either way, this is a bit inefficient, since you will recompute the mask over the whole array for every unique value in Y.
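A quick check that the two indexing forms select the same values, on a small made-up array (X, Y and n below are illustrative). The sketch assumes Y is stored 1-D; the two-step form then needs the mask reshaped into a column to match the column-shaped slice, which is the layout the question used:

```python
import numpy as np

# Hypothetical data: 6 rows, 2 feature columns, integer class labels
X = np.arange(12, dtype=float).reshape(6, 2)
Y = np.array([0, 1, 2, 0, 1, 2])
n = 1

# Two steps: column slice (a view of X), then a column-shaped boolean mask
column_view = X[:, 0:1]
two_step = column_view[Y.reshape(-1, 1) == n]

# One step: boolean mask on the rows, integer index on the columns
one_step = X[Y == n, 0]

print(two_step, one_step)
```

The slice shares memory with X, while both masked results are fresh copies.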
My other solution called for np.unique to group the labels. But np.unique works by sorting the array. We can do that ourselves:
X = np.loadtxt('iris.csv', skiprows=1, delimiter=',', usecols=[0, 1, 2, 3])
Y = np.loadtxt('iris.csv', skiprows=1, delimiter=',', usecols=[4], dtype=str)
ind = np.argsort(Y)
X = X[ind, :]
Y = Y[ind]
To find where Y changes, you can apply an operation like np.diff, but tailored to strings:
diffs = Y[:-1] != Y[1:]
The mask can be converted to split indices with np.flatnonzero:
inds = np.flatnonzero(diffs) + 1
And finally, you can split the data:
data = np.split(X, inds, axis=0)
For good measure, you can even convert the split data into a dictionary instead of a list:
labels = np.concatenate(([Y[0]], Y[inds]))
data = dict(zip(labels, data))
You can still plot with a loop, but much more efficiently now, since each group is already a separate block of rows:
for label, group in data.items():
    plt.scatter(group[:, 0], group[:, 2], label=label)
plt.legend()
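Putting the sort, diff, and split steps together on a tiny made-up dataset (the seven-row X and Y are illustrative; kind="stable" is my own addition so that rows within a group keep their original order, and is not required for the grouping itself):

```python
import numpy as np

# Hypothetical unsorted labels and a matching feature matrix
Y = np.array(["b", "a", "c", "a", "b", "c", "a"])
X = np.arange(14, dtype=float).reshape(7, 2)

# Sort rows so that equal labels become adjacent
order = np.argsort(Y, kind="stable")
X_sorted, Y_sorted = X[order], Y[order]

# Positions where the label differs from its predecessor are the split points
boundaries = np.flatnonzero(Y_sorted[:-1] != Y_sorted[1:]) + 1

# One sub-array per label, keyed by the first label of each run
groups = np.split(X_sorted, boundaries, axis=0)
group_labels = np.concatenate(([Y_sorted[0]], Y_sorted[boundaries]))
data = dict(zip(group_labels, groups))

for label, group in data.items():
    print(label, group.shape)
```

Each dictionary value can then be handed to plt.scatter directly, with no per-label masking.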
Source: https://stackoverflow.com/questions/61875539/matplotlib-how-to-loop