Broadcasted NumPy arithmetic - why is one method so much more performant?
Question

This question is a follow-up to my answer in "Efficient way to compute the Vandermonde matrix".

Here's the setup:

    import numpy as np

    x = np.arange(5000)  # an integer array
    N = 4

Now, I'll compute the Vandermonde matrix in two different ways:

    m1 = (x ** np.arange(N)[:, None]).T

And,

    m2 = x[:, None] ** np.arange(N)

Sanity check:

    np.array_equal(m1, m2)
    True

These methods produce identical results, but their performance differs:

    %timeit m1 = (x ** np.arange(N)[:, None]).T
    42.7 µs ± 271 ns per loop (mean ± std. dev. of 7 runs,