I have a numpy matrix A where the data is organised column-vector-wise, i.e. A[:,0] is the first data vector, A[:,1] is the second, and so on. I wanted to know whether there is a more elegant way to zero out the mean from this data. I am currently doing it via a for loop:

    mean = A.mean(axis=1)
    for k in range(A.shape[1]):
        A[:,k] = A[:,k] - mean

So does numpy provide a function to do this? Or can it be done more efficiently another way?
As is typical, you can do this a number of ways. Each of the approaches below works by adding a dimension to the mean vector, making it a 4 x 1 array, and then NumPy's broadcasting takes care of the rest. Each approach creates a view of mean, rather than a deep copy. The first approach (i.e., using newaxis) is likely preferred by most, but the other methods are included for the record.
In addition to the approaches below, see also ovgolovin's answer, which uses a NumPy matrix to avoid the need to reshape mean altogether.
For the methods below, we start with the following code and example array A.

    import numpy as np
    A = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
    mean = A.mean(axis=1)

Using newaxis

    >>> A - mean[:, np.newaxis]
    array([[-1.,  0.,  1.],
           [-1.,  0.,  1.],
           [-1.,  0.,  1.],
           [-1.,  0.,  1.]])

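To see why the extra axis matters, it can help to look at the shapes involved. The following is a small sketch on the same example array; the names col, centered and is_view are only for illustration, and np.shares_memory is used here just to check the "view, not copy" claim.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
mean = A.mean(axis=1)            # shape (4,): one mean per row

col = mean[:, np.newaxis]        # shape (4, 1): same data, one extra axis

# (4, 3) - (4, 1): the single column is broadcast across A's three columns.
centered = A - col

# The reshaped array shares memory with mean, so nothing was copied.
is_view = np.shares_memory(mean, col)
```

Subtracting the plain (4,) vector from the (4, 3) array would fail, because broadcasting aligns shapes from the last axis and 4 does not match 3.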
Using None
The documentation states that None can be used instead of newaxis. This is because

    >>> np.newaxis is None
    True

Therefore, the following accomplishes the task.

    >>> A - mean[:, None]
    array([[-1.,  0.,  1.],
           [-1.,  0.,  1.],
           [-1.,  0.,  1.],
           [-1.,  0.,  1.]])

That said, newaxis is clearer and should be preferred. Also, a case can be made that newaxis is more future-proof.
See also: Numpy: Should I use newaxis or None?
Using reshape

    >>> A - mean.reshape((mean.shape[0]), 1)
    array([[-1.,  0.,  1.],
           [-1.,  0.,  1.],
           [-1.,  0.,  1.],
           [-1.,  0.,  1.]])

You can alternatively change the shape of mean directly.

    >>> mean.shape = (mean.shape[0], 1)
    >>> A - mean
    array([[-1.,  0.,  1.],
           [-1.,  0.,  1.],
           [-1.,  0.,  1.],
           [-1.,  0.,  1.]])

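One more option, not part of the original answer: the mean method accepts a keepdims argument, which leaves the reduced axis in place with length 1, so no reshaping step is needed afterwards. A minimal sketch on the same example array (row_means and centered are illustrative names):

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])

# keepdims=True keeps the reduced axis with length 1, so the result
# already has shape (4, 1) and broadcasts against the (4, 3) array.
row_means = A.mean(axis=1, keepdims=True)
centered = A - row_means
```

This gives the same result as the newaxis version and leaves the original mean vector untouched.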
You can also use matrix instead of array. Then you won't need to reshape:

    >>> A = np.matrix([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
    >>> m = A.mean(axis=1)
    >>> A - m
    matrix([[-1.,  0.,  1.],
            [-1.,  0.,  1.],
            [-1.,  0.,  1.],
            [-1.,  0.,  1.]])

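The reason this works is that a matrix stays two-dimensional after a reduction. The sketch below just compares the two shapes (M, matrix_shape and array_shape are illustrative names). Note that the NumPy documentation discourages np.matrix for new code, so this is meant as an explanation of the behaviour rather than a recommendation.

```python
import numpy as np

M = np.matrix([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
m = M.mean(axis=1)

# The matrix result keeps two dimensions: shape (4, 1).
matrix_shape = m.shape

# The same reduction on a plain ndarray drops the axis: shape (4,).
array_shape = np.asarray(M).mean(axis=1).shape
```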
Yes. pylab.demean:

    In [1]: X = scipy.rand(2,3)

    In [2]: X.mean(axis=1)
    Out[2]: array([ 0.42654669,  0.65216704])

    In [3]: Y = pylab.demean(X, axis=1)

    In [4]: Y.mean(axis=1)
    Out[4]: array([  1.85037171e-17,   0.00000000e+00])

Source:

    In [5]: pylab.demean??
    Type:           function
    Base Class:     <type 'function'>
    String Form:    <function demean at 0x38492a8>
    Namespace:      Interactive
    File:           /usr/lib/pymodules/python2.7/matplotlib/mlab.py
    Definition:     pylab.demean(x, axis=0)
    Source:
    def demean(x, axis=0):
        "Return x minus its mean along the specified axis"
        x = np.asarray(x)
        if axis == 0 or axis is None or x.ndim <= 1:
            return x - x.mean(axis)
        ind = [slice(None)] * x.ndim
        ind[axis] = np.newaxis
        return x - x.mean(axis)[ind]

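If pylab.demean is not available in your Matplotlib install, a stand-in is easy to write with NumPy alone. The following is a minimal sketch, not the Matplotlib implementation: the function is a hypothetical helper that reuses the same name and signature, and it relies on keepdims=True instead of the index list used in the source above.

```python
import numpy as np

def demean(x, axis=0):
    """Return x minus its mean along the given axis (hypothetical helper)."""
    x = np.asarray(x)
    # keepdims=True keeps the reduced axis with length 1, so the
    # subtraction broadcasts for any axis and any number of dimensions.
    return x - x.mean(axis=axis, keepdims=True)

X = np.array([[0.2, 0.5, 0.8],
              [0.1, 0.9, 0.4]])
Y = demean(X, axis=1)   # each row of Y now has (numerically) zero mean
```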
Looks like some of these answers are pretty old. I just tested this on numpy 1.13.3:

    >>> import numpy as np
    >>> a = np.array([[1,1,3],[1,0,4],[1,2,2]])
    >>> a
    array([[1, 1, 3],
           [1, 0, 4],
           [1, 2, 2]])
    >>> a = a - a.mean(axis=0)
    >>> a
    array([[ 0.,  0.,  0.],
           [ 0., -1.,  1.],
           [ 0.,  1., -1.]])

I think this is much cleaner and simpler. Have a try and let me know if this is somehow inferior to the other answers.
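One thing worth checking here is which axis the mean is taken along, since the question uses axis=1 and this snippet uses axis=0. A small sketch of the difference, using the 4 x 3 example array from the earlier answer (the variable names are illustrative):

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])  # shape (4, 3)

# axis=0: the mean has shape (3,), which lines up with A's last axis,
# so plain subtraction broadcasts as intended.
centered_cols = A - A.mean(axis=0)

# axis=1: the mean has shape (4,), which does not line up with (4, 3).
try:
    A - A.mean(axis=1)
    axis1_direct_ok = True
except ValueError:
    axis1_direct_ok = False

# Adding a trailing axis makes the axis=1 case work.
centered_rows = A - A.mean(axis=1)[:, np.newaxis]
```

With a square array the direct axis=1 subtraction would not raise an error, but it would subtract the means along the wrong axis, so the extra axis (or keepdims=True) is still needed in that case.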