I have a 2D numpy array and I want to find every location of each unique element. We can find the unique elements using numpy.unique(numpyarray). Here comes the tricky part: now I have to know all the locations for every unique element. Let's consider the following example.
    array([[1, 1, 2, 2],
           [1, 1, 2, 2],
           [3, 3, 4, 4],
           [3, 3, 4, 4]])
The result should be
    1, (0,0),(1,1)
    2, (0,2),(1,2)
    3, (2,0),(3,1)
    4, (2,2),(3,3)
How can I do this, and what would be a suitable way to store and iterate over the values?
Note that all occurrences of a given unique value will be adjacent to each other. The only gaps between them can be zeros. Let's consider another variant:
    array([[1, 0, 1, 2, 2],
           [1, 0, 1, 2, 2],
           [3, 0, 3, 4, 4],
           [3, 0, 3, 4, 4]])
The result should be
    1, (0,0),(1,2)
    2, (0,3),(1,4)
    3, (2,0),(3,2)
    4, (2,3),(3,4)
The zeros on the boundaries are to be ignored.
thanks a lot
The simple, brute-force way to do it is to just use numpy.where.
For example, if you're just wanting the bounding box:
    import numpy as np

    x = np.array([[1, 1, 2, 2],
                  [1, 1, 2, 2],
                  [3, 3, 4, 4],
                  [3, 3, 4, 4]])

    for val in np.unique(x):
        # Row and column indices of every cell equal to val
        rows, cols = np.where(x == val)
        rowstart, rowstop = np.min(rows), np.max(rows)
        colstart, colstop = np.min(cols), np.max(cols)
        print(val, (rowstart, colstart), (rowstop, colstop))
This will work for the example with zeros as well, though 0 is then reported like any other value unless you skip it in the loop.
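On the question of how to store and iterate over the results, one option is a plain dict mapping each value to its two corner points. The following is only a sketch: the helper name `bounding_boxes` and its `ignore` parameter are made up for illustration, and it uses the same numpy.where approach while skipping zeros.

```python
import numpy as np

def bounding_boxes(arr, ignore=0):
    """Map each unique value (except `ignore`) to
    ((row_min, col_min), (row_max, col_max)), both corners inclusive."""
    boxes = {}
    for val in np.unique(arr):
        if val == ignore:
            continue
        rows, cols = np.where(arr == val)
        boxes[val.item()] = ((int(rows.min()), int(cols.min())),
                             (int(rows.max()), int(cols.max())))
    return boxes

x = np.array([[1, 0, 1, 2, 2],
              [1, 0, 1, 2, 2],
              [3, 0, 3, 4, 4],
              [3, 0, 3, 4, 4]])

boxes = bounding_boxes(x)
for val, (top_left, bottom_right) in boxes.items():
    print(val, top_left, bottom_right)
```

Iterating over `boxes.items()` gives each value together with its corners, which matches the layout of the desired result in the question.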
If the array is large, and you already have scipy around, you might consider using scipy.ndimage.find_objects instead, as @unutbu suggested.
In the particular case of your example, where your unique values are sequential integers, you can use find_objects directly. It expects an array where each sequential integer other than 0 represents an object that it needs to return the bounding box of. (0 is ignored, exactly as you want.) However, in general, you'll need to do a touch of pre-processing to convert arbitrary unique values to sequential integers.
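To illustrate why sequential labels matter, here is a small sketch with a made-up label array in which label 2 is absent. As far as I know from the scipy documentation, find_objects returns None in the slot of a missing label rather than shortening the list.

```python
import numpy as np
import scipy.ndimage as ndimage

# Hypothetical label array: labels 1 and 3 are present, 2 is missing,
# and 0 is treated as background.
labels = np.array([[1, 1, 0, 3],
                   [1, 1, 0, 3]])

found = ndimage.find_objects(labels)
for label, slices in enumerate(found, start=1):
    if slices is None:
        print(label, "not present")
    else:
        print(label, slices)
```

So the position in the returned list, plus one, is the label, and gaps in the labelling show up as None entries that need to be skipped.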
find_objects returns a list of tuples of slice objects. Honestly, these are probably exactly what you want if you need the bounding box. However, printing the starting and stopping indices will look a bit messier.
    import numpy as np
    import scipy.ndimage as ndimage

    x = np.array([[1, 0, 1, 2, 2],
                  [1, 0, 1, 2, 2],
                  [3, 0, 3, 4, 4],
                  [3, 0, 3, 4, 4]])

    # Labels start at 1, so number the returned slice tuples from 1
    for i, item in enumerate(ndimage.find_objects(x), start=1):
        print(i, item)
This will look slightly different than you might expect. These are slice objects, so the "max" value will always be one higher than the "max" in the previous example. This is so that you can simply slice with the given tuple to get the data in question.
E.g.
    for i, item in enumerate(ndimage.find_objects(x), start=1):
        print(i, ':')
        # Indexing with the tuple of slices extracts the bounding box
        print(x[item], '\n')
If you really want the starts and stops, just do something like this:
    for i, (rowslice, colslice) in enumerate(ndimage.find_objects(x), start=1):
        # slice.stop is exclusive, so subtract 1 to get the last index
        print(i,
              (rowslice.start, rowslice.stop - 1),
              (colslice.start, colslice.stop - 1))
If your unique values are not sequential integers, you'll need to do a bit of pre-processing, as I mentioned before. You might do something like this:
    import numpy as np
    import scipy.ndimage as ndimage

    x = np.array([[1.1, 0.0, 1.1, 0.9, 0.9],
                  [1.1, 0.0, 1.1, 0.9, 0.9],
                  [3.3, 0.0, 3.3, 4.4, 4.4],
                  [3.3, 0.0, 3.3, 4.4, 4.4]])

    ignored_val = 0.0
    # Integer label array; cells equal to ignored_val stay 0
    labels = np.zeros(x.shape, dtype=int)

    i = 1
    for val in np.unique(x):
        if val != ignored_val:
            labels[x == val] = i
            i += 1

    # Now we can use the "labels" array as input to find_objects
    for i, item in enumerate(ndimage.find_objects(labels), start=1):
        print(i, ':')
        print(x[item], '\n')
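If the loop over unique values becomes a bottleneck, the labelling step can probably be done without it. The sketch below is an alternative, not part of the approach above: it relies on the `return_inverse` option of numpy.unique, and the names `lookup`, `keep` and `boxes` are only illustrative. It also assumes exact equality is good enough for comparing the float values.

```python
import numpy as np
import scipy.ndimage as ndimage

x = np.array([[1.1, 0.0, 1.1, 0.9, 0.9],
              [1.1, 0.0, 1.1, 0.9, 0.9],
              [3.3, 0.0, 3.3, 4.4, 4.4],
              [3.3, 0.0, 3.3, 4.4, 4.4]])
ignored_val = 0.0

# values is sorted; inverse holds, for each cell, the index into values
values, inverse = np.unique(x, return_inverse=True)
inverse = inverse.reshape(x.shape)  # older NumPy versions return it flat

# Lookup table: ignored value -> 0, remaining values -> 1, 2, 3, ...
keep = values != ignored_val
lookup = np.zeros(len(values), dtype=int)
lookup[keep] = np.arange(1, keep.sum() + 1)
labels = lookup[inverse]

# Pair each kept value with its tuple of slices
boxes = {float(v): s
         for v, s in zip(values[keep], ndimage.find_objects(labels))}

for val, item in boxes.items():
    print(val, ':')
    print(x[item], '\n')
```

Because numpy.unique returns sorted values, label 1 goes to the smallest kept value (0.9 here), so the labels follow value order rather than order of appearance in the array.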