I have a file with 4 columns and thousands of rows. I want to remove rows whose items in the first column are in a certain range. For example, if the data in my file is as f
You don't need the added complexity of numpy for this. I'm guessing you're reading your file into a list of lists, with each row being a list within the overall data list, like this: [[18, 6.215, 0.025], [19, 6.203, 0.025], ...]. In that case, use the rule below:
# Iterate over a copy: removing items from the list being looped over skips rows.
for row in data[:]:
    if (20 < row[0] < 25) or (30 < row[0] < 35):
        data.remove(row)
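A list comprehension is another way to write the same filter, building a new list rather than changing the existing one. This is only a minimal sketch: the sample rows and the helper name in_removed_range are made up for illustration, and it assumes the first item of each row is a number.

```python
# Hypothetical sample rows; the first item of each row is the value to test.
data = [
    [18, 6.215, 0.025],
    [21, 6.205, 0.025],
    [25, 6.187, 0.023],
    [32, 6.188, 0.019],
    [36, 6.193, 0.024],
]


def in_removed_range(value):
    """Return True if value lies strictly inside (20, 25) or (30, 35)."""
    return 20 < value < 25 or 30 < value < 35


# Build a new list instead of mutating the one being iterated over.
filtered = [row for row in data if not in_removed_range(row[0])]
print(filtered)
```

The original data list is left untouched, which makes it easy to compare before and after.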
In the special but frequent case that the selection criterion is whether a value falls within an interval, I use the abs() of the difference to the midpoint of the interval, especially if midInterval has a physical meaning:
data = data[abs(data[:,0] - midInterval) < deviation] # '<' for keeping the interval
If the data are integers but the midpoint is not (as in Jun's question), you can double the values instead of converting to float, which avoids the rounding errors that exceed 1 for huge integers:
data = data[abs(2*data[:,0] - sumOfLimits) > deltaOfLimits]
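Repeat to remove two intervals. With the limits in Jun's question (for integer data, > 3 gives the same result as >= 5, the difference of the limits, so the limits 20, 25, 30 and 35 themselves are kept):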
data = data[abs(2*data[:,0] - 45) > 3]
data = data[abs(2*data[:,0] - 65) > 3]
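To make the doubled-value test easier to check, here is a small sketch that derives the two constants from the limits and compares the result with the plain comparison form. The sample array and the names lower, upper, sum_of_limits and delta_of_limits are assumptions for illustration; it uses >= delta_of_limits so the limits themselves are kept.

```python
import numpy as np

# Hypothetical integer data: the first column holds the values to test.
data = np.array([[19, 1], [20, 2], [21, 3], [24, 4], [25, 5], [26, 6]])

lower, upper = 20, 25            # limits of the interval to remove
sum_of_limits = lower + upper    # 45, twice the midpoint 22.5
delta_of_limits = upper - lower  # 5, twice the half-width 2.5

# |2x - (lower + upper)| >= upper - lower  <=>  x <= lower or x >= upper
keep = np.abs(2 * data[:, 0] - sum_of_limits) >= delta_of_limits
result = data[keep]

# The same mask written with explicit comparisons, for cross-checking.
explicit = ~((data[:, 0] > lower) & (data[:, 0] < upper))
print(result[:, 0])
```

Everything stays in integer arithmetic, so no conversion to float is involved.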
If you want to keep using numpy, the solution isn't hard.
data = data[np.logical_not(np.logical_and(data[:,0] > 20, data[:,0] < 25))]
data = data[np.logical_not(np.logical_and(data[:,0] > 30, data[:,0] < 35))]
Or if you want to combine it all into one statement,
data = data[
    np.logical_not(np.logical_or(
        np.logical_and(data[:,0] > 20, data[:,0] < 25),
        np.logical_and(data[:,0] > 30, data[:,0] < 35)
    ))
]
To explain, conditional statements like data[:,0] < 25 create boolean arrays that track, element-by-element, where the condition is true or false. In this case, it tells you where the first column of data is less than 25.
You can also index numpy arrays with these boolean arrays. A statement like data[data[:,0] > 30] extracts all the rows where data[:,0] > 30 is true, that is, all the rows where the first element is greater than 30. This kind of conditional indexing is how you extract the rows (or columns, or elements) that you want.
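Here is a tiny sketch of that conditional indexing, using a made-up four-row array so the mask is easy to read. The variable names mask and selected are just for illustration.

```python
import numpy as np

# Hypothetical 4-row array; column 0 is the value being tested.
data = np.array([[18, 6.215], [24, 6.188], [31, 6.191], [36, 6.193]])

mask = data[:, 0] > 30   # one boolean per row
print(mask)              # shows which rows satisfy the condition

selected = data[mask]    # keeps only the rows where mask is True
print(selected)
```

Only the last two rows have a first element greater than 30, so only those survive the indexing.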
Finally, we need logical tools to combine boolean arrays element-by-element. The regular and, or, and not statements don't work because they try to treat each boolean array as a single truth value. Fortunately, numpy provides element-wise versions in the form of np.logical_and, np.logical_or, and np.logical_not. With these, we can combine our boolean arrays element-wise to find rows that satisfy more complicated conditions.
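A short sketch of the difference, assuming a made-up first column called col. It also shows the operator shorthand &, | and ~, which on boolean arrays give the same element-wise result as the np.logical_* functions.

```python
import numpy as np

col = np.array([18, 21, 25, 32, 36])  # hypothetical first column

a = col > 20
b = col < 25

# Python's `and` asks for one truth value for the whole array, which
# numpy refuses for arrays with more than one element.
try:
    a and b
    raised = False
except ValueError:
    raised = True

# Element-wise forms: the function and the operator shorthand agree.
inside_fn = np.logical_and(a, b)
inside_op = a & b

# Keep everything outside both open intervals (20, 25) and (30, 35).
keep = ~(inside_op | ((col > 30) & (col < 35)))
print(col[keep])
```

Note the parentheses around each comparison when using & and |: those operators bind more tightly than > and <.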
Below is my solution to the problem of deleting specific rows from a numpy array. The solution is a one-liner of the form:
# Remove the rows whose first item is between 20 and 25
A = np.delete(A, np.where( np.bitwise_and( (A[:,0]>=20), (A[:,0]<=25) ) )[0], 0)
and is based on pure numpy functions (np.bitwise_and, np.where, np.delete).
import numpy as np

A = np.array( [ [ 18, 6.215, 0.025 ],
                [ 19, 6.203, 0.025 ],
                [ 20, 6.200, 0.025 ],
                [ 21, 6.205, 0.025 ],
                [ 22, 6.201, 0.026 ],
                [ 23, 6.197, 0.026 ],
                [ 24, 6.188, 0.024 ],
                [ 25, 6.187, 0.023 ],
                [ 26, 6.189, 0.021 ],
                [ 27, 6.188, 0.020 ],
                [ 28, 6.192, 0.019 ],
                [ 29, 6.185, 0.020 ],
                [ 30, 6.189, 0.019 ],
                [ 31, 6.191, 0.018 ],
                [ 32, 6.188, 0.019 ],
                [ 33, 6.187, 0.019 ],
                [ 34, 6.194, 0.021 ],
                [ 35, 6.192, 0.024 ],
                [ 36, 6.193, 0.024 ],
                [ 37, 6.187, 0.026 ],
                [ 38, 6.184, 0.026 ],
                [ 39, 6.183, 0.027 ],
                [ 40, 6.189, 0.027 ] ] )
# Remove the rows whose first item is between 20 and 25
A = np.delete(A, np.where( np.bitwise_and( (A[:,0]>=20), (A[:,0]<=25) ) )[0], 0)
# Remove the rows whose first item is between 30 and 35
A = np.delete(A, np.where( np.bitwise_and( (A[:,0]>=30), (A[:,0]<=35) ) )[0], 0)
>>> A
array([[ 1.80000000e+01, 6.21500000e+00, 2.50000000e-02],
       [ 1.90000000e+01, 6.20300000e+00, 2.50000000e-02],
       [ 2.60000000e+01, 6.18900000e+00, 2.10000000e-02],
       [ 2.70000000e+01, 6.18800000e+00, 2.00000000e-02],
       [ 2.80000000e+01, 6.19200000e+00, 1.90000000e-02],
       [ 2.90000000e+01, 6.18500000e+00, 2.00000000e-02],
       [ 3.60000000e+01, 6.19300000e+00, 2.40000000e-02],
       [ 3.70000000e+01, 6.18700000e+00, 2.60000000e-02],
       [ 3.80000000e+01, 6.18400000e+00, 2.60000000e-02],
       [ 3.90000000e+01, 6.18300000e+00, 2.70000000e-02],
       [ 4.00000000e+01, 6.18900000e+00, 2.70000000e-02]])
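If the one-liner is hard to read, it can be unpacked into its three steps. This is only a sketch on a smaller made-up array with two columns; the intermediate names in_range, row_idx, trimmed and same are mine. Note that the bounds here are inclusive (>= and <=), so rows equal to 20 or 25 are removed as well.

```python
import numpy as np

# Small hypothetical array in the same layout as A (first column is the key).
A = np.array([[19, 6.203], [20, 6.200], [23, 6.197], [25, 6.187], [26, 6.189]])

# Step 1: boolean per row, True where the first item is in [20, 25].
in_range = np.bitwise_and(A[:, 0] >= 20, A[:, 0] <= 25)

# Step 2: np.where returns a tuple of index arrays; [0] holds the row indices.
row_idx = np.where(in_range)[0]

# Step 3: delete those rows (axis 0) and get a new array back.
trimmed = np.delete(A, row_idx, 0)

# Boolean indexing with the inverted mask selects the same rows.
same = A[~in_range]
print(row_idx)
print(trimmed)
```

np.delete does not modify A in place; it returns a new array, which is why the result is assigned back in the one-liner.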