Find large number of consecutive values fulfilling condition in a numpy array

后端未结

关注

 8  1561

I have some audio data loaded in a numpy array and I wish to segment the data by finding silent parts, i.e. parts where the audio amplitude is below a certain threshold over

相关标签:

8条回答

生来不讨喜

2020-12-05 01:26
I know I'm late to the party, but another way to do this is with 1d convolutions:
```
np.convolve(sig > threshold, np.ones((cons_samples)), 'same') == cons_samples
```
Where cons_samples is the number of consecutive samples you require above threshold
0 讨论(0)
发布评论:

提交评论
- 加载中...
不知归路

2020-12-05 01:27
@joe-kington I've got about 20%-25% speed improvement over np.diff / np.nonzero solution by using argmax instead (see code below, condition is boolean)
```
def contiguous_regions(condition):
    idx = []
    i = 0
    while i < len(condition):
        x1 = i + condition[i:].argmax()
        try:
            x2 = x1 + condition[x1:].argmin()
        except:
            x2 = x1 + 1
        if x1 == x2:
            if condition[x1] == True:
                x2 = len(condition)
            else:
                break
        idx.append( [x1,x2] )
        i = x2
    return idx
```
Of course, your mileage may vary depending on your data.

Besides, I'm not entirely sure, but i guess numpy may optimize argmin/argmax over boolean arrays to stop searching on first True/False occurrence. That might explain it.
0 讨论(0)
发布评论:

提交评论
- 加载中...

我在风中等你

2020-12-05 01:28

another way to do this quickly and concisely:

import pylab as pl

v=[0,0,1,1,0,0,1,1,1,1,1,0,1,0,1,1,0,0,0,0,0,1,0,0]
vd = pl.diff(v)
#vd[i]==1 for 0->1 crossing; vd[i]==-1 for 1->0 crossing
#need to add +1 to indexes as pl.diff shifts to left by 1

i1=pl.array([i for i in xrange(len(vd)) if vd[i]==1])+1
i2=pl.array([i for i in xrange(len(vd)) if vd[i]==-1])+1

#corner cases for the first and the last element
if v[0]==1:
  i1=pl.hstack((0,i1))
if v[-1]==1:
  i2=pl.hstack((i2,len(v)))

now i1 contains the beginning index and i2 the end index of 1,...,1 areas

0 讨论(0)

误落风尘

2020-12-05 01:29
Slightly sloppy, but simple and fast-ish, if you don't mind using scipy:
```
from scipy.ndimage import gaussian_filter
sigma = 3
threshold = 1
above_threshold = gaussian_filter(data, sigma=sigma) > threshold
```
The idea is that quiet portions of the data will smooth down to low amplitude, and loud regions won't. Tune 'sigma' to affect how long a 'quiet' region must be; tune 'threshold' to affect how quiet it must be. This slows down for large sigma, at which point using FFT-based smoothing might be faster.

This has the added benefit that single 'hot pixels' won't disrupt your silence-finding, so you're a little less sensitive to certain types of noise.
0 讨论(0)
发布评论:

提交评论
- 加载中...

猫巷女王i

2020-12-05 01:45

I haven't tested this but you it should be close to what you are looking for. Slightly more lines of code but should be more efficient, readable, and it doesn't abuse regular expressions :-)

def find_silent(samples):
    num_silent = 0
    start = 0
    for index in range(0, len(samples)):
        if abs(samples[index]) < SILENCE_THRESHOLD:
            if num_silent == 0:
                start = index
            num_silent += 1
        else:
            if num_silent > MIN_SILENCE:
                yield samples[start:index]
            num_silent = 0
    if num_silent > MIN_SILENCE:
        yield samples[start:]

for match in find_silent(samples):
    # code goes here

0 讨论(0)

-上瘾入骨i

2020-12-05 01:48

Here's a numpy-based solution.

I think (?) it should be faster than the other options. Hopefully it's fairly clear.

However, it does require a twice as much memory as the various generator-based solutions. As long as you can hold a single temporary copy of your data in memory (for the diff), and a boolean array of the same length as your data (1-bit-per-element), it should be pretty efficient...

import numpy as np

def main():
    # Generate some random data
    x = np.cumsum(np.random.random(1000) - 0.5)
    condition = np.abs(x) < 1

    # Print the start and stop indicies of each region where the absolute 
    # values of x are below 1, and the min and max of each of these regions
    for start, stop in contiguous_regions(condition):
        segment = x[start:stop]
        print start, stop
        print segment.min(), segment.max()

def contiguous_regions(condition):
    """Finds contiguous True regions of the boolean array "condition". Returns
    a 2D array where the first column is the start index of the region and the
    second column is the end index."""

    # Find the indicies of changes in "condition"
    d = np.diff(condition)
    idx, = d.nonzero() 

    # We need to start things after the change in "condition". Therefore, 
    # we'll shift the index by 1 to the right.
    idx += 1

    if condition[0]:
        # If the start of condition is True prepend a 0
        idx = np.r_[0, idx]

    if condition[-1]:
        # If the end of condition is True, append the length of the array
        idx = np.r_[idx, condition.size] # Edit

    # Reshape the result into two columns
    idx.shape = (-1,2)
    return idx

main()

0 讨论(0)

1 2 下一页