I have a vector/array of n elements. I want to choose m elements.
The choices must be fair / deterministic -- equally many from each subsection.
With m=10, n
You probably need Bresenham's line algorithm. Choosing m
elements uniformly from n
is equivalent to drawing a line in m
xn
discrete pixel grid. Assume x
coordinate in 0
..n-1
and y
coordinate 0
..m-1
, and proceed like if you were drawing a line between (0,0) and (n-1,m-1). Whenever y
coordinate changes, pick an element from index x
.
UPD: But it seems that this simple function will suffice you:
>>> f = lambda m, n: [i*n//m + n//(2*m) for i in range(m)]
>>> f(1,20)
[10]
>>> f(2,20)
[5, 15]
>>> f(3,20)
[3, 9, 16]
>>> f(5,20)
[2, 6, 10, 14, 18]
Here is a quick example:
from math import ceil
def takespread(sequence, num):
length = float(len(sequence))
for i in range(num):
yield sequence[int(ceil(i * length / num))]
math.ceil
is used because without it, the chosen indexes will be weighted too much toward the beginning of each implicit subsection, and as a result the list as a whole.
Use a loop (int i=0; i < m; i++)
Then to get the indexes you want, Ceil(i*m/n).
I am working on a clinical application and found all answers above to have different degrees of bias. Here's another solution that works well even in a circle. That is, even if the last number wraps around as in when you work with degrees 0° = 360°.
import numpy as np
m = 51
# Generate intervals
epts = np.linspace(0,360,m+1,endpoint=True)
# Create the halfsteps between intervals (One would have sufficed)
halfsteps = (epts[1:] - epts[:-1]) / 2
# Find the midpoints
midpoints = epts[:-1] + halfsteps
# Make an unbiased rounding
results = np.around(midpoints, decimals=0)
This will always select the first and last elements:
which_idxs = lambda m, n: np.rint( np.linspace( 1, n, min(m,n) ) - 1 ).astype(int)
evenly_spaced = np.array( your_list )[which_idxs(m,n)]
This will only select a maximum of n elements, in case m is higher than n.
If you truly want it equally spread throughout the array, even at the ends, then it would be this instead:
which_idxs = lambda m, n: [idx for idx in np.rint( np.linspace( 1-n/(2*min(m,n)), n+n/(2*min(m,n)), min(m,n)+2 ) - 1 ).astype(int) if idx in range(n)]
evenly_spaced = np.array( your_list )[which_idxs(m,n)]
Which gives you something like this:
>>> np.array( [1, 2, 3, 'a', 'b', 'c'] )[which_idxs(m,n)]
Out: array(['2', 'b'])