(Nearly) Evenly select items from a list

寵の児 提交于 2019-12-22 08:53:36

问题


I have a list of N elements, and I'd like to sample M (<= N) values which are as evenly spaced as possible. To be more specific, lets say the selection should minimize the differences in the spacings between sampled points. For example, lets say I'm constructing a boolean indexing array (i.e. in python) to select elements,

I tried the algorithm (from this similar, but different question: How do you split a list into evenly sized chunks?) :

q, r = divmod(N, M)
indices = [q*jj + min(jj, r) for jj in range(M)]

And sometimes that works well:

N=11 M=6
good_index = [0 1 0 1 0 1 0 1 0 1 0]

N=14 M=6
good_index = [0 1 1 0 1 1 0 1 0 1 0 1 0 1]

Here, the first example is trivial because the array can be evenly divided. The second example cannot be evenly divided, but the spacing between points is as similar as possible (2, 2, 1, 1, 1, 1).

But often it works poorly:

N=16 M=10
bad_index = [0 1 0 1 0 1 0 1 0 1 0 1 0 0 0 0]

N=14 M=10
bad_index = [0 1 0 1 0 1 0 1 0 0 0 0 0 0]

Because you have values piled up at the end.


Edit 1: woops, just realized each list above is technically inverted (0's should be 1's and visa-versa).... but should still convey the right idea.


Edit 2: the above algorithm tends to work better (i.e. visual inspection from choosing random numbers than something conceptually simpler like,

step = int(floor(N/M))
last = M * step  # this prevents us from getting M+1 elements
indices = [ii for ii in range(0, last, step)]

回答1:


Looking at the results of a few tests (even the ones included above), the problem is when M > N/2. I.e. when more than half of the values are being sampled. But it works great for M < N/2. So the solution I'm using for the moment is simply to invert the problem when M > N/2:

Note: this is actually creating a masking list of size N that is False for M elements as evenly spaced as possible.

import numpy as np

def even_select(N, M):
    if M > N/2:
        cut = np.zeros(N, dtype=int)
        q, r = divmod(N, N-M)
        indices = [q*i + min(i, r) for i in range(N-M)]
        cut[indices] = True
    else:
        cut = np.ones(N, dtype=int)
        q, r = divmod(N, M)
        indices = [q*i + min(i, r) for i in range(M)]
        cut[indices] = False

    return cut

I'd still be interested in more elegant solutions if they exist.



来源:https://stackoverflow.com/questions/46494029/nearly-evenly-select-items-from-a-list

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!