Random sampling of non-overlapping substrings of length k

野的像风 2021-02-10 06:28

Given a string of length n, how would I (pseudo)randomly sample m substrings of size k such that none of the sampled substrings overlap? Most of my sc

2 Answers
  •  离开以前
    2021-02-10 06:40

    This is a recursive approach in Python. At each step, pick one of the remaining partitions of the string at random, then pick a random substring of length k inside that partition. Replace the partition with the two pieces left on either side of the chosen substring, discard any pieces shorter than k, and repeat. The function returns the list of substrings once it holds m of them, or once no partition of length at least k remains, so it can return fewer than m substrings.

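    import random
    
    def f(l, k, m, result=None):
        # l is the original string or the list of remaining partitions;
        # result collects the sampled substrings.
        if result is None:
            result = []  # fresh list per call (a [] default would be shared between calls)
        if isinstance(l, str):
            l = [l]
        # Keep only partitions long enough to hold a length-k substring
        # (this also copies l, so the caller's list is never mutated).
        l = [part for part in l if len(part) >= k]
        if len(result) == m or not l:
            return result
        # Pick a partition at random, then a start offset inside it.
        partition = l.pop(random.randrange(len(l)))
        start = random.randint(0, len(partition) - k)
        result.append(partition[start:start + k])
        # Replace the partition with the pieces on either side of the pick.
        l.extend([partition[:start], partition[start + k:]])
        return f(l, k, m, result)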
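
The partition approach is easy to follow, but it does not pick every valid placement with equal probability, and it can stop early. With `"abcd"`, k = 2, m = 2, picking `"bc"` first leaves only `"a"` and `"d"`, so a single substring comes back. For n = 5, k = 2, m = 2, the placement with starts {0, 3} is returned with probability 1/4 rather than 1/3. If uniform sampling matters, one option (a swapped-in technique, not the answer's method) is a stars-and-bars mapping. Treat each substring as one block and each uncovered character as one item, choose which m of the n - m*k + m items are blocks, then map back to character offsets. The sketch below assumes that mapping. `sample_starts` and `sample_non_overlapping` are hypothetical names, and the sketch raises `ValueError` when m*k > n instead of returning fewer than m substrings.

```python
import random


def sample_starts(n, k, m):
    """Hypothetical helper: m sorted start offsets of non-overlapping
    length-k windows in a string of length n, each valid placement
    equally likely."""
    if k <= 0 or m < 0 or m * k > n:
        raise ValueError("need k > 0, m >= 0 and m * k <= n")
    # Stars and bars: m blocks (the substrings) plus n - m*k uncovered
    # characters give n - m*k + m items; choosing which m item slots are
    # blocks picks one of C(n - m*k + m, m) placements uniformly.
    slots = sorted(random.sample(range(n - m * k + m), m))
    # The i blocks before slot j each cover k characters instead of 1,
    # so block i starts at character offset j + i*(k - 1).
    return [j + i * (k - 1) for i, j in enumerate(slots)]


def sample_non_overlapping(s, k, m):
    """Hypothetical wrapper: the sampled substrings, left to right."""
    return [s[p:p + k] for p in sample_starts(len(s), k, m)]
```

For example, `sample_non_overlapping("abcdef", 2, 3)` has only one valid placement and returns `['ab', 'cd', 'ef']`. The result comes back in left-to-right order, so call `random.shuffle` on it if the order of the picks should also be random.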
    
