Random Sample of a subset of a dataframe in Pandas

北荒 2020-12-10 23:58

Say I have a dataframe with 100,000 entries and want to split it into 100 sections of 1,000 entries each.

How do I take a random sample of, say, size 50 from just one of the

3 answers
  •  情书的邮戳
    2020-12-11 00:39

    One solution is to use the choice function from numpy.

    Say you want 50 entries out of the first 1000 rows; you can use:

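    import numpy as np
    # Draw 50 distinct positions from 0..999 (replace=False means no repeats)
    chosen_idx = np.random.choice(1000, replace=False, size=50)
    # Select those rows by position
    df_trimmed = df.iloc[chosen_idx]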
    

    This of course ignores your block structure. If you want a 50-item sample from block i (counting blocks from 0), for example, you can do:

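    import numpy as np
    # Position of the first row of block i (i is zero-based)
    block_start_idx = 1000 * i
    # Draw 50 distinct offsets within the block
    chosen_idx = np.random.choice(1000, replace=False, size=50)
    # Shift the offsets to the block's position and select by position
    df_trimmed_from_block_i = df.iloc[block_start_idx + chosen_idx]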
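
As an alternative sketch of the same idea, you could slice the block out first and let pandas do the sampling with `DataFrame.sample`. The helper name `sample_from_block`, its parameters, and the toy `value` column below are assumptions made for illustration, not part of the original question or answer; the sketch also assumes the blocks are consecutive runs of rows, counted from 0.

```python
import numpy as np
import pandas as pd


def sample_from_block(df, i, block_size=1000, n=50, seed=None):
    """Return n rows drawn without replacement from the i-th block (zero-based).

    Hypothetical helper: blocks are assumed to be consecutive runs of
    block_size rows in positional order.
    """
    # Slice out block i by position, then sample within it.
    block = df.iloc[i * block_size:(i + 1) * block_size]
    # DataFrame.sample draws without replacement by default.
    return block.sample(n=n, random_state=seed)


# Toy frame standing in for the 100,000-row dataframe from the question.
df = pd.DataFrame({"value": np.arange(100_000)})

# 50 random rows from block 3, i.e. positions 3000..3999.
sample = sample_from_block(df, i=3, seed=0)
```

Passing a `seed` makes the draw reproducible; leaving it as `None` gives a different sample on each call.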
    
