Random sample of a subset of a DataFrame in pandas

北荒 2020-12-10 23:58

Say I have a DataFrame with 100,000 entries and want to split it into 100 sections of 1,000 entries each.

How do I take a random sample of, say, size 50 from just one of the sections?

3 Answers
  •  生来不讨喜
    2020-12-11 00:39

    You can use the sample method*:

    In [11]: df = pd.DataFrame([[1, 2], [3, 4], [5, 6], [7, 8]], columns=["A", "B"])
    
    In [12]: df.sample(2)
    Out[12]:
       A  B
    0  1  2
    2  5  6
    
    In [13]: df.sample(2)
    Out[13]:
       A  B
    3  7  8
    0  1  2
    

    *On one of the section DataFrames.
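
    As a rough sketch of the question's setup, the split and the per-section sample could look like the following. The column name `value`, the generated data, the choice of section index 3, and `random_state=0` are all assumptions made for illustration; only `DataFrame.iloc` and `DataFrame.sample` are doing the real work.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the 100,000-row DataFrame from the question.
df = pd.DataFrame({"value": np.arange(100_000)})

section_size = 1000

# Consecutive, non-overlapping slices of 1,000 rows each (100 sections in total).
sections = [
    df.iloc[start:start + section_size]
    for start in range(0, len(df), section_size)
]

# Draw 50 rows without replacement from a single section (here the fourth one).
# random_state is optional and only makes the draw reproducible.
sample = sections[3].sample(n=50, random_state=0)
```

    If only one section is ever needed, slicing it directly (for example `df.iloc[3000:4000].sample(n=50)`) avoids building the full list.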

    Note: If the sample size is larger than the size of the DataFrame, this will raise an error unless you sample with replacement.

    In [14]: df.sample(5)
    ValueError: Cannot take a larger sample than population when 'replace=False'
    
    In [15]: df.sample(5, replace=True)
    Out[15]:
       A  B
    0  1  2
    1  3  4
    2  5  6
    3  7  8
    1  3  4
    
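
    If duplicated rows from `replace=True` are not wanted, one possible alternative is to cap the sample size at the number of available rows. The helper `sample_at_most` below is a hypothetical name introduced for this sketch, not part of pandas.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4], [5, 6], [7, 8]], columns=["A", "B"])


def sample_at_most(frame, n, random_state=None):
    """Hypothetical helper: sample n rows without replacement,
    or every row (shuffled) when the frame has fewer than n rows."""
    return frame.sample(n=min(n, len(frame)), random_state=random_state)


# Asking for 5 rows from a 4-row frame returns all 4 rows, with no error
# and no repeated rows.
capped = sample_at_most(df, 5)
```

    This trades a fixed sample size for unique rows, so downstream code should not assume exactly `n` rows come back.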
