Say I have a dataframe with 100,000 entries and want to split it into 100 sections of 1000 entries.
How do I take a random sample of, say, size 50 of just one of the
One solution is to use the choice function from numpy.
Say you want 50 entries out of the first 1000 rows; you can use:
import numpy as np
# Draw 50 distinct row positions from 0..999 (no repeats).
chosen_idx = np.random.choice(1000, replace=False, size=50)
# Select those rows by position.
df_trimmed = df.iloc[chosen_idx]
This of course ignores your block structure. If you want a 50-item sample from block i, for example, you can do:
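import numpy as np
# i is the zero-based block number, so block i covers rows 1000*i .. 1000*i + 999.
block_start_idx = 1000 * i
# Draw 50 distinct offsets within the block.
chosen_idx = np.random.choice(1000, replace=False, size=50)
# Shift the offsets to absolute row positions and select those rows.
df_trimmed_from_block_i = df.iloc[block_start_idx + chosen_idx]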
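
To try this end to end, here is a minimal sketch that wraps the block logic in a helper and runs it on a toy dataframe. The function name sample_from_block, its parameters, the "value" column, and the use of np.random.default_rng with a seed are all assumptions made for illustration; they are not part of the question.

```python
import numpy as np
import pandas as pd


def sample_from_block(df, block, block_size=1000, n=50, seed=None):
    """Return n rows drawn without replacement from one block of df.

    Hypothetical helper: `block` is the zero-based block number.
    """
    rng = np.random.default_rng(seed)
    start = block * block_size
    # Distinct offsets within the block, shifted to absolute row positions.
    positions = start + rng.choice(block_size, size=n, replace=False)
    return df.iloc[positions]


# Toy frame standing in for the 100,000-row dataframe from the question.
df = pd.DataFrame({"value": np.arange(100_000)})

# 50 rows taken only from block 3, i.e. row positions 3000..3999.
sample = sample_from_block(df, block=3, seed=0)
```

If you would rather stay in pandas, slicing the block and calling sample should give an equivalent result: df.iloc[start:start + block_size].sample(n=50).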