Question
I want to partition a pandas DataFrame into ten disjoint, equally-sized, randomly composed subsets.
I know I can randomly sample one tenth of the original pandas DataFrame using:
partition_1 = pandas.DataFrame.sample(frac=(1/10))
However, how can I obtain the other nine partitions? If I call pandas.DataFrame.sample(frac=(1/10)) again, the new subset may overlap with the first one, so the subsets are not guaranteed to be disjoint.
Thanks for the help!
Answer 1:
Use np.random.permutation:
df.loc[np.random.permutation(df.index)]
This shuffles the rows of the DataFrame and keeps the column names. Afterwards you can split the shuffled DataFrame into 10 pieces.
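A minimal sketch of the shuffle-then-split idea. The answer does not say how to do the split, so np.array_split on the row positions is an assumption here, as is the small example DataFrame. It also assumes the index labels are unique, since df.loc with duplicated labels would repeat rows.

```python
import numpy as np
import pandas as pd

# Hypothetical example data; any DataFrame with a unique index works.
df = pd.DataFrame({"x": range(100)})

# Shuffle the rows once; column names and index labels are preserved.
shuffled = df.loc[np.random.permutation(df.index)]

# Split the row positions 0..len-1 into 10 consecutive chunks,
# then take the rows of each chunk from the shuffled frame.
chunks = np.array_split(np.arange(len(shuffled)), 10)
partitions = [shuffled.iloc[chunk] for chunk in chunks]
```

Because the rows are shuffled only once, the ten pieces cannot share a row, and together they contain every row of df.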
Answer 2:
Say df is your dataframe, and you want N_PARTITIONS partitions of roughly equal size (they will be of exactly equal size if len(df) is divisible by N_PARTITIONS).
Use np.random.permutation to permute the array np.arange(len(df)). Then take slices of that array with step N_PARTITIONS, and extract the corresponding rows of your dataframe with .iloc[].
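import numpy as np
N_PARTITIONS = 10  # number of partitions wanted
# Shuffle the row positions 0..len(df)-1 once.
permuted_indices = np.random.permutation(len(df))
dfs = []
for i in range(N_PARTITIONS):
    # Take every N_PARTITIONS-th shuffled position, starting at offset i.
    dfs.append(df.iloc[permuted_indices[i::N_PARTITIONS]])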
Since you are on Python 2.7, it might be better to replace range(N_PARTITIONS) with xrange(N_PARTITIONS) to get an iterator instead of a list. (In Python 3, range is already lazy and xrange no longer exists.)
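The strided slicing can be wrapped in a small helper to check that the pieces are disjoint and cover the whole DataFrame. This is only a sketch: the function name partition_dataframe, its seed parameter, and the 23-row example frame are made up for illustration, and np.random.default_rng is used in place of np.random.permutation so that the shuffle can be seeded locally.

```python
import numpy as np
import pandas as pd

def partition_dataframe(df, n_partitions, seed=None):
    """Return n_partitions disjoint, randomly composed pieces of df (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    permuted = rng.permutation(len(df))  # shuffled row positions
    # The stride i::n_partitions assigns every position to exactly one piece.
    return [df.iloc[permuted[i::n_partitions]] for i in range(n_partitions)]

# 23 rows is deliberately not divisible by 10.
df = pd.DataFrame({"x": range(23)})
parts = partition_dataframe(df, 10, seed=0)

sizes = [len(p) for p in parts]
all_labels = pd.concat(parts).index
print(sizes)  # [3, 3, 3, 2, 2, 2, 2, 2, 2, 2]
print(all_labels.is_unique, len(all_labels) == len(df))  # True True
```

When len(df) is not divisible by the number of partitions, the first len(df) % n_partitions pieces get one extra row, so sizes never differ by more than one.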
Answer 3:
Start with this DataFrame:
dfm = pd.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo']*2,
                    'B' : ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three']*2})
A B
0 foo one
1 bar one
2 foo two
3 bar three
4 foo two
5 bar two
6 foo one
7 foo three
8 foo one
9 bar one
10 foo two
11 bar three
12 foo two
13 bar two
14 foo one
15 foo three
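Usage:
Change "4" to "10" and use [i] to select the i-th slice. Note that every call to np.random.permutation draws a new shuffle, so if the slices must be disjoint, store the result of np.array_split once and index into that.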
np.random.seed(32) # for reproducible results.
np.array_split(dfm.reindex(np.random.permutation(dfm.index)),4)[1]
A B
2 foo two
5 bar two
10 foo two
12 foo two
np.array_split(dfm.reindex(np.random.permutation(dfm.index)),4)[3]
A B
13 bar two
11 bar three
0 foo one
7 foo three
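To draw all slices from a single shuffle, the split can be computed once and reused. The sketch below is a variation on the answer, not its exact code: it splits the shuffled row positions rather than the DataFrame itself, and the names position_groups and slices are made up for illustration.

```python
import numpy as np
import pandas as pd

dfm = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'] * 2,
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'] * 2,
})

np.random.seed(32)  # for reproducible results

# Shuffle the row positions once and split them into 4 groups.
position_groups = np.array_split(np.random.permutation(len(dfm)), 4)

# Build every slice from the same shuffle, so no row appears twice.
slices = [dfm.iloc[group] for group in position_groups]
```

slices[1] and slices[3] now come from the same permutation and are therefore disjoint. Change 4 to 10 for ten partitions.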
Source: https://stackoverflow.com/questions/38570268/python-pandas-partitioning-a-pandas-dataframe-in-10-disjoint-equally-sized-su