Using Pandas to sample a DataFrame using a specific column's weight

夙愿已清, submitted on 2021-02-18 11:43:27

Question


I have a DataFrame that looks like this:

  index  name   city
  0      Yam    Hadera
  1      Meow   Hadera
  2      Don    Hadera
  3      Jazz   Hadera
  4      Bond   Tel Aviv
  5      James  Tel Aviv

I want Pandas to randomly choose rows, weighted by how often each value appears in the city column (something like df.city.value_counts()), so that the result of a hypothetical magic function such as:

df.magic_sample(3, weight_column='city')

might look like:

  0     Yam      Hadera
  1     Meow     Hadera
  2     Bond     Tel Aviv

Thanks! :)


Answer 1:


You can group by city and then sample each group in proportion to its size relative to the original DataFrame. The share is rounded because sample needs a whole number of rows:

df.groupby('city', group_keys=False).apply(lambda g: g.sample(n=round(3 * len(g) / len(df))))
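As a rough, self-contained sketch of the group-wise idea above: the helper name proportional_sample and its parameters are hypothetical and not part of pandas. It assumes DataFrame.sample needs an integer n, so each group's share is rounded. Because of that rounding, the total can differ slightly from the requested size on other data.

```python
import pandas as pd

df = pd.DataFrame(
    [("Yam", "Hadera"), ("Meow", "Hadera"), ("Don", "Hadera"),
     ("Jazz", "Hadera"), ("Bond", "Tel Aviv"), ("James", "Tel Aviv")],
    columns=["name", "city"],
)


def proportional_sample(frame, n, by, random_state=None):
    """Hypothetical helper: draw about n rows, split across groups by size."""
    total = len(frame)
    parts = [
        # Each group contributes a share of n equal to its share of the rows,
        # rounded to the nearest whole number of rows.
        group.sample(n=round(n * len(group) / total), random_state=random_state)
        for _, group in frame.groupby(by)
    ]
    return pd.concat(parts)


result = proportional_sample(df, 3, "city", random_state=0)
counts = result["city"].value_counts().to_dict()
print(result)
```

With the six-row example, Hadera holds 4 of 6 rows and Tel Aviv 2 of 6, so a request for 3 rows yields 2 from Hadera and 1 from Tel Aviv.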




Answer 2:


If I understand the question correctly, you may be looking for random.sample, which picks row positions uniformly at random without replacement:

>>> import pandas as pd
>>> from random import sample
>>> df = pd.DataFrame(data=[('Yam', 'Hadera'), ('Meow', 'Hadera'), ('Don', 'Hadera'), ('Jazz', 'Hadera'), ('Bond', 'Tel Aviv'), ('James', 'Tel Aviv')], columns=('name', 'city'))
>>> df
    name      city
0    Yam    Hadera
1   Meow    Hadera
2    Don    Hadera
3   Jazz    Hadera
4   Bond  Tel Aviv
5  James  Tel Aviv
>>> df.iloc[sample(range(len(df)), 3), :]
   name      city
4  Bond  Tel Aviv
0   Yam    Hadera
1  Meow    Hadera
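For comparison, here is a minimal sketch using pandas' own DataFrame.sample. Without weights it makes the same kind of uniform draw as random.sample, so cities turn up in proportion to their row counts on average. The second call assumes a different reading of the question, in which each row's weight is literally its city's count. That reading favours large cities even more strongly. The variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    [("Yam", "Hadera"), ("Meow", "Hadera"), ("Don", "Hadera"),
     ("Jazz", "Hadera"), ("Bond", "Tel Aviv"), ("James", "Tel Aviv")],
    columns=["name", "city"],
)

# Uniform draw without replacement: every row is equally likely.
uniform = df.sample(n=3, random_state=0)

# Per-row weight equal to the number of rows sharing that row's city.
city_counts = df["city"].map(df["city"].value_counts())
weighted = df.sample(n=3, weights=city_counts, random_state=0)

print(uniform)
print(weighted)
```

DataFrame.sample normalizes the weights itself, so the raw counts can be passed in directly.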


Source: https://stackoverflow.com/questions/41528513/using-pandas-to-sample-dataframe-using-a-specific-columns-weight
