Create distribution in Pandas

Submitted by 瘦欲@ on 2021-02-19 07:34:38

Question


I want to generate a random/simulated data set with a specific distribution.

As an example, the distribution has the following properties:

  1. A population of 1000
  2. The gender mix is: male 49%, female 50%, other 1%
  3. The age has the following distribution: 0-30 (30%), 31-60 (40%), 61-100 (30%)

The resulting DataFrame would have 1000 rows and two columns, gender and age, with the value distributions above.

Is there a way to do this in Pandas or another library?


Answer 1:


You may try:

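import numpy as np
import pandas as pd

N = 1000
gender = np.random.choice(["male", "female", "other"], size=N, p=[.49, .5, .01])

# range() excludes its stop value, so each stop is one past the bracket's upper bound
n_young, n_mid = int(.3 * N), int(.4 * N)
age = np.r_[np.random.choice(range(0, 31), size=n_young),
            np.random.choice(range(31, 61), size=n_mid),
            np.random.choice(range(61, 101), size=N - n_young - n_mid)]
np.random.shuffle(age)  # mix the brackets so rows are not ordered by age

df = pd.DataFrame({"gender": gender, "age": age})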
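
To check that a generated frame matches the requested mix, it can help to bin the ages back into the three brackets and compare proportions. The sketch below is one possible variant, not the answer's exact method: it assumes NumPy's `Generator` API (`np.random.default_rng`) with a fixed seed for reproducibility, and it draws the age bracket per row at random, so the age shares come out close to 30/40/30 rather than exactly equal to them. The function name `simulate_population` and the seed value are illustrative.

```python
import numpy as np
import pandas as pd


def simulate_population(n=1000, seed=0):
    """Return a DataFrame with randomly drawn 'gender' and 'age' columns."""
    rng = np.random.default_rng(seed)  # seeded generator for repeatable draws

    gender = rng.choice(["male", "female", "other"], size=n, p=[0.49, 0.50, 0.01])

    # Inclusive lower and upper bounds of the three age brackets.
    lows = np.array([0, 31, 61])
    highs = np.array([30, 60, 100])

    # Choose a bracket for every row, then draw a uniform integer age inside it.
    band = rng.choice(3, size=n, p=[0.3, 0.4, 0.3])
    age = rng.integers(lows[band], highs[band] + 1)  # upper bound is exclusive

    return pd.DataFrame({"gender": gender, "age": age})


df = simulate_population()

# Bin ages back into the brackets and inspect the observed proportions.
age_band = pd.cut(df["age"], bins=[-1, 30, 60, 100],
                  labels=["0-30", "31-60", "61-100"])
print(df["gender"].value_counts(normalize=True))
print(age_band.value_counts(normalize=True).sort_index())
```

If exact bracket counts are required (exactly 300, 400 and 300 rows), fixing the sizes per bracket and shuffling, as in the answer above, is the simpler choice.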


Source: https://stackoverflow.com/questions/64051466/create-distribution-in-pandas
