Pandas: create new column in df with random integers from range

Submitted by 痞子三分冷 on 2019-12-31 09:06:07

Question


I have a pandas data frame with 50k rows. I'm trying to add a new column that is a randomly generated integer from 1 to 5.

If I want 50k random numbers I'd use:

df1['randNumCol'] = random.sample(range(50000), len(df1))

but for this I'm not sure how to do it.

Side note: in R, I'd do:

sample(1:5, 50000, replace = TRUE)

Any suggestions?


Answer 1:


One solution is to use numpy.random.randint:

import numpy as np
df1['randNumCol'] = np.random.randint(1, 6, df1.shape[0])
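A minimal sketch of the call above, run on a small stand-in frame (the 10-row df1 and its column "a" are hypothetical, used here in place of the 50k-row frame from the question). The point it illustrates is that the upper bound of randint is exclusive, so 6 is needed for 5 to be a possible value:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the 50k-row frame in the question
df1 = pd.DataFrame({"a": range(10)})

# low (1) is inclusive, high (6) is exclusive: draws come from {1, 2, 3, 4, 5}
df1["randNumCol"] = np.random.randint(1, 6, df1.shape[0])

# Check that every generated value lies in the requested range
in_range = df1["randNumCol"].between(1, 5).all()
print(in_range)
```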

Or, if the numbers are non-consecutive, you can use numpy.random.choice (albeit slower):

df1['randNumCol'] = np.random.choice([1, 9, 20], df1.shape[0])
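As a rough illustration of the non-consecutive case, the sketch below draws from an explicit list of allowed values. The frame and the name allowed are hypothetical. By default numpy.random.choice samples with replacement, which matches the replace = TRUE behaviour of the R call in the question:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in frame; the real one has 50k rows
df1 = pd.DataFrame({"a": range(10)})

# Arbitrary, non-consecutive values to draw from
allowed = [1, 9, 20]

# Samples with replacement by default, one draw per row
df1["randNumCol"] = np.random.choice(allowed, df1.shape[0])

# Every value in the new column is one of the allowed values
all_allowed = df1["randNumCol"].isin(allowed).all()
print(all_allowed)
```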

To make the results reproducible, set the seed with numpy.random.seed before generating the numbers.
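A short sketch of what seeding buys you: resetting the seed to the same value before each call reproduces the same draws. The seed value 42 is arbitrary. The last lines show NumPy's newer Generator interface (numpy.random.default_rng), which is not part of the original answer and is included only as an alternative that avoids global state:

```python
import numpy as np

# Same seed before each call gives the same sequence of draws
np.random.seed(42)  # 42 is an arbitrary choice
first = np.random.randint(1, 6, 5)

np.random.seed(42)
second = np.random.randint(1, 6, 5)

print(np.array_equal(first, second))

# Alternative (not in the original answer): a local, seeded Generator.
# integers() also treats the upper bound as exclusive by default.
rng = np.random.default_rng(42)
draws = rng.integers(1, 6, size=5)
```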




Answer 2:


To add a column of random integers, use randint(low, high, size). The upper bound high is exclusive, so integers from 1 to 5 need low=1 and high=6. There's no need to waste memory allocating range(low, high); that could be a lot of memory if high is large.

df1['randNumCol'] = np.random.randint(1, 6, size=len(df1))

(Note also that when adding a single column, size is just an integer. More generally, to generate a 2-D array or a whole data frame of random integers, size can be a tuple of dimensions, as in the question "Pandas: How to create a data frame of random integers?")
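The tuple form of size might look like the sketch below, which builds a whole frame of random integers in one call. The shape (4, 3) and the column names A, B, C are made up for illustration:

```python
import numpy as np
import pandas as pd

# size=(rows, columns) produces a 2-D array of random integers in {1, ..., 5}
values = np.random.randint(1, 6, size=(4, 3))

# Wrap the array in a DataFrame; the column names are hypothetical
df = pd.DataFrame(values, columns=list("ABC"))

print(df.shape)
```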



Source: https://stackoverflow.com/questions/30327417/pandas-create-new-column-in-df-with-random-integers-from-range
