distribution

Python: Sample from multivariate normal with N means and same covariance matrix

Submitted by 白昼怎懂夜的黑 on 2021-02-19 08:14:11
Question: Suppose I want to sample 10 times from multiple normal distributions with the same covariance matrix (identity) but different means, which are stored as rows of the following matrix: means = np.array([[1, 5, 2], [6, 2, 7], [1, 8, 2]]) How can I do that in the most efficient way possible (i.e. avoiding loops)? I tried scipy.stats.multivariate_normal(means, np.eye(2)).rvs(10) and np.random.multivariate_normal(means, np.eye(2)), but they throw an error saying the mean should be 1D.
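Because the covariance here is the identity, a loop-free sketch is possible with NumPy alone: draw standard normals and shift them by the means via broadcasting (variable names and the seed are my own):

```python
import numpy as np

rng = np.random.default_rng(0)
means = np.array([[1, 5, 2], [6, 2, 7], [1, 8, 2]])
n_samples = 10

# With an identity covariance each distribution is just a standard normal
# shifted by its mean, so broadcasting replaces the loop entirely.
samples = means[None, :, :] + rng.standard_normal((n_samples,) + means.shape)
# samples[i, j] is the i-th draw from the normal centred at means[j]
print(samples.shape)  # (10, 3, 3)
```

For a general (shared, non-identity) covariance, one would multiply the standard normals by its Cholesky factor before adding the means.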

Create distribution in Pandas

Submitted by 瘦欲@ on 2021-02-19 07:34:38
Question: I want to generate a random/simulated data set with a specific distribution. As an example, the distribution has the following properties. A population of 1000. The gender mix is: male 49%, female 50%, other 1%. The age has the following distribution: 0-30 (30%), 31-60 (40%), 61-100 (30%). The resulting data frame would have 1000 rows and two columns called gender and age (with the above value distributions). Is there a way to do this in Pandas or another library? Answer 1: You may try: N = 1000
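A minimal sketch with NumPy and pandas (the seed is arbitrary, and drawing ages uniformly within each band is an assumption the question leaves open):

```python
import numpy as np
import pandas as pd

N = 1000
rng = np.random.default_rng(42)

gender = rng.choice(["male", "female", "other"], size=N, p=[0.49, 0.50, 0.01])

# Draw an age band with the stated probabilities, then a uniform age inside it.
band = rng.choice(3, size=N, p=[0.30, 0.40, 0.30])
low = np.array([0, 31, 61])[band]
high = np.array([30, 60, 100])[band]
age = rng.integers(low, high + 1)  # high is exclusive, hence the +1

df = pd.DataFrame({"gender": gender, "age": age})
print(df.shape)  # (1000, 2)
```

`Generator.integers` broadcasts array-valued bounds, which is what lets the per-band limits work without a loop.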

When is it (not) appropriate to bundle dependencies with an application?

Submitted by 放肆的年华 on 2021-02-18 10:07:22
Question: Summary: I recently had a conversation with the creator of a framework that one of my applications depends on. During that conversation he mentioned, as a sort of aside, that it would make my life simpler if I just bundled his framework with my application and delivered to the end user a version that I knew was consistent with my code. Intuitively I have always tried to avoid doing this and, in fact, I have taken pains to segment my own code so that portions of it could be redistributed without

Weibull distribution with weighted data

Submitted by 半世苍凉 on 2021-02-11 13:31:08
Question: I have some time-to-event data for which I need to generate around 200 shape/scale parameters for subgroups for a simulation model. I have analysed the data, and it best follows a Weibull distribution. Normally, I would use the fitdistrplus package and fitdist(x, "weibull") to do so; however, this data has been matched using kernel matching and I have a variable of weighting values called km, so the fit needs to incorporate a weight, which isn't something fitdist can do as far as I can tell. With my
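fitdist has no weights argument, but the underlying idea, maximising a weight-scaled log-likelihood, is straightforward to sketch directly. Here is a Python version using SciPy (the function name, starting values, and optimizer choice are my own; loc is fixed at 0):

```python
import numpy as np
from scipy import optimize, stats

def weighted_weibull_fit(x, w):
    """Maximise the weighted Weibull log-likelihood sum(w * logpdf)."""
    def nll(params):
        shape, scale = params
        if shape <= 0 or scale <= 0:
            return np.inf  # keep the optimizer inside the valid region
        return -np.sum(w * stats.weibull_min.logpdf(x, shape, scale=scale))
    res = optimize.minimize(nll, x0=[1.0, float(np.mean(x))],
                            method="Nelder-Mead")
    return res.x  # (shape, scale)

# Sanity check: unit weights should roughly reproduce an unweighted MLE fit.
x = stats.weibull_min.rvs(1.5, scale=2.0, size=2000,
                          random_state=np.random.default_rng(0))
shape_hat, scale_hat = weighted_weibull_fit(x, np.ones_like(x))
```

The same pattern works in R by passing a weighted negative log-likelihood to optim(); the key point is that the weights simply scale each observation's log-density contribution.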

Card distribution with constraints

Submitted by 拟墨画扇 on 2021-02-10 08:07:00
Question: Suppose I would like to distribute a deck of 52 cards among N players, not necessarily equally, so each player Pi would get a number of cards Ci. Suppose that each of these players might have constraints that dictate which cards (s)he can receive; for example, player P2 cannot get any cards in the Hearts suit and P5 cannot get any cards above 10. All these constraints are guaranteed to have at least one distribution/solution. My main question is: how would one go about this programmatically?
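One programmatic approach is backtracking: deal the cards one at a time, giving each card only to a player who still needs cards and whose constraints allow it, and undo the assignment on a dead end. A small Python sketch (the data model with per-player predicates is my own):

```python
def deal(cards, counts, allowed):
    """Backtracking sketch: counts[i] cards go to player i; allowed[i](card)
    says whether player i may receive that card. Returns hands or None."""
    hands = [[] for _ in counts]

    def place(k):
        if k == len(cards):
            return True
        for i in range(len(counts)):
            if len(hands[i]) < counts[i] and allowed[i](cards[k]):
                hands[i].append(cards[k])
                if place(k + 1):
                    return True
                hands[i].pop()  # dead end: take the card back
        return False

    return hands if place(0) else None

# Toy example: 4 cards; player 0 refuses hearts ("H"), player 1 takes anything.
cards = ["H1", "S1", "H2", "S2"]
hands = deal(cards, [2, 2],
             [lambda c: not c.startswith("H"), lambda c: True])
```

Backtracking is exponential in the worst case; for larger or tighter instances the problem maps naturally to bipartite matching or max-flow (cards on one side, player slots on the other), which runs in polynomial time.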

Why is the first bar so big in my R histogram?

Submitted by 你。 on 2021-02-09 08:44:05
Question: I'm playing around with R. I try to visualize the distribution of 1000 dice throws with the following R script: cases <- 1000 min <- 1 max <- 6 x <- as.integer(runif(cases,min,max+1)) mx <- mean(x) sd <- sd(x) hist( x, xlim=c(min - abs(mx/2),max + abs(mx/2)), main=paste(cases,"Samples"), freq = FALSE, breaks=seq(min,max,1) ) curve(dnorm(x, mx, sd), add = TRUE, col="blue", lwd = 2) abline(v = mx, col = "red", lwd = 2) legend("bottomleft", legend=c(paste('Mean (', mx, ')')), col=c('red'), lwd=2
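The oversized bar comes from the bin edges: breaks=seq(min,max,1) produces edges 1..6, i.e. only five bins for six faces, so two faces collapse into one bar; half-integer breaks such as seq(0.5, 6.5, 1) give each face its own bin (and in R, sample(1:6, cases, replace = TRUE) is also a cleaner dice simulation than truncating runif). The same edge effect can be demonstrated with NumPy, where the shared bin is the last one because np.histogram closes only its final interval:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.integers(1, 7, size=1000)  # fair six-sided dice

# Integer edges 1..6 give only 5 bins: faces 5 and 6 share the last bin.
bad, _ = np.histogram(x, bins=np.arange(1, 7))
# Half-integer edges 0.5..6.5 give each of the six faces its own bin.
good, _ = np.histogram(x, bins=np.arange(0.5, 7.0))
print(len(bad), len(good))  # 5 6
```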

Inconsistent skewness results between basic skewness formula, Python and R

Submitted by 白昼怎懂夜的黑 on 2021-02-08 07:54:47
Question: The data I'm using is pasted below. When I apply the basic formula for skewness to my data in R: 3*(mean(data) - median(data))/sd(data) the result is -0.07949198. I get a very similar result in Python. The median is therefore greater than the mean, suggesting the left tail is longer. However, when I apply the descdist function from the fitdistrplus package, the skewness is 0.3076471, suggesting the right tail is longer. The SciPy function skew again returns a skewness of 0.303. Can I trust this
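The two numbers measure different things: 3*(mean − median)/sd is Pearson's second skewness coefficient, while descdist and scipy.stats.skew report the moment-based skewness (the third standardised moment). The two statistics can legitimately differ for the same data, even in sign. A quick comparison of both in Python (the gamma sample here is a hypothetical stand-in for the pasted data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
data = rng.gamma(2.0, size=5000)  # right-skewed stand-in data

# Pearson's second skewness coefficient (the "basic formula")
pearson2 = 3 * (data.mean() - np.median(data)) / data.std(ddof=1)
# Third standardised moment, what scipy.stats.skew and descdist report
moment = stats.skew(data)
print(pearson2, moment)
```

Neither value is wrong; they are simply different definitions, so only compare like with like, and keep in mind that the moment-based version is more sensitive to outliers.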

Calculating loglikelihood of distributions in Python

Submitted by ≯℡__Kan透↙ on 2021-02-08 07:44:27
Question: What is an easy way to calculate the log-likelihood of any distribution fitted to data? Answer 1: Solution by OP. Python has 82 standard distributions, which can be found here and in scipy.stats.distributions. Suppose you find the parameters such that the probability density function (pdf) fits the data as follows: dist = getattr(scipy.stats, 'distribution name') params = dist.fit(data) Then, since it is a standard distribution included in the SciPy library, the pdf and logpdf can be found and used
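Put together, the pattern looks like this (the normal distribution and the synthetic data are my own example; any scipy.stats distribution name works in the getattr lookup):

```python
import numpy as np
import scipy.stats

# Synthetic data standing in for the user's sample
data = scipy.stats.norm.rvs(loc=3.0, scale=2.0, size=1000, random_state=0)

dist = getattr(scipy.stats, "norm")  # look the distribution up by name
params = dist.fit(data)              # MLE of the shape/loc/scale parameters

# Log-likelihood = sum of the log-density at the fitted parameters
loglik = np.sum(dist.logpdf(data, *params))
print(loglik)
```

Because fit() returns the parameters in the order logpdf expects, the `*params` unpacking works uniformly across distributions.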