Combining two normal random variables

Question

Suppose I have the following two random variables:

X where mean = 6 and stdev = 3.5
Y where mean = -42 and stdev = 5

I would like to create a new random variable Z based on the first two, knowing that X happens 90% of the time and Y happens 10% of the time.

It is easy to calculate the mean of Z: 0.9 * 6 + 0.1 * (-42) = 1.2

But is it possible to generate random values for Z in a single function? Of course, I could do something along these lines:

if (randIntBetween(1, 10) > 1)
    return GenerateRandomNormalValue(6, 3.5);    // X, 9 times out of 10
else
    return GenerateRandomNormalValue(-42, 5);    // Y, 1 time out of 10

But I would really like to have a single function that would act as a probability density function for such a random variable (Z), which is not necessarily normal.

Sorry for the crappy pseudo-code.

Thanks for your help!

Edit: here is one concrete question:

Let's say we add up five consecutive values drawn from Z. What is the probability of ending up with a number higher than 10?

Answer 1:

But I would really like to have a single function that would act as a probability density function for such a random variable (Z), which is not necessarily normal.

Okay, if you want the density, here it is:

rho = 0.9 * density_of_x + 0.1 * density_of_y
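As a minimal sketch of that formula in Python (the function names normal_pdf and mixture_pdf are made up for illustration, and the parameters are the ones from the question), the density can be written with the standard library alone:

```python
import math

def normal_pdf(x, mean, stdev):
    """Density of a normal distribution at x."""
    z = (x - mean) / stdev
    return math.exp(-0.5 * z * z) / (stdev * math.sqrt(2.0 * math.pi))

def mixture_pdf(x):
    """Density of Z: weight 0.9 on X ~ N(6, 3.5), weight 0.1 on Y ~ N(-42, 5)."""
    return 0.9 * normal_pdf(x, 6.0, 3.5) + 0.1 * normal_pdf(x, -42.0, 5.0)

# Crude Riemann-sum check over [-100, 100] that the density integrates to about 1.
step = 0.01
total = sum(mixture_pdf(-100.0 + i * step) for i in range(20000)) * step
print(round(total, 4))
```

The density has two bumps, a large one around 6 and a small one around -42, which is why Z is not normal.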

But you cannot sample from this density directly unless you 1) compute its CDF (cumbersome, but not infeasible) and 2) invert it (you will need a numerical solver for this). Alternatively, you can use rejection sampling (or variants, e.g. importance sampling). This is costly, and cumbersome to get right.
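To give an idea of what steps 1) and 2) involve, here is a rough sketch using plain bisection as the numerical solver. The names mixture_cdf, mixture_quantile and sample_z, the search bracket [-150, 100] and the iteration count are all assumptions chosen for this example, not a recommended implementation:

```python
import math
import random

def normal_cdf(x, mean, stdev):
    """CDF of a normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (stdev * math.sqrt(2.0))))

def mixture_cdf(x):
    """CDF of Z: the same 0.9 / 0.1 weighting applied to the two normal CDFs."""
    return 0.9 * normal_cdf(x, 6.0, 3.5) + 0.1 * normal_cdf(x, -42.0, 5.0)

def mixture_quantile(u, lo=-150.0, hi=100.0, iterations=60):
    """Invert the CDF by bisection: find x with mixture_cdf(x) close to u.

    The bracket [lo, hi] is assumed wide enough to hold essentially all of Z;
    values of u outside the CDF range at the bracket ends are clamped to it.
    """
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if mixture_cdf(mid) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def sample_z(rng=random):
    """One draw of Z from a single uniform number (inverse-CDF sampling)."""
    return mixture_quantile(rng.random())
```

Each draw costs about 60 CDF evaluations here, which illustrates why this route is only worth it when a single uniform input per sample really matters.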

So you should go for the "if" statement (i.e. call the generator 3 times), unless you have a very strong reason not to (using quasi-random sequences, for instance).
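For reference, a minimal Python version of the "if" approach could look like this (sample_z is a made-up name, and the seed is arbitrary, used only to make the run repeatable):

```python
import random

def sample_z(rng=random):
    """Draw one value of Z: X with probability 0.9, otherwise Y."""
    if rng.random() < 0.9:
        return rng.gauss(6.0, 3.5)     # X ~ N(6, 3.5)
    return rng.gauss(-42.0, 5.0)       # Y ~ N(-42, 5)

rng = random.Random(12345)             # arbitrary seed, for repeatability only
draws = [sample_z(rng) for _ in range(100_000)]
print(sum(draws) / len(draws))         # should land near the mean of 1.2
```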

Answer 2:

If a normal random variable is denoted x = (mean, stdev), then the following algebra applies (the sum rule assumes x1 and x2 are independent):

number * x = ( number*mean, number*stdev )

x1 + x2 = ( mean1+mean2, sqrt(stdev1^2+stdev2^2) )

So for the case of X = (mx, sx), Y = (my, sy), the linear combination is

Z = w1*X + w2*Y = (w1*mx,w1*sx) + (w2*my,w2*sy) = 
    ( w1*mx+w2*my, sqrt( (w1*sx)^2+(w2*sy)^2 ) ) =
    ( 1.2, 3.19 )
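Note that this algebra describes the weighted sum w1*X + w2*Y. As a hedged comparison, the sketch below also computes the mean and standard deviation of the "pick X 90% of the time, Y 10% of the time" variable from the question, using the standard second-moment formula for a mixture. The variable names are only for illustration:

```python
import math

w1, mx, sx = 0.9, 6.0, 3.5
w2, my, sy = 0.1, -42.0, 5.0

# Weighted sum w1*X + w2*Y of independent normals (the algebra above).
sum_mean = w1 * mx + w2 * my
sum_stdev = math.sqrt((w1 * sx) ** 2 + (w2 * sy) ** 2)

# Mixture: a draw from X with probability w1, otherwise a draw from Y.
mix_mean = w1 * mx + w2 * my
mix_second_moment = w1 * (sx ** 2 + mx ** 2) + w2 * (sy ** 2 + my ** 2)
mix_stdev = math.sqrt(mix_second_moment - mix_mean ** 2)

print(round(sum_mean, 2), round(sum_stdev, 2))
print(round(mix_mean, 2), round(mix_stdev, 2))
```

Both have the same mean of 1.2, but the mixture's standard deviation comes out near 14.9, far wider than the 3.19 of the weighted sum, because the two components sit 48 units apart.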

Link: Normal Distribution; look for the Miscellaneous section, item 1.

PS. Sorry for the weird notation. The new standard deviation is calculated by something similar to the Pythagorean theorem: it is the square root of the sum of squares.

Answer 3:

This is the form of the distribution:

ListPlot[BinCounts[Table[If[RandomReal[] < .9,
    RandomReal[NormalDistribution[6, 3.5]], 
    RandomReal[NormalDistribution[-42, 5]]], {1000000}], {-60, 20, .1}], 
    PlotRange -> Full, DataRange -> {-60, 20}]

It is NOT Normal, as you are not adding Normal variables, but just choosing one or the other with certain probability.

Edit

This is the curve for adding five variables with this distribution:

The upper and lower peaks represent taking one of the distributions alone, and the middle peak accounts for the mixing.
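The question about the sum of five values can also be worked out without simulation. Assuming the five draws are independent, the number k of draws that come from Y is binomial, and for a given k the sum is normal with mean (5-k)*6 + k*(-42) and variance (5-k)*3.5^2 + k*5^2. A small sketch under that assumption (the function names are hypothetical):

```python
import math

def normal_sf(x, mean, stdev):
    """P(N(mean, stdev) > x), via the complementary error function."""
    return 0.5 * math.erfc((x - mean) / (stdev * math.sqrt(2.0)))

def prob_sum_exceeds(threshold, n=5, p_y=0.1,
                     mx=6.0, sx=3.5, my=-42.0, sy=5.0):
    """P(sum of n independent draws of Z > threshold)."""
    total = 0.0
    for k in range(n + 1):                       # k = number of draws from Y
        weight = math.comb(n, k) * p_y ** k * (1.0 - p_y) ** (n - k)
        mean = (n - k) * mx + k * my
        stdev = math.sqrt((n - k) * sx ** 2 + k * sy ** 2)
        total += weight * normal_sf(threshold, mean, stdev)
    return total

print(prob_sum_exceeds(10.0))
```

With the numbers from the question this gives a probability of roughly 0.59; almost all of it comes from the case where all five values are drawn from X.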

Answer 4:

The most straightforward and generically applicable solution is to simulate the problem:

Run the piecewise function you have 1,000,000 times (just a high number) and generate a histogram of the results by splitting them into bins. Dividing the count for each bin by your N (1,000,000 in my example) gives the probability of landing in that bin; dividing that again by the bin width leaves you with an approximation of the PDF of Z at every given bin.
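A small Python sketch of that simulation, with the caveat that the sample size, the range [-70, 30], the bin width and the names sample_z and estimate_pdf are arbitrary choices made for this example:

```python
import random

def sample_z(rng):
    """The piecewise generator: X with probability 0.9, otherwise Y."""
    if rng.random() < 0.9:
        return rng.gauss(6.0, 3.5)
    return rng.gauss(-42.0, 5.0)

def estimate_pdf(n=200_000, lo=-70.0, hi=30.0, bin_width=0.5, seed=0):
    """Return (bin_center, estimated_density) pairs from n simulated draws."""
    rng = random.Random(seed)
    n_bins = int(round((hi - lo) / bin_width))
    counts = [0] * n_bins
    for _ in range(n):
        index = int((sample_z(rng) - lo) // bin_width)
        if 0 <= index < n_bins:          # draws outside [lo, hi) are dropped
            counts[index] += 1
    # count / n is the bin probability; dividing by bin_width gives a density
    return [(lo + (i + 0.5) * bin_width, c / (n * bin_width))
            for i, c in enumerate(counts)]

pdf = estimate_pdf()
peak_center, peak_density = max(pdf, key=lambda pair: pair[1])
print(peak_center, peak_density)
```

A smaller bin width gives finer resolution but noisier estimates, so N has to grow as the bins shrink.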

Answer 5:

Lots of unknowns here, but essentially you just wish to add the two (or more) probability density functions to one another, each weighted by how often it occurs.

For any given probability function you could calculate a random number with that density by calculating the area under the probability curve (the integral) and then generating a random number between 0 and that area. Then move along the curve until the area is equal to your random number and use that as your value.

This process can then be generalized to any function (or sum of two or more functions).

Elaboration: Suppose you have a density function f(x) defined for x ranging from 0 to 1. You can calculate a random number based on the distribution by first calculating the integral of f(x) from 0 to 1, giving you the area under the curve; let's call it A.

Now you generate a random number between 0 and A; let's call that number r. Then you need to find a value t such that the integral of f(x) from 0 to t is equal to r. That t is your random number.

This process can be used for any probability density function f(x), including the sum of two (or more) probability density functions.

I'm not sure what your functions look like, so I'm not sure whether you are able to calculate analytic solutions for all this, but in the worst case you could use numeric techniques to approximate the effect.
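As an illustration of the numeric route, here is a minimal sketch that tabulates the running area under f with the trapezoid rule and then looks up t for a random r. The helper names build_area_table and sample, the step count, and the example density f(x) = 2x on [0, 1] are all assumptions made for this example:

```python
import bisect
import random

def build_area_table(f, lo=0.0, hi=1.0, steps=1000):
    """Tabulate areas[i] = integral of f from lo to xs[i] (trapezoid rule)."""
    dx = (hi - lo) / steps
    xs = [lo + i * dx for i in range(steps + 1)]
    areas = [0.0]
    for i in range(steps):
        areas.append(areas[-1] + 0.5 * (f(xs[i]) + f(xs[i + 1])) * dx)
    return xs, areas

def sample(xs, areas, rng):
    """Pick r in [0, A) and return t such that the area up to t equals r."""
    r = rng.random() * areas[-1]            # areas[-1] is the total area A
    i = bisect.bisect_left(areas, r)        # first index with areas[i] >= r
    if i == 0:
        return xs[0]
    # Interpolate linearly inside the cell where the running area crosses r.
    frac = (r - areas[i - 1]) / (areas[i] - areas[i - 1])
    return xs[i - 1] + frac * (xs[i] - xs[i - 1])

# Example density (hypothetical): f(x) = 2x on [0, 1], whose mean is 2/3.
xs, areas = build_area_table(lambda x: 2.0 * x)
rng = random.Random(1)                      # arbitrary seed
draws = [sample(xs, areas, rng) for _ in range(20_000)]
print(sum(draws) / len(draws))
```

Because r is scaled by the total area A, f does not need to be normalised, so the same code works for a weighted sum of densities on a bounded range.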

Source: https://stackoverflow.com/questions/4454513/combining-two-normal-random-variables

Tags

math

statistics

normal-distribution