probability

How to compute the probability of a value given a list of samples from a distribution in Python?

雨燕双飞 posted on 2019-11-28 17:06:45
Not sure if this belongs in statistics, but I am trying to use Python to achieve this. I essentially just have a list of integers: data = [300,244,543,1011,300,125,300 ... ] and I would like to know the probability of a value occurring given this data. I graphed histograms of the data using matplotlib and obtained these: In the first graph, the numbers represent the number of characters in a sequence. In the second graph, it's a measured amount of time in milliseconds. The minimum is greater than zero, but there isn't necessarily a maximum. The graphs were created using millions of examples,
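One simple reading of "the probability of a value given this data" is the empirical frequency of that value among the samples. The sketch below assumes that reading and uses only the values visible in the question; the helper names `empirical_pmf` and `empirical_cdf` are made up for illustration. For continuous data such as the timing measurements, a density estimate or the empirical CDF is usually more meaningful than the frequency of one exact value.

```python
from collections import Counter


def empirical_pmf(samples):
    """Map each observed value to its relative frequency in the samples."""
    counts = Counter(samples)
    total = len(samples)
    return {value: count / total for value, count in counts.items()}


def empirical_cdf(samples, x):
    """Fraction of samples that are less than or equal to x."""
    return sum(1 for s in samples if s <= x) / len(samples)


# Only the values shown in the question; the real list is much longer.
data = [300, 244, 543, 1011, 300, 125, 300]

pmf = empirical_pmf(data)
p_300 = pmf.get(300, 0.0)            # relative frequency of exactly 300
p_le_300 = empirical_cdf(data, 300)  # share of samples at or below 300
```

Values never seen in the data get probability 0 under this estimate, which is one reason to smooth or fit a distribution when the sample is small.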

Calculate probability in normal distribution given mean, std in Python

半腔热情 posted on 2019-11-28 16:37:20
How to calculate probability in normal distribution given mean, std in Python? I can always explicitly code my own function according to the definition, like the OP in this question did: Calculating Probability of a Random Variable in a Distribution in Python. I am just wondering if there is a library function call that will allow you to do this. I imagine it would look like this: nd = NormalDistribution(mu=100, std=12) p = nd.prob(98) There is a similar question in Perl: How can I compute the probability at a point given a normal distribution in Perl? But I didn't see one in Python. Numpy has a random
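For what it's worth, the standard library (Python 3.8 and later) has `statistics.NormalDist`, which is close to the interface imagined in the question. A minimal sketch using the question's mean of 100 and standard deviation of 12 follows. Note that for a continuous distribution the value at a single point is a density, not a probability; probabilities come from the CDF over an interval. The upper bound of 110 is an arbitrary choice for illustration.

```python
from statistics import NormalDist

nd = NormalDist(mu=100, sigma=12)

# Density at a single point (not a probability).
density_at_98 = nd.pdf(98)

# Probability that a draw is at most 98.
prob_below_98 = nd.cdf(98)

# Probability that a draw falls between 98 and 110 (110 is arbitrary).
prob_between = nd.cdf(110) - nd.cdf(98)
```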

Select random row from a PostgreSQL table with weighted row probabilities

杀马特。学长 韩版系。学妹 posted on 2019-11-28 11:57:51
Example input:
    SELECT * FROM test;
     id | percent
    ----+---------
      1 |      50
      2 |      35
      3 |      15
    (3 rows)
How would you write a query such that, on average, I get the row with id=1 50% of the time, the row with id=2 35% of the time, and the row with id=3 15% of the time? I tried something like SELECT id FROM test ORDER BY p * random() DESC LIMIT 1, but it gives wrong results. After 10,000 runs I get a distribution like {1=6293, 2=3302, 3=405}, but I expected a distribution close to {1=5000, 2=3500, 3=1500}. Any ideas?
This should do the trick: WITH CTE AS ( SELECT random() * (SELECT SUM(percent) FROM YOUR_TABLE
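The answer's query is cut off here, but it starts by scaling random() by the sum of the weights, which is how the usual cumulative-sum technique begins. As a rough illustration of that technique outside SQL (not a reconstruction of the truncated query), here is a Python sketch: draw one threshold in [0, total) and return the first row whose running weight total exceeds it. The function name `pick_weighted` is hypothetical.

```python
import random
from collections import Counter

# (id, percent) pairs from the example table.
rows = [(1, 50), (2, 35), (3, 15)]


def pick_weighted(rows, rng=random):
    """Return one id, chosen with probability proportional to its weight."""
    total = sum(weight for _, weight in rows)
    threshold = rng.random() * total
    running = 0.0
    for row_id, weight in rows:
        running += weight
        if threshold < running:
            return row_id
    return rows[-1][0]  # guard against floating-point edge cases


# Tally 10,000 draws with a fixed seed so the run is repeatable.
rng = random.Random(2019)
tally = Counter(pick_weighted(rows, rng) for _ in range(10_000))
```

The tally should land near the 50/35/15 split, unlike ordering by weight times random(), which favors heavier rows more than proportionally.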

how to implement non uniform probability distribution?

纵饮孤独 posted on 2019-11-28 10:34:57
I am trying to implement a non-uniform probability distribution in a genetic algorithm. In the implementation of the genetic program, I have an experiment which has 3 outcomes, where each outcome has a different probability. Let's say the probability of one outcome is 0.85, another is 0.01, and the last one is 0.14. P.S.: I recently came to know that this is called a non-uniform probability distribution. I'm implementing it in Java; can anyone explain the theory behind non-uniform probability distributions and also point to any Java packages implementing them? Feel free to ask if you need any more information on the problem! Thanks
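The question asks about Java, but to keep the examples in this listing in one language, here is a short Python sketch of sampling the three outcomes with the stated probabilities. It relies on `random.choices`, which accepts per-item weights; the outcome labels "A", "B" and "C" are placeholders, since the question does not name them.

```python
import random

outcomes = ["A", "B", "C"]    # placeholder labels, not from the question
weights = [0.85, 0.01, 0.14]  # probabilities given in the question

rng = random.Random(42)  # fixed seed so the run is repeatable
draws = rng.choices(outcomes, weights=weights, k=100_000)

# Observed share of each outcome across the draws.
freq = {o: draws.count(o) / len(draws) for o in outcomes}
```

The underlying idea is inverse transform sampling: split [0, 1) into consecutive intervals of length 0.85, 0.01 and 0.14, draw a uniform number, and report the interval it falls into. The same idea carries over directly to Java with `java.util.Random.nextDouble()`.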

Find the probability density of a new data point using “density” function in R

大憨熊 posted on 2019-11-28 10:11:23
I am trying to find the best PDF for continuous data that has an unknown distribution, using the "density" function in R. Now, given a new data point, I want to find the probability density of this data point based on the kernel density estimator that I have from the "density" function result. How can I do that? If your new point will be within the range of values produced by density, it's fairly easy to do -- I'd suggest using approx (or approxfun if you need it as a function) to handle the interpolation between the grid values. Here's an example: set.seed(2937107) x <- rnorm(10,30,3) dx <-
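The R example is truncated, so the following is only a Python sketch of the same idea: evaluate a Gaussian kernel density estimate on a grid, then linearly interpolate between grid values to get the density at a new point. The fixed bandwidth of 1.5, the 512-point grid, the seed and the helper names are all assumptions for illustration; R's `density` picks its bandwidth by its own rule, so the numbers will not match R output.

```python
import math
import random
from bisect import bisect_right


def gaussian_kde_on_grid(samples, bandwidth, grid):
    """Evaluate a Gaussian kernel density estimate at each grid point."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((g - s) / bandwidth) ** 2) for s in samples)
        for g in grid
    ]


def interpolate(grid, values, x):
    """Linear interpolation between grid values, similar in spirit to approx."""
    if x < grid[0] or x > grid[-1]:
        raise ValueError("x lies outside the grid")
    i = bisect_right(grid, x) - 1
    if i >= len(grid) - 1:
        return values[-1]
    t = (x - grid[i]) / (grid[i + 1] - grid[i])
    return values[i] + t * (values[i + 1] - values[i])


rng = random.Random(0)
samples = [rng.gauss(30, 3) for _ in range(10)]  # mirrors rnorm(10, 30, 3)

# Grid extending well past the data so the tails are covered.
lo, hi = min(samples) - 9, max(samples) + 9
grid = [lo + k * (hi - lo) / 511 for k in range(512)]
density = gaussian_kde_on_grid(samples, bandwidth=1.5, grid=grid)

new_point = 31.0
estimate = interpolate(grid, density, new_point)
```

If the grid is not needed for anything else, calling `gaussian_kde_on_grid(samples, 1.5, [new_point])` evaluates the estimator directly and skips the interpolation step.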

what is the most efficient way to pick a random card from a deck when some cards are unusable?

对着背影说爱祢 posted on 2019-11-28 08:50:17
I have an array which tells whether a card is in use: int used[52]; This is a terrible way to pick a random card if I have many used cards: do { card = rand() % 52; } while (used[card]); since if I have only 3-4 unused cards, it'll take forever to find them. I came up with this: int card; int k = 0; int numUsed = 0; for (k=0; k < 52; ++k) { if (used[k]) numUsed += 1; } if (numUsed == 52) return -1; card = rand() % (52 - numUsed); for (k=0; k < 52; ++k) { if (used[k]) continue; if (card == 0) return k; card -= 1; } which I guess works better if the deck is full, but works worse when the deck is
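Two common alternatives to the retry loop, sketched in Python rather than C: build the list of unused indices and choose from it (a single linear pass, like the two-loop version above), or keep the remaining cards in a list and remove a random one by swapping it with the last element, which makes each draw constant time. The names `pick_unused_card` and `Deck` are made up for this sketch, and the second option assumes you are free to change how the deck is stored.

```python
import random


def pick_unused_card(used, rng=random):
    """Return a uniformly chosen unused index, or -1 if every card is used."""
    available = [i for i, in_use in enumerate(used) if not in_use]
    if not available:
        return -1
    return rng.choice(available)


class Deck:
    """Keeps only the remaining cards so each draw takes constant time."""

    def __init__(self, size=52):
        self.remaining = list(range(size))

    def draw(self, rng=random):
        """Remove and return a random remaining card, or -1 if empty."""
        if not self.remaining:
            return -1
        i = rng.randrange(len(self.remaining))
        # Swap the chosen card to the end, then pop it off.
        last = len(self.remaining) - 1
        self.remaining[i], self.remaining[last] = (
            self.remaining[last],
            self.remaining[i],
        )
        return self.remaining.pop()
```

With the `Deck` approach the cost of a draw does not depend on how many cards are already used.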

Estimating/forecasting download completion time

有些话、适合烂在心里 posted on 2019-11-28 08:24:58
We've all poked fun at the 'X minutes remaining' dialog which seems to be too simplistic, but how can we improve it? Effectively, the input is the set of download speeds up to the current time, and we need to use this to estimate the completion time, perhaps with an indication of certainty, like '20-25 mins remaining' using some Y% confidence interval. Code that did this could be put in a little library and used in projects all over, so is it really that difficult? How would you do it? What weighting would you give to previous download speeds? Or is there some open source code already out
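One possible answer to the weighting question is an exponentially weighted moving average of the observed speeds, so recent samples count more than old ones. The sketch below assumes speeds in bytes per second sampled at regular intervals; the smoothing factor `alpha=0.3` and the function name are arbitrary choices for illustration, not a recommendation from the thread. It returns a point estimate only and does not attempt the confidence interval mentioned in the question.

```python
def estimate_remaining_seconds(speeds, bytes_left, alpha=0.3):
    """Estimate time left from an exponentially weighted average speed.

    speeds: observed download speeds (bytes/second), oldest first.
    alpha: weight of the newest sample; higher reacts faster to changes.
    """
    if not speeds:
        raise ValueError("need at least one speed sample")
    smoothed = speeds[0]
    for speed in speeds[1:]:
        smoothed = alpha * speed + (1 - alpha) * smoothed
    if smoothed <= 0:
        return float("inf")  # stalled download: no finite estimate
    return bytes_left / smoothed


# Example: the connection slowed down recently, so the estimate grows.
steady = estimate_remaining_seconds([100, 100, 100, 100], bytes_left=1000)
slowing = estimate_remaining_seconds([100, 100, 50, 50], bytes_left=1000)
```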

Assigning a specific number of values informed by a probability distribution (in R)

£可爱£侵袭症+ posted on 2019-11-28 07:38:06
Question: Hello and thanks in advance for the help! I am trying to generate a vector with a specific number of values that are assigned according to a probability distribution. For example, I want a vector of length 31 containing 26 zeros and 5 ones. (The total sum of the vector should always be five.) However, the location of the ones is important. To identify which values should be one and which should be zero, I have a vector of probabilities (length 31), which looks like this: probs<-c(0.01,0
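One way to read this is weighted sampling without replacement: pick five distinct positions, one at a time, each time choosing among the remaining positions in proportion to their weights. Here is a Python sketch of that reading. The `probs` vector in the question is truncated, so the weights below are hypothetical stand-ins, and `assign_ones` is a made-up name.

```python
import random


def assign_ones(probs, k, rng=random):
    """Return a 0/1 list with exactly k ones, placed by sequential weighted draws."""
    if k > sum(1 for p in probs if p > 0):
        raise ValueError("not enough positions with positive weight")
    remaining = list(range(len(probs)))
    chosen = []
    for _ in range(k):
        weights = [probs[i] for i in remaining]
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        chosen.append(pick)
        remaining.remove(pick)  # without replacement
    result = [0] * len(probs)
    for i in chosen:
        result[i] = 1
    return result


# Hypothetical weights: the question's 31-element vector is cut off.
probs = [0.01 * (i + 1) for i in range(31)]
vector = assign_ones(probs, k=5, rng=random.Random(7))
```

A caveat: with sequential draws, the chance that a given position ends up as a one is not exactly proportional to its weight, although higher weights are still favored.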

Generating a probability distribution

自古美人都是妖i posted on 2019-11-28 06:54:27
Given an array of size n, I want to generate random probabilities for each index such that Sigma(a[0]..a[n-1])=1. One possible result might be: 0 1 2 3 4 0.15 0.2 0.18 0.22 0.25 Another perfectly legal result can be: 0 1 2 3 4 0.01 0.01 0.96 0.01 0.01 How can I generate these easily and quickly? Answers in any language are fine, Java preferred. The task you are trying to accomplish is tantamount to drawing a random point from the N-dimensional unit simplex. http://en.wikipedia.org/wiki/Simplex#Random_sampling might help you. A naive solution might go as follows: public static double[] getArray
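The Java method in the answer is cut off, so this is not a reconstruction of it. As an independent sketch of uniform sampling from the unit simplex, one standard recipe is to draw n independent exponential variates and divide each by their sum. It is shown in Python here; the function name `random_probabilities` is made up.

```python
import random


def random_probabilities(n, rng=random):
    """Return n non-negative numbers summing to 1, uniform over the simplex."""
    draws = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]


probabilities = random_probabilities(5, rng=random.Random(3))
```

Normalizing plain uniform draws instead also gives values that sum to 1, but the result is not uniform over the simplex; very lopsided outcomes such as the 0.96 example become rarer than they should be.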

Efficiently determining the probability of a user clicking a hyperlink

淺唱寂寞╮ posted on 2019-11-28 06:47:27
Question: So I have a bunch of hyperlinks on a web page. From past observation I know the probabilities that a user will click on each of these hyperlinks. I can therefore calculate the mean and standard deviation of these probabilities. I now add a new hyperlink to this page. After a short amount of testing I find that of the 20 users that see this hyperlink, 5 click on it. Taking into account the known mean and standard deviation of the click-through probabilities on other hyperlinks (this forms a
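The question is cut off, so what follows is only one plausible approach and not necessarily what the thread settled on: treat the click-through rates of the existing links as a Beta prior fitted by matching its mean and standard deviation, then update it with the 5 clicks out of 20 views. The prior mean of 0.10 and standard deviation of 0.05 are hypothetical numbers, since the question does not state them.

```python
def beta_from_mean_std(mean, std):
    """Method-of-moments fit of Beta(a, b) to a given mean and standard deviation."""
    variance = std ** 2
    if not 0 < variance < mean * (1 - mean):
        raise ValueError("no Beta distribution has this mean and std")
    common = mean * (1 - mean) / variance - 1
    return mean * common, (1 - mean) * common


# Hypothetical prior summary for the existing links (not given in the question).
prior_a, prior_b = beta_from_mean_std(mean=0.10, std=0.05)

# Observed data for the new link, from the question.
clicks, views = 5, 20

# Conjugate update: clicks add to a, non-clicks add to b.
post_a = prior_a + clicks
post_b = prior_b + (views - clicks)
posterior_mean = post_a / (post_a + post_b)
```

The posterior mean falls between the prior mean and the raw rate of 5/20, and it moves toward the raw rate as more views are observed.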