probability | 易学教程

How to return a node, uniformly at random, from a binary search tree?

Question: Given a BST (which may or may not be balanced), how can one return "any" node uniformly at random? A constraint is that you cannot use an external indexing data structure. You must traverse the tree in such a manner that every node has an equal chance of being visited. This question has had me perplexed for quite a while. If we could indeed use an external hashtable/pointers, we could just randomize on those and return the corresponding node. However, my colleague has put forth a rather complex variant of the
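One possible approach, not taken from the excerpt, is reservoir sampling during a single traversal: the i-th node visited replaces the current choice with probability 1/i, which leaves every node equally likely at the end. The sketch below is in Python; `Node` and `random_node` are hypothetical names, and it assumes that a traversal stack (which holds only pending subtrees, not an index of all nodes) is acceptable under the stated constraint.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical BST node used only for this illustration."""
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def random_node(root: Optional[Node], rng: random.Random) -> Optional[Node]:
    """Return one node chosen uniformly at random, or None for an empty tree."""
    chosen = None
    seen = 0
    stack = [root] if root is not None else []
    while stack:
        node = stack.pop()
        seen += 1
        # Keep the newly visited node with probability 1/seen.
        if rng.randrange(seen) == 0:
            chosen = node
        if node.left is not None:
            stack.append(node.left)
        if node.right is not None:
            stack.append(node.right)
    return chosen
```

This visits every node once, so each call costs O(n) time. If subtree sizes could be stored in the nodes, a faster descent would be possible, but that is a different design from the one sketched here.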

Iteration performance

Question: I made a function to evaluate the following problem experimentally, taken from A Primer for the Mathematics of Financial Engineering. Problem: Let X be the number of times you must flip a fair coin until it lands heads. What are E[X] (expected value) and var(X) (variance)? Following the textbook solution, the following code yields the correct answer: from sympy import * k = symbols('k') Expected_Value = summation(k/2**k, (k, 1, oo)) # Both solutions work Variance = summation(k**2/2**k, (k,
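For reference, X here is geometric with success probability 1/2, so E[X] = 2 and var(X) = E[X^2] - E[X]^2 = 6 - 4 = 2. An experimental check could look like the following Python sketch; the function names, trial count, and seed are assumptions made for illustration and are not the asker's code.

```python
import random
import statistics


def flips_until_heads(rng: random.Random) -> int:
    """Count fair-coin flips up to and including the first head."""
    flips = 1
    while rng.random() >= 0.5:  # treat values >= 0.5 as tails
        flips += 1
    return flips


def simulate(trials: int, seed: int = 0) -> tuple[float, float]:
    """Return the sample mean and population variance over `trials` runs."""
    rng = random.Random(seed)
    samples = [flips_until_heads(rng) for _ in range(trials)]
    return statistics.fmean(samples), statistics.pvariance(samples)
```

With a few hundred thousand trials, both estimates should land close to 2.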

Plotting fit of lognormal distribution after fit by scipy using seaborn

Question: I have fit a distribution to my data using scipy.stats.lognorm, and now I am trying to plot the distribution. I have generated the fit to my data with seaborn: ax = sns.distplot(1 - clint_unique_cov_filter['Identity'], kde=False, hist=True, norm_hist=True, fit=lognorm, bins=np.linspace(0, 1, 500)) ax.set_xlim(0, 0.1) This gets me the fit I expect. I need to use the parameters of this distribution for further analysis, but first I wanted to verify I understood the terms. This post shows me

Use of column inside sum() function using dplyr's mutate() function

Question: I have a data frame and I want to create a new column prob using dplyr's mutate() function. prob should include the probability P(row value > all column values) that there are rows of greater value in the data frame than each row value. Here is what I want to do: data = data.frame(value = c(1,2,3,3,4,4,4,5,5,6,7,8,8,8,8,8,9)) require(dplyr) data %>% mutate(prob = sum(value < data$value) / nrow(data)) This gives the following results: value prob 1 1 0 2 2 0 3 3 0 4 3 0 ... ... ... Here prob

Computationally simple pseudo-Gaussian distribution with varying mean and standard deviation?

Question: This picture from Wikipedia has a nice example of the sort of functions I'd ideally like to generate. Right now I'm using the Irwin-Hall distribution, which is more or less a polynomial approximation of the Gaussian distribution. Basically, you draw from a uniform random number generator x times and take the average. The more iterations, the more like a Gaussian distribution it is. It's pretty nice; however, I'd like to be able to have one where I can vary the mean. For example, let's
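One way to vary the mean and standard deviation of such a generator is to standardize the sum of uniforms and then rescale it. The sum of n Uniform(0, 1) draws has mean n/2 and variance n/12, so subtracting n/2 and dividing by sqrt(n/12) gives an approximately standard normal value. The Python sketch below assumes this approach; `pseudo_gaussian` and its default n=12 are illustrative choices, not something stated in the excerpt.

```python
import math
import random


def pseudo_gaussian(mean: float, std: float, rng: random.Random, n: int = 12) -> float:
    """Approximate a normal draw by standardizing a sum of n uniform draws."""
    total = sum(rng.random() for _ in range(n))
    # The sum has mean n/2 and variance n/12; map it to mean 0, variance 1.
    z = (total - n / 2) / math.sqrt(n / 12)
    return mean + std * z
```

With n = 12 the divisor is exactly 1, so the standardized value is just the sum minus 6. The result is bounded: it can never fall more than 6 standard deviations from the mean, unlike a true Gaussian.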

Weighted random map

Question: Suppose I have a big 2D array of values in the range [0,1] where 0 means "impossible" and 1 means "highly probable". How can I select a random set of points in this array according to the probabilities described above? Answer 1: One way to look at the problem is to ignore (for the moment) the fact that you're dealing with a 2D grid. What you have is a set of weighted items. A standard way of randomly selecting from such a set is to: sum the weights, call the sum s; select a uniform random value 0
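The weighted-selection idea in the answer can be sketched in Python as follows. Because the excerpt is cut off, the remaining steps are filled in with the usual cumulative-sum lookup, which is an assumption about where the answer was heading; `sample_cells` is a hypothetical name, and sampling is done with replacement.

```python
import bisect
import itertools
import random


def sample_cells(grid, k, rng: random.Random):
    """Draw k (row, col) cells with probability proportional to their weights."""
    cols = len(grid[0])
    flat = [w for row in grid for w in row]          # ignore the 2D layout
    cumulative = list(itertools.accumulate(flat))    # running sum of weights
    total = cumulative[-1]
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    picks = []
    for _ in range(k):
        r = rng.random() * total                     # uniform in [0, total)
        # First index whose running sum exceeds r; zero-weight cells are skipped.
        i = bisect.bisect_right(cumulative, r)
        i = min(i, len(flat) - 1)                    # guard against float rounding
        picks.append(divmod(i, cols))                # back to (row, col)
    return picks
```

Building the cumulative array once and using binary search keeps each draw at O(log n) after an O(n) setup, which matters for a big grid.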

Sampling from a multivariate probability density function in python

Question: I have a multivariate probability density function P(x,y,z), and I want to sample from it. Normally, I would use numpy.random.choice() for this sort of task, but this function only works for 1-dimensional probability densities. Is there an equivalent function for multivariate pdfs? Answer 1: There are a few different paths one can follow here. (1) If P(x,y,z) factors as P(x,y,z) = P(x) P(y) P(z) (i.e., x, y, and z are independent) then you can sample each one separately. (2) If P(x,y,z) has a more
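Case (1) can be illustrated with a short Python sketch. It assumes each marginal has been discretized into a list of values with matching probabilities, and it uses the standard library's `random.Random.choices` in place of `numpy.random.choice`; `sample_independent` is a hypothetical helper name.

```python
import random


def sample_independent(marginals, k, rng: random.Random):
    """Draw k points when the joint distribution factors into independent marginals.

    `marginals` is a list of (values, probabilities) pairs, one per coordinate.
    """
    # Sample each coordinate on its own, then zip the columns into points.
    columns = [rng.choices(values, weights=probs, k=k) for values, probs in marginals]
    return list(zip(*columns))
```

This only applies when the independence assumption holds. For a joint distribution that does not factor, sampling the coordinates separately would lose the dependence between them.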

Find item in array using weighted probability and a value

Question: Last week I had some problems with a simple program I am writing and somebody here helped me. Now I have run into another problem. I currently have this code: var findItem = function(desiredItem) { var items = [ { item: "rusty nail", probability: 0.25 }, { item: "stone", probability: 0.23 }, { item: "banana", probability: 0.20 }, { item: "leaf", probability: 0.17 }, { item: "mushroom", probability: 0.10 }, { item: "diamond", probability: 0.05 } ]; var possible = items.some( ({item, probability}

Expected worst-case time complexity of chained hash table lookups?

Question: When implementing a hash table using a good hash function (one where the probability of any two elements colliding is 1 / m, where m is the number of buckets), it is well-known that the average-case running time for looking up an element is Θ(1 + α), where α is the load factor. The worst-case running time is O(n), though, if all the elements end up put into the same bucket. I was recently doing some reading on hash tables and found this article which claims (on page 3) that if α = 1, the

Given a covariance matrix, generate a Gaussian random variable in Matlab

Question: Given an M x M desired covariance, R, and a desired number of sample vectors, N, calculate an N x M Gaussian random vector, X, in vanilla MATLAB (i.e. can't use r = mvnrnd(MU,SIGMA,cases) ). Not really sure how to tackle this; usually you need a covariance AND a mean to generate a Gaussian random variable. I think sqrtm and chol could be useful. Answer 1: If you have access to the MATLAB statistics toolbox you can type edit mvnrnd in MATLAB to see their solution. [T p] = chol(sigma); if m1 == c mu = mu
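The Cholesky idea mentioned in the question can be sketched as follows. If R = L Lᵀ with L lower triangular and z is a vector of independent standard normal draws, then mu + L z has covariance R. The sketch is in plain Python rather than MATLAB, it is not the toolbox's mvnrnd implementation, and it assumes a zero mean when none is given and a symmetric positive definite R; both function names are hypothetical.

```python
import math
import random


def cholesky_lower(R):
    """Return lower-triangular L with L * L^T == R (R symmetric positive definite)."""
    m = len(R)
    L = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = R[i][i] - s
                if d <= 0:
                    raise ValueError("R must be symmetric positive definite")
                L[i][j] = math.sqrt(d)
            else:
                L[i][j] = (R[i][j] - s) / L[j][j]
    return L


def sample_gaussian(R, n, rng: random.Random, mean=None):
    """Return n rows, each an M-vector with covariance R (zero mean by default)."""
    m = len(R)
    mu = mean if mean is not None else [0.0] * m
    L = cholesky_lower(R)
    rows = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(m)]  # independent N(0, 1) draws
        rows.append([mu[i] + sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(m)])
    return rows
```

A covariance matrix that is only positive semidefinite would make this factorization fail; in that case a matrix square root such as the sqrtm route mentioned in the question may be the more robust choice.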