statistics

Calculate the Cumulative Distribution Function (CDF) in Python

吃可爱长大的小学妹 submitted on 2019-12-31 10:36:12
Question: How can I calculate the Cumulative Distribution Function (CDF) in Python? I want to calculate it from an array of points I have (a discrete distribution), not from the continuous distributions that, for example, scipy provides.
Answer 1: (It is possible that my interpretation of the question is wrong. If the question is how to get from a discrete PDF to a discrete CDF, then np.cumsum divided by a suitable constant will do if the samples are equispaced. If the array is not equispaced, then np.cumsum of
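
As a minimal sketch of the two readings mentioned in the answer, the snippet below builds an empirical CDF from raw sample points and also turns equispaced discrete probabilities into a CDF with np.cumsum. The function names `empirical_cdf` and `cdf_from_pmf` and the sample values are illustrative assumptions, not part of the original post.

```python
import numpy as np

def empirical_cdf(samples):
    """Return the sorted unique values and the fraction of samples <= each value."""
    samples = np.asarray(samples, dtype=float)
    values, counts = np.unique(samples, return_counts=True)
    cdf = np.cumsum(counts) / samples.size
    return values, cdf

def cdf_from_pmf(pmf):
    """Turn (possibly unnormalized) probabilities at equispaced points into a CDF."""
    pmf = np.asarray(pmf, dtype=float)
    return np.cumsum(pmf) / pmf.sum()

# Hypothetical sample points.
values, cdf = empirical_cdf([3, 1, 2, 2, 5])
print(values, cdf)
```

Dividing by `pmf.sum()` is one choice of the "suitable constant": it guarantees that the last CDF value is 1.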

Trending algorithm

自作多情 submitted on 2019-12-31 10:04:43
Question: I'm working on a micro-forum of sorts, whereby a quick (close to tweet-size) topic message is posted by a special user, which subscribers can respond to with like-sized messages of their own. Straightforward, no 'digging' or voting of any sort, just a chronological flow of responses for each topic message. But with high traffic expected. We would like to flag topic messages according to the response buzz they attract, using a scale of 0 to 10. Been googling for trend algorithms and open source
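
One possible way to turn response activity into a 0 to 10 flag is sketched below. It is not taken from the thread: the exponential time decay, the `half_life` and `saturation` parameters, and the function name `buzz_score` are all assumptions chosen for illustration.

```python
import math

def buzz_score(response_times, now, half_life=3600.0, saturation=50.0):
    """Map recent response activity to an integer from 0 to 10.

    response_times: timestamps (seconds) of the responses to one topic.
    half_life: assumed number of seconds after which a response counts half.
    saturation: assumed activity level at which the score nears its maximum.
    """
    decay = math.log(2) / half_life
    # Each response contributes a weight that halves every `half_life` seconds.
    activity = sum(math.exp(-decay * (now - t)) for t in response_times if t <= now)
    # Squash the unbounded activity onto [0, 1) so the score stays within 0..10.
    scaled = 1.0 - math.exp(-activity / saturation)
    return round(10 * scaled)

# Hypothetical topic with 100 responses, scored right away and ten hours later.
print(buzz_score([0.0] * 100, now=0.0), buzz_score([0.0] * 100, now=36000.0))
```

Because only a running sum of decayed weights is needed, a score of this kind can be updated incrementally, which matters when traffic is high.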

Perform 2 sample t-test

◇◆丶佛笑我妖孽 submitted on 2019-12-31 08:47:06
Question: I have the mean, std dev and n of sample 1 and sample 2 - samples are taken from the sample population, but measured by different labs. n is different for sample 1 and sample 2. I want to do a weighted (take n into account) two-tailed t-test. I tried using the scipy.stats module by creating my numbers with np.random.normal, since it only takes data and not stat values like mean and std dev (is there any way to use these values directly?). But it didn't work since the data arrays have to be of
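
For reference, scipy.stats provides `ttest_ind_from_stats`, which takes means, standard deviations and sample sizes directly, so no synthetic data is needed. The sketch below uses made-up summary numbers and assumes unequal variances (Welch's test, `equal_var=False`); whether that assumption fits the labs in question is not stated in the post.

```python
from scipy.stats import ttest_ind_from_stats

# Hypothetical summary statistics for the two labs.
mean1, std1, n1 = 10.2, 1.5, 30
mean2, std2, n2 = 9.6, 1.8, 45

# The standard deviations are expected to be sample standard deviations (ddof=1).
result = ttest_ind_from_stats(mean1, std1, n1, mean2, std2, n2, equal_var=False)
t_stat, p_value = result.statistic, result.pvalue
print(t_stat, p_value)  # the p-value is two-sided
```

The different sample sizes enter through the standard error, so no additional weighting is required.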

Separate mixture of gaussians in Python

随声附和 submitted on 2019-12-31 08:39:23
Question: There is a result of some physical experiment, which can be represented as a histogram [i, amount_of(i)]. I suppose that result can be estimated by a mixture of 4 - 6 Gaussian functions. Is there a package in Python which takes a histogram as an input and returns the mean and variance of each Gaussian distribution in the mixture distribution? Original data, for example:
Answer 1: This is a mixture of Gaussians, and can be estimated using an expectation maximization approach (basically, it finds
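
The expectation maximization idea named in the answer can be sketched directly on histogram input by treating each bin as `count` observations located at the bin center. This is an illustrative NumPy-only sketch, not the answer's code: the function name `fit_gaussian_mixture`, the quantile-based initialization, and the fixed iteration count are assumptions.

```python
import numpy as np

def fit_gaussian_mixture(bin_centers, counts, k, n_iter=200):
    """Weighted EM for a 1-D Gaussian mixture fitted to histogram counts."""
    x = np.asarray(bin_centers, dtype=float)
    w = np.asarray(counts, dtype=float)
    total = w.sum()

    # Start the means at evenly spaced quantiles of the histogram.
    cum = np.cumsum(w) / total
    means = np.interp((np.arange(k) + 0.5) / k, cum, x)
    mean_all = np.average(x, weights=w)
    variances = np.full(k, np.average((x - mean_all) ** 2, weights=w))
    weights = np.full(k, 1.0 / k)

    for _ in range(n_iter):
        # E-step: responsibility of each component for each bin.
        diff = x[:, None] - means[None, :]
        dens = weights * np.exp(-0.5 * diff**2 / variances) / np.sqrt(2 * np.pi * variances)
        resp = dens / (dens.sum(axis=1, keepdims=True) + 1e-300)

        # M-step: each bin counts as `w` observations at its center.
        rw = resp * w[:, None]
        nk = rw.sum(axis=0)
        weights = nk / total
        means = (rw * x[:, None]).sum(axis=0) / nk
        variances = (rw * (x[:, None] - means) ** 2).sum(axis=0) / nk + 1e-9

    return weights, means, variances
```

The function returns the mixing weights, means and variances of the `k` components. EM can settle in a local optimum, so trying several initializations is advisable for 4 to 6 overlapping components.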

Measures of association in R — Kendall's tau-b and tau-c

你。 submitted on 2019-12-31 08:24:08
Question: Are there any R packages for the calculation of Kendall's tau-b and tau-c, and their associated standard errors? My searches on Google and Rseek have turned up nothing, but surely someone has implemented these in R.
Answer 1: There are three Kendall tau statistics (tau-a, tau-b, and tau-c). They are not interchangeable, and none of the answers posted so far deal with the last two, which is the subject of the OP's question. I was unable to find functions to calculate tau-b or tau-c, either in

Representing continuous probability distributions

☆樱花仙子☆ submitted on 2019-12-31 08:12:23
Question: I have a problem involving a collection of continuous probability distribution functions, most of which are determined empirically (e.g. departure times, transit times). What I need is some way of taking two of these PDFs and doing arithmetic on them. E.g. if I have two values x taken from PDF X, and y taken from PDF Y, I need to get the PDF for (x+y), or any other operation f(x,y). An analytical solution is not possible, so what I'm looking for is some representation of PDFs that allows such
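
One representation that supports arbitrary f(x, y) is to keep each distribution as a set of samples and combine them by Monte Carlo resampling. The sketch below is only an illustration under the assumption that X and Y are independent; the function name `combine_empirical` and the synthetic departure and transit samples are hypothetical.

```python
import numpy as np

def combine_empirical(x_samples, y_samples, op, n_draws=100_000, seed=0):
    """Draw from two empirical distributions (assumed independent) and apply op."""
    rng = np.random.default_rng(seed)
    x = rng.choice(np.asarray(x_samples, dtype=float), size=n_draws, replace=True)
    y = rng.choice(np.asarray(y_samples, dtype=float), size=n_draws, replace=True)
    return op(x, y)

# Hypothetical empirical samples standing in for measured times.
demo_rng = np.random.default_rng(1)
departure = demo_rng.normal(5.0, 1.0, size=500)
transit = demo_rng.gamma(shape=2.0, scale=10.0, size=500)

# Samples of x + y, then a histogram estimate of the resulting PDF.
z = combine_empirical(departure, transit, np.add)
density, edges = np.histogram(z, bins=50, density=True)
```

Any vectorized function of two arrays can be passed as `op`, for example `np.maximum` or a lambda. For sums specifically, convolving two PDFs sampled on a common grid is a deterministic alternative.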

Getting mean and standard deviation from groups in a data.frame

徘徊边缘 submitted on 2019-12-31 06:25:45
Question: I have heart rate data in the form of a list with the four categories 1AS, 1CS, 1AI, 1CI, each of variable size. I would like to output the mean and standard deviation for each category in the list. I have the data in this format to calculate ANOVA and Tukey, which I have done successfully, but the mean has me stumped!
   Group  HR
1    1AS 300
2    1AS 280
3    1AS 260
4    1AS 250
5    1AS 300
6    1AS 272
7    1AS 250
8    1AS 198
9    1AS 200
10   1AS 195
11   1AS 214
12   1AS 249
13   1AS 240
14   1CS 250
15   1CS 236
16   1CS 200
17   1CS

Getting mean and standard deviation from groups in a data.frame

核能气质少年 submitted on 2019-12-31 06:24:02
Question: I have heart rate data in the form of a list with the four categories 1AS, 1CS, 1AI, 1CI, each of variable size. I would like to output the mean and standard deviation for each category in the list. I have the data in this format to calculate ANOVA and Tukey, which I have done successfully, but the mean has me stumped!
   Group  HR
1    1AS 300
2    1AS 280
3    1AS 260
4    1AS 250
5    1AS 300
6    1AS 272
7    1AS 250
8    1AS 198
9    1AS 200
10   1AS 195
11   1AS 214
12   1AS 249
13   1AS 240
14   1CS 250
15   1CS 236
16   1CS 200
17   1CS

p -value adjustment Mann-Whitney U test in python

末鹿安然 submitted on 2019-12-31 05:44:07
Question: I have a two-dimensional list file (named 'hcl_file'). Here is a shortened version of the file for clarity. Rows are observations; columns are experiment numbers:
ID type First Second Third
gerg I 0.02695 0 0.00135 0.31312
11P I 0.02695 0 0.00135 0.31312
112HP II 0.02695 0 0.00135 0.31312
1454HP II 0.02695 0 0.00135 0.31312
11544H III 0.02695 0 0.00135 0.31312
657BF III 0.02695 0 0.00135 0.31312
785DS III 0.02695 0 0.00135 0.31312
I'm new to programming. Could you please tell me how I can calculate the
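
A rough sketch of one workflow that matches the title: run `scipy.stats.mannwhitneyu` once per experiment column for two sample types, then adjust the resulting p-values for multiple testing. The Benjamini-Hochberg adjustment, the helper name `benjamini_hochberg`, and the measurement values are assumptions made for illustration; the post does not say which adjustment is wanted.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest p-value.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted

# Hypothetical measurements for two sample types across three experiments.
type_i = {"First": [0.8, 1.1, 0.9, 1.3, 1.0],
          "Second": [2.1, 2.4, 1.9, 2.2, 2.6],
          "Third": [0.31, 0.28, 0.35, 0.30, 0.33]}
type_ii = {"First": [1.6, 1.9, 1.7, 2.0, 1.8],
           "Second": [2.0, 2.5, 2.3, 1.8, 2.7],
           "Third": [0.45, 0.41, 0.50, 0.47, 0.43]}

raw = [mannwhitneyu(type_i[c], type_ii[c], alternative="two-sided").pvalue
       for c in type_i]
adjusted = benjamini_hochberg(raw)
print(dict(zip(type_i, adjusted)))
```

A Bonferroni adjustment would instead be `np.minimum(np.asarray(raw) * len(raw), 1.0)`, which is more conservative.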

Why are the parameters of weibull not unique for a given data?

安稳与你 submitted on 2019-12-31 05:42:05
Question: I have data on the number of days between customers' purchases. I try to estimate the shape and scale parameters with scipy.stats.weibull_min. But the parameters returned from the fit function are not unique, and when I try to constrain the scale parameter to be 1, it does not work. Here are the three results from different ways of passing the input:
shape, loc, scale = scipy.stats.weibull_min.fit(data,floc=1,scale=1) #constrain scale to be 1
yellow curve loc:1 shape:0.7318249351 scale:75.22852953
shape,
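
In scipy.stats, `fit` treats the keyword arguments `loc` and `scale` as starting guesses for the optimizer, while `floc` and `fscale` hold a parameter fixed, which may explain why `scale=1` did not constrain anything. The sketch below illustrates the difference on synthetic data; the true parameters (shape 0.8, scale 30) and the fixed value passed to `fscale` are assumptions chosen for the demonstration.

```python
import numpy as np
from scipy.stats import weibull_min

# Synthetic interval data with known parameters (hypothetical, for illustration).
rng = np.random.default_rng(0)
data = weibull_min.rvs(0.8, loc=0, scale=30, size=2000, random_state=rng)

# Location fixed at 0; shape and scale are estimated.
shape, loc, scale = weibull_min.fit(data, floc=0)

# Location and scale both fixed; only the shape is estimated.
shape_fixed, loc_fixed, scale_fixed = weibull_min.fit(data, floc=0, fscale=30.0)

print(shape, loc, scale)
print(shape_fixed, loc_fixed, scale_fixed)
```

Leaving the location free on strictly positive data often produces several near-equivalent fits, so fixing it with `floc` tends to make the shape and scale estimates more stable.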