statistics | 易学教程

Create a summary description of a schedule given a list of shifts

Question: Assuming I have a list of shifts for an event (in the format start date/time, end date/time), is there some sort of algorithm I could use to create a generalized summary of the schedule? Most of the shifts usually fall into a common recurrence pattern (e.g. Mondays from 9:00 am to 1:00 pm, Tuesdays from 10:00 am to 3:00 pm, etc.). However, there can (and will) be exceptions to this rule (e.g. one of the shifts fell on a holiday and was rescheduled for the next day).
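One way to approach the question above, offered only as a minimal sketch and not as an answer from the original thread: group the shifts by weekday and time slot, treat the most frequent slot on each weekday as the recurring pattern, and report everything else as an exception. The function name `summarize_shifts` and the rule that a slot must occur at least twice to count as a pattern are assumptions made for illustration.

```python
from collections import Counter, defaultdict
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def summarize_shifts(shifts):
    """Split shifts into per-weekday recurring slots and exceptions.

    shifts: list of (start, end) datetime pairs.
    Returns (patterns, exceptions): patterns maps a weekday name to its most
    common (start_time, end_time) slot; exceptions lists the shifts that do
    not match the slot of their weekday.
    """
    by_weekday = defaultdict(Counter)
    for start, end in shifts:
        by_weekday[start.weekday()][(start.time(), end.time())] += 1

    patterns = {}
    for day, slots in by_weekday.items():
        slot, count = slots.most_common(1)[0]
        if count >= 2:  # assumed threshold: a slot must repeat to be a pattern
            patterns[WEEKDAYS[day]] = slot

    exceptions = [
        (start, end) for start, end in shifts
        if patterns.get(WEEKDAYS[start.weekday()]) != (start.time(), end.time())
    ]
    return patterns, exceptions
```

A real schedule would probably need more than this, for example tolerance for slots that differ by a few minutes or patterns that repeat every other week.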

What is a good solution for calculating an average where the sum of all values exceeds a double's limits?

Question: I have a requirement to calculate the average of a very large set of doubles (10^9 values). The sum of the values exceeds the upper bound of a double, so does anyone know any neat little tricks for calculating an average that don't also require calculating the sum? I am using Java 1.5. Answer 1: You can calculate the mean iteratively. This algorithm is simple and fast, you have to process each value just once, and the variables never get larger than the largest value in the set, so you won't get an
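The iterative mean mentioned in the answer can be sketched as follows. The question is about Java; Python is used here only to keep the examples on this page in one language, and the function name `running_mean` is an assumption. The update is mean_n = mean_(n-1) + (x_n - mean_(n-1)) / n, so no running sum is ever stored.

```python
def running_mean(values):
    """Return the arithmetic mean of an iterable without accumulating a sum."""
    mean = 0.0
    for n, x in enumerate(values, start=1):
        # Move the current mean towards x by 1/n of the gap between them.
        mean += (x - mean) / n
    return mean
```

One caveat: the difference `x - mean` can itself overflow when the data mixes very large positive and very large negative values, so this sketch is safest when all values have the same sign or moderate magnitude.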

Using cbind on an arbitrarily long list of objects

Question: I would like to find a way to create a data.frame by using cbind() to join together many separate objects. For example, if A, B, C and D are all vectors of equal length, one can create data.frame ABCD with ABCD <- cbind(A,B,C,D). However, when the number of objects to be combined gets large, it becomes tedious to type out all of their names. Is there a way to call cbind() on a vector of object names, e.g. objs <- c("A", "B", "C", "D"); ABCD <- cbind(objs), or on a list containing all

How to calculate the statistics “t-test” with numpy

Question: I'm looking to generate some statistics about a model I created in Python. I'd like to run a t-test on it, and was wondering if there is an easy way to do this with numpy/scipy. Are there any good explanations around? For example, I have three related datasets that look like this: [55.0, 55.0, 47.0, 47.0, 55.0, 55.0, 55.0, 63.0] Now, I would like to run Student's t-test on them. Answer 1: The scipy.stats package has a few ttest_... functions. See example from here: >>> print 't
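To show what such a test computes, here is a standard-library sketch of Welch's unequal-variance two-sample t statistic. It is not the example the truncated answer refers to; the function name `welch_t` and the second sample are made up for illustration. Where SciPy is installed, `scipy.stats.ttest_ind(a, b, equal_var=False)` reports the same statistic together with a p-value, and `scipy.stats.ttest_rel` covers paired samples.

```python
import math
import statistics


def welch_t(a, b):
    """Return (t, df) for Welch's two-sample t-test on sequences a and b."""
    # Squared standard error of each sample mean.
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


sample_a = [55.0, 55.0, 47.0, 47.0, 55.0, 55.0, 55.0, 63.0]  # from the question
sample_b = [50.0, 52.0, 47.0, 49.0, 51.0, 55.0, 48.0, 53.0]  # hypothetical data
t_stat, dof = welch_t(sample_a, sample_b)
```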

Compute a confidence interval from sample data

Question: I have sample data for which I would like to compute a confidence interval, assuming a normal distribution. I have found and installed the numpy and scipy packages and have gotten numpy to return a mean and standard deviation (numpy.mean(data), with data being a list). Any advice on getting a sample confidence interval would be much appreciated. Answer 1:

    import numpy as np
    import scipy.stats

    def mean_confidence_interval(data, confidence=0.95):
        a = 1.0 * np.array(data)
        n = len(a)
        m, se = np.mean(a)
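Because the answer above is cut off, the following is a separate standard-library sketch and not a completion of it. It builds a normal-approximation (z) interval for the mean; the function name `z_confidence_interval` is an assumption. For small samples a t-based interval is wider, and with SciPy the quantile could instead come from `scipy.stats.t.ppf((1 + confidence) / 2, n - 1)`.

```python
import math
import statistics


def z_confidence_interval(data, confidence=0.95):
    """Return (mean, lower, upper) using a normal-approximation interval."""
    n = len(data)
    m = statistics.mean(data)
    # Standard error of the mean from the sample standard deviation.
    se = statistics.stdev(data) / math.sqrt(n)
    # Two-sided critical value of the standard normal distribution.
    z = statistics.NormalDist().inv_cdf((1 + confidence) / 2)
    h = z * se
    return m, m - h, m + h
```

Calling `z_confidence_interval(data, confidence=0.99)` widens the interval, since a higher confidence level uses a larger critical value.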

I need to run a fair dice simulation multiple times

Question: I have a program that simulates a dice roll 100 times. I need to know how to run this program 10^5 times; I think it has something to do with numeric.

    set.seed(123)
    x <- sample(1:6, size=100, replace = TRUE)
    hist(x, main="10^6 fair rolls", xlab = "Dice Result", ylab = "Probability", xlim=c(0.5,6.5), breaks=-1:100+.5, prob=TRUE)

Answer 1: As suggested by @markus, you can use replicate:

    set.seed(123)
    nTime <- 10^5
    x <- replicate(nTime, sample(1:6, size=100, replace = TRUE))
    hist(x, main="10^6 fair
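The same repeat-the-experiment idea, sketched in Python with the standard library only. The function names `simulate_rolls` and `face_proportions` are assumptions, and the seed does not reproduce R's random stream, so the individual rolls will differ from those of `set.seed(123)`.

```python
import random
from collections import Counter


def simulate_rolls(n_runs, rolls_per_run=100, seed=123):
    """Return n_runs lists, each holding rolls_per_run fair die rolls."""
    rng = random.Random(seed)  # private generator, so results are repeatable
    return [[rng.randint(1, 6) for _ in range(rolls_per_run)]
            for _ in range(n_runs)]


def face_proportions(runs):
    """Return the share of each face (1 to 6) across all runs."""
    counts = Counter(face for run in runs for face in run)
    total = sum(counts.values())
    return {face: counts[face] / total for face in range(1, 7)}


runs = simulate_rolls(1000)  # raise to 10**5 for the full experiment
proportions = face_proportions(runs)
```

With 10^5 runs of 100 rolls the pure-Python loop draws 10^7 numbers, which takes noticeably longer than the small run shown here.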

Fisher's and Pearson's test for independence

Question: In R I have two datasets, group1 and group2. group1 has 10 values of game_id, the ID of a game, and number, the number of times that game has been played in group1. Typing group1 gives this output:

    game_id number
    1       758565
    2       235289
    ...
    10      87084

For group2 we get:

    game_id number
    1       79310
    2       28564
    ...
    10      9048

If I want to test whether there is a statistically significant difference between group1 and group2 for the first 2 game_id values, I can use Pearson's chi-square test. In R I simply create
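For the first two games, Pearson's chi-square statistic on the resulting 2x2 table can be sketched by hand as below. This is an illustration in Python and not the R code the question goes on to show; the function name `pearson_chi2_2x2` and the layout (rows are groups, columns are games) are assumptions. Note that R's `chisq.test` applies Yates' continuity correction to 2x2 tables by default, so its statistic would differ slightly from this uncorrected one.

```python
import math


def pearson_chi2_2x2(table):
    """Return (chi2, p_value) for a 2x2 table of counts, no continuity correction."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, observed_row in enumerate(table):
        for j, observed in enumerate(observed_row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Rows: group1, group2. Columns: game_id 1, game_id 2 (counts from the question).
counts = [[758565, 235289], [79310, 28564]]
chi2, p_value = pearson_chi2_2x2(counts)
```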

Stationary Test issue

Question: I am working with the airmiles data set, and I ran three different tests to check the time series for stationarity.

Test 1: Using acf and pacf

    acf(airmiles)
    pacf(airmiles)

After differencing, most of the values now seem to lie within the significance bounds:

    acf(diff(airmiles))
    pacf(diff(airmiles))

Test 2: Using adf.test

    adf.test(airmiles,k=0,alternative = "stationary")

    Augmented Dickey-Fuller Test
    data: airmiles
    Dickey-Fuller = -1.1415, Lag order = 0, p-value = 0.8994
    alternative

T-test with grouping variable

Question: I've got a data frame with 36 variables and 74 observations. I'd like to run a two-sample paired t-test on 35 variables by 1 grouping variable (with two levels). For example, the data frame contains "age", "body weight" and "group" variables. I suppose I can run the t-test for each variable with this code: t.test(age~group). But is there a way to test all 35 variables with one call, rather than one by one? Answer 1: An example data frame: dat <- data.frame(age = rnorm(10, 30), body = rnorm(10, 30