statistics

Create a summary description of a schedule given a list of shifts

不羁的心 posted on 2019-12-28 14:00:07
Question: Assuming I have a list of shifts for an event (each given as a start date/time and an end date/time), is there an algorithm I could use to create a generalized summary of the schedule? Most of the shifts usually fall into a common recurrence pattern (e.g. Mondays from 9:00 am to 1:00 pm, Tuesdays from 10:00 am to 3:00 pm, etc.). However, there can (and will) be exceptions to this rule (e.g. one of the shifts fell on a holiday and was rescheduled for the next day).
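One possible heuristic for this question, sketched in Python using only the standard library: group the shifts by weekday, treat the most frequent (start time, end time) pair on each weekday as the recurring pattern, and report everything else as an exception. The function name `summarize_shifts`, the `min_share` threshold, and the sample dates are all assumptions made for illustration; none of them come from the original post.

```python
from collections import Counter, defaultdict
from datetime import datetime


def summarize_shifts(shifts, min_share=0.5):
    """Split shifts into recurring weekly patterns and exceptions.

    shifts: list of (start, end) datetime pairs.
    min_share: hypothetical threshold. A (start time, end time) pair is
    treated as recurring if it occurs at least twice and covers at least
    this share of the shifts on its weekday.
    """
    by_weekday = defaultdict(list)
    for start, end in shifts:
        by_weekday[start.weekday()].append((start, end))

    patterns, exceptions = [], []
    for weekday, day_shifts in sorted(by_weekday.items()):
        counts = Counter((s.time(), e.time()) for s, e in day_shifts)
        (start_t, end_t), n = counts.most_common(1)[0]
        if n >= 2 and n / len(day_shifts) >= min_share:
            patterns.append((weekday, start_t, end_t, n))
            exceptions += [
                (s, e) for s, e in day_shifts
                if (s.time(), e.time()) != (start_t, end_t)
            ]
        else:
            # No dominant pattern on this weekday: report every shift.
            exceptions += day_shifts
    return patterns, exceptions


# Made-up sample data: three Monday shifts, two Tuesday shifts, and one
# shift moved to a Wednesday.
shifts = [
    (datetime(2019, 12, 2, 9), datetime(2019, 12, 2, 13)),
    (datetime(2019, 12, 9, 9), datetime(2019, 12, 9, 13)),
    (datetime(2019, 12, 16, 9), datetime(2019, 12, 16, 13)),
    (datetime(2019, 12, 3, 10), datetime(2019, 12, 3, 15)),
    (datetime(2019, 12, 10, 10), datetime(2019, 12, 10, 15)),
    (datetime(2019, 12, 18, 10), datetime(2019, 12, 18, 15)),
]
patterns, exceptions = summarize_shifts(shifts)
```

Weekdays follow `datetime.weekday()`, so 0 is Monday. The sketch only detects weekly patterns; biweekly or monthly recurrences would need a different grouping key.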

What is a good solution for calculating an average where the sum of all values exceeds a double's limits?

天涯浪子 posted on 2019-12-28 03:24:05
Question: I need to calculate the average of a very large set of doubles (10^9 values). The sum of the values exceeds the upper bound of a double, so does anyone know a neat trick for calculating an average that doesn't also require calculating the sum? I am using Java 1.5.
Answer 1: You can calculate the mean iteratively. This algorithm is simple and fast, it processes each value just once, and the variables never get larger than the largest value in the set, so you won't get an
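A minimal sketch of one common form of the iterative mean, written in Python rather than the asker's Java 1.5. The truncated answer does not show its exact formula, so the update rule here (mean_k = mean_{k-1} + (x_k - mean_{k-1}) / k) is an assumption about what it describes, and `running_mean` is a name chosen for illustration.

```python
def running_mean(values):
    """Return the mean of an iterable without ever forming the full sum."""
    mean = 0.0
    for k, x in enumerate(values, start=1):
        # Move the current mean toward x by 1/k of the gap between them.
        mean += (x - mean) / k
    return mean


# Three values whose plain sum overflows a double to infinity.
big = [1e308, 1e308, 1e308]
overflowing_sum = sum(big)
safe_mean = running_mean(big)
```

The difference `x - mean` can itself overflow when the data mix values near the positive and negative limits of a double, so the sketch suits data of broadly similar sign and magnitude.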

Using cbind on an arbitrarily long list of objects

元气小坏坏 posted on 2019-12-28 02:58:29
Question: I would like to find a way to create a data.frame by using cbind() to join together many separate objects. For example, if A, B, C and D are all vectors of equal length, one can create the data.frame ABCD with
    ABCD <- cbind(A,B,C,D)
However, when the number of objects to be combined gets large, it becomes tedious to type out all of their names. Is there a way to call cbind() on a vector of object names, e.g.
    objs <- c("A", "B", "C", "D")
    ABCD <- cbind(objs)
or on a list containing all

How to calculate the statistics “t-test” with numpy

依然范特西╮ posted on 2019-12-28 02:51:48
Question: I'm looking to generate some statistics about a model I created in Python. I'd like to run a t-test on it, and was wondering whether there is an easy way to do this with numpy/scipy. Are there any good explanations around? For example, I have three related datasets that look like this: [55.0, 55.0, 47.0, 47.0, 55.0, 55.0, 55.0, 63.0] Now, I would like to run Student's t-test on them.
Answer 1: The scipy.stats package contains a few ttest_... functions. See the example from here: >>> print 't
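To make the arithmetic behind a paired test concrete, here is a standard-library sketch that computes the paired t statistic by hand. Only the first list comes from the question; the second list `b` is invented for illustration, and `paired_t_statistic` is a hypothetical helper name. This is not the example the truncated answer refers to.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(a, b):
    """Return (t, degrees of freedom) for a paired two-sample t-test."""
    if len(a) != len(b):
        raise ValueError("paired samples must have the same length")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    # Standard error of the mean difference, using the sample (n - 1) stdev.
    se = stdev(diffs) / sqrt(n)
    return mean(diffs) / se, n - 1


a = [55.0, 55.0, 47.0, 47.0, 55.0, 55.0, 55.0, 63.0]  # from the question
b = [54.0, 56.0, 48.0, 46.0, 56.0, 56.0, 55.0, 62.0]  # made-up second sample
t, df = paired_t_statistic(a, b)
```

Turning `t` into a p-value needs the Student's t distribution, which the standard library does not provide; `scipy.stats.ttest_rel(a, b)` reports the statistic together with a p-value.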

Compute a confidence interval from sample data

两盒软妹~` posted on 2019-12-27 11:50:41
Question: I have sample data for which I would like to compute a confidence interval, assuming a normal distribution. I have found and installed the numpy and scipy packages and have gotten numpy to return a mean and standard deviation (numpy.mean(data), with data being a list). Any advice on getting a sample confidence interval would be much appreciated.
Answer 1:
    import numpy as np
    import scipy.stats

    def mean_confidence_interval(data, confidence=0.95):
        a = 1.0 * np.array(data)
        n = len(a)
        m, se = np.mean(a)
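Since the answer's code is cut off, the following is a separate standard-library sketch and not a completion of it. It assumes a normal-approximation interval, mean ± z · s/√n, with z taken from `statistics.NormalDist` (Python 3.8+). The function name `mean_confidence_interval_z` and the sample data are illustrative.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval_z(data, confidence=0.95):
    """Return (mean, lower, upper) using a normal (z) quantile."""
    n = len(data)
    m = mean(data)
    se = stdev(data) / sqrt(n)  # standard error of the mean
    # Two-sided quantile, roughly 1.96 for confidence=0.95.
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    return m, m - z * se, m + z * se


m, lower, upper = mean_confidence_interval_z([1.0, 2.0, 3.0, 4.0, 5.0])
```

For small samples an interval based on Student's t quantiles is wider than this one, because the z quantile does not account for the uncertainty in the estimated standard deviation.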

I need to run a fair dice simulation multiple times

时光怂恿深爱的人放手 posted on 2019-12-25 18:31:21
Question: I have a program that simulates a dice roll 100 times. I need to know how to run this program 10^5 times; I think it has something to do with numeric.
    set.seed(123)
    x <- sample(1:6, size=100, replace = TRUE)
    hist(x, main="10^6 fair rolls", xlab = "Dice Result", ylab = "Probability", xlim=c(0.5,6.5), breaks=-1:100+.5, prob=TRUE )
Answer 1: As suggested by @markus, you can use replicate:
    set.seed(123)
    nTime <- 10^5
    x <- replicate(nTime, sample(1:6, size=100, replace = TRUE))
    hist(x, main="10^6 fair
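For readers following along in Python, a rough standard-library analogue of the repeat-the-experiment idea is sketched below. `simulate_rolls` and its parameters are names invented here, and Python's generator will not reproduce the draws that R's set.seed(123) produces.

```python
import random
from collections import Counter


def simulate_rolls(n_reps, n_rolls=100, seed=123):
    """Repeat an n_rolls-roll experiment n_reps times and tally the faces."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_reps):
        # One experiment: n_rolls independent rolls of a fair six-sided die.
        counts.update(rng.choices(range(1, 7), k=n_rolls))
    return counts


# A small run to keep the example quick; pass 10**5 for the full simulation.
counts = simulate_rolls(10**3)
total = sum(counts.values())
proportions = {face: counts[face] / total for face in range(1, 7)}
```

Each proportion should sit close to 1/6, which is what the histogram in the question displays when drawn with prob=TRUE.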

Fisher's and Pearson's test for independence

余生颓废 posted on 2019-12-25 14:12:06
Question: In R I have 2 datasets: group1 and group2. For group1 I have 10 values of game_id, which is the id of a game, and number, which is the number of times that game has been played in group1. So if we type group1 we get this output:
    game_id  number
    1        758565
    2        235289
    ...
    10       87084
For group2 we get:
    game_id  number
    1        79310
    2        28564
    ...
    10       9048
If I want to test whether there is a statistical difference between group1 and group2 for the first 2 game_id, I can use Pearson's chi-square test. In R I simply create
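The arithmetic of Pearson's chi-square test on the 2x2 table for the first two games can be sketched in Python with the standard library alone. The helper name `chi_square_2x2` is an assumption, and the sketch applies no continuity correction, so its statistic can differ slightly from R's chisq.test, which applies Yates' correction to 2x2 tables by default.

```python
from math import erfc, sqrt


def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value for a 2x2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom the upper tail is erfc(sqrt(chi2 / 2)).
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Rows: group1, group2. Columns: game_id 1, game_id 2 (counts from the question).
chi2, p_value = chi_square_2x2([[758565, 235289], [79310, 28564]])
```

The closed-form p-value holds only for one degree of freedom, which is the case for any 2x2 table; larger tables need the general chi-square distribution.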

Stationary Test issue

耗尽温柔 posted on 2019-12-25 08:58:40
Question: I am working with the airmiles data set and conducted three different tests to check the time series for stationarity.
Test 1: Using acf and pacf
    acf(airmiles)
    pacf(airmiles)
After differencing, it seems most of the values now lie within the significance bounds:
    acf(diff(airmiles))
    pacf(diff(airmiles))
Test 2: Using adf.test
    adf.test(airmiles,k=0,alternative = "stationary")
    Augmented Dickey-Fuller Test
    data: airmiles
    Dickey-Fuller = -1.1415, Lag order = 0, p-value = 0.8994
    alternative

T-test with grouping variable

不羁岁月 posted on 2019-12-25 05:34:08
Question: I've got a data frame with 36 variables and 74 observations. I'd like to run a two-sample paired t-test on 35 variables by 1 grouping variable (with two levels). For example, the data frame contains "age", "body weight" and "group" variables. I suppose I can run the t-test for each variable with this code: t.test(age~group) But is there a way to test all 35 variables with one piece of code rather than one by one?
Answer 1: An example data frame: dat <- data.frame(age = rnorm(10, 30), body = rnorm(10, 30