statistics

Calling rnorm with a vector of means

若如初见. Submitted on 2019-12-18 12:59:10
Question: When I call rnorm passing a single value as mean, it's obvious what happens: each value is generated from Normal(10,1).

    y <- rnorm(20, mean=10, sd=1)

But I see examples of a whole vector being passed to rnorm (or rcauchy, etc.); in this case, I am not sure what the R machinery really does. For example:

    a = c(10,22,33,44,5,10,30,22,100,45,97)
    y <- rnorm(a, mean=a, sd=1)

Any ideas?

Answer 1: The number of random numbers rnorm generates equals the length of a. From ?rnorm: n: number of observations.
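As a rough Python analogue of the vectorized call (a sketch, not R's actual implementation): the length of the first argument sets how many draws are made, and the mean vector is matched element by element, so the i-th draw comes from Normal(a[i], 1). The helper name draw_per_mean is hypothetical and only the standard library is used.

```python
import random


def draw_per_mean(means, sd=1.0, seed=None):
    """Return one normal draw per entry of `means` (hypothetical helper)."""
    rng = random.Random(seed)
    # One draw for each mean: the output has the same length as `means`.
    return [rng.gauss(mu, sd) for mu in means]


a = [10, 22, 33, 44, 5, 10, 30, 22, 100, 45, 97]
y = draw_per_mean(a, sd=1.0, seed=42)

print(len(y))  # same length as a
for mu, draw in zip(a, y):
    print(mu, round(draw, 2))
```

Each printed draw should land close to its own mean, which is the behavior the answer describes for rnorm(a, mean=a, sd=1).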

Using shapiro.test on multiple columns in a data frame

社会主义新天地 Submitted on 2019-12-18 12:06:09
Question: It seems like a pretty simple question, but I can't find the answer. I have a data frame (let's call it df) containing n=100 columns (C1, C2, ..., C100) and 50 rows (R1, R2, ..., R50). I tested all the columns in the data frame to be sure they are numeric. I want to know if the data in each column has a normal distribution, using the shapiro.test() function. I am able to do it column by column using the code:

    > shapiro.test(df$Cn)

or

    > shapiro.test(df[,c(Cn)])

However, when I try to do

Randomly selecting an element from a weighted list

半世苍凉 Submitted on 2019-12-18 10:54:50
Question: I have a list of 100,000 objects. Every list element has a "weight" associated with it that is a positive int from 1 to N. What is the most efficient way to select a random element from the list? I want the distribution of randomly chosen elements to be the same as the distribution of weights in the list. For example, if I have a list L = {1,1,2,5}, I want the 4th element to be selected 5/9ths of the time, on average. Assume inserts and deletes are common on this list, so any
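One common way to get this behavior, sketched here under the assumption that the weights are available as a plain list, is to precompute cumulative weights and binary-search a uniform random number into them. The function names and the item labels are hypothetical. This sketch does not address frequent inserts and deletes: any change to the list requires rebuilding the cumulative sums in O(n).

```python
import bisect
import itertools
import random


def build_cumulative(weights):
    """Running totals of the weights, e.g. [1, 1, 2, 5] -> [1, 2, 4, 9]."""
    return list(itertools.accumulate(weights))


def pick_index(cumulative, r):
    """Map r in [0, total) to the index whose weight interval contains r."""
    index = bisect.bisect_right(cumulative, r)
    # Guard against floating-point rounding pushing r up to the total.
    return min(index, len(cumulative) - 1)


def weighted_choice(items, cumulative, rng=random):
    """Pick one item with probability proportional to its weight."""
    r = rng.random() * cumulative[-1]
    return items[pick_index(cumulative, r)]


items = ["a", "b", "c", "d"]      # hypothetical labels for L = {1,1,2,5}
cumulative = build_cumulative([1, 1, 2, 5])

rng = random.Random(0)
draws = [weighted_choice(items, cumulative, rng) for _ in range(9000)]
print(draws.count("d") / len(draws))  # should be near 5/9
```

For a static list, the standard library's random.choices(items, weights=...) does the same job in one call; the explicit version above is only meant to show the mechanism.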

How do I do an F-test in Python

戏子无情 Submitted on 2019-12-18 10:41:28
Question: How do I do an F-test to check if the variance is equivalent in two vectors in Python? For example, if I have

    a = [1,2,1,2,1,2,1,2,1,2]
    b = [1,3,-1,2,1,5,-1,6,-1,2]

is there something similar to scipy.stats.ttest_ind(a, b)? I found sp.stats.f(a, b), but it appears to be something different from an F-test.

Answer 1: The test statistic of the F-test for equal variances is simply F = Var(X) / Var(Y), where F follows an F distribution with df1 = len(X) - 1 and df2 = len(Y) - 1 degrees of freedom. scipy.stats.f which you mentioned in your question
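The statistic and its degrees of freedom from the answer can be computed with the standard library alone. The following is a minimal sketch using the two vectors from the question; the function name f_statistic is hypothetical, and sample variances (n - 1 denominator) are assumed.

```python
from statistics import variance


def f_statistic(x, y):
    """Return (F, df1, df2) for the variance-ratio test of x against y."""
    f_value = variance(x) / variance(y)   # sample variances, n - 1 denominator
    return f_value, len(x) - 1, len(y) - 1


a = [1, 2, 1, 2, 1, 2, 1, 2, 1, 2]
b = [1, 3, -1, 2, 1, 5, -1, 6, -1, 2]

f_value, df1, df2 = f_statistic(a, b)
print(f_value, df1, df2)
```

Turning F into a p-value requires the F distribution itself. If SciPy is available, scipy.stats.f.cdf(f_value, df1, df2) and scipy.stats.f.sf(f_value, df1, df2) give the lower and upper tail probabilities; which tail (or both) to use depends on the alternative hypothesis.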

Graphing perpendicular offsets in a least squares regression plot in R

一笑奈何 Submitted on 2019-12-18 10:24:19
Question: I'm interested in making a plot with a least squares regression line and line segments connecting the data points to the regression line, as illustrated in the graphic called "perpendicular offsets" here: http://mathworld.wolfram.com/LeastSquaresFitting.html (from MathWorld - A Wolfram Web Resource: wolfram.com). I have the plot and regression line done here:

    ## Dataset from http://www.apsnet.org/education/advancedplantpath/topics/RModules/doc1/04_Linear_regression.html
    ## Disease severity as a
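The endpoint of each perpendicular segment is the point on the line closest to the data point, which has a closed form. The sketch below assumes the line is already known as y = intercept + slope * x; the function name, the sample points, and the line y = x are all illustrative, not taken from the question's dataset.

```python
def perpendicular_foot(x0, y0, intercept, slope):
    """Closest point on y = intercept + slope * x to the point (x0, y0)."""
    x_foot = (x0 + slope * (y0 - intercept)) / (1 + slope ** 2)
    y_foot = intercept + slope * x_foot
    return x_foot, y_foot


# Illustrative points and the line y = x (intercept 0, slope 1).
points = [(0.0, 2.0), (2.0, 0.0), (3.0, 3.0)]
segments = [((x, y), perpendicular_foot(x, y, 0.0, 1.0)) for x, y in points]

for start, end in segments:
    print(start, "->", end)
```

Each (start, end) pair can be handed to whatever segment-drawing call the plotting library offers. The segments only look perpendicular on screen when both axes use the same scale.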

Fitting data to distributions?

淺唱寂寞╮ Submitted on 2019-12-18 10:03:08
Question: I am not a statistician (more of a researchy web developer), but I've been hearing a lot about scipy and R these days. So out of curiosity I wanted to ask this question (though it might sound silly to the experts around here), because I am not sure of the advances in this area and want to know how people without a sound statistics background approach these problems. Given a set of real numbers observed from an experiment, let us say they belong to one of the many distributions out there (like

Student's t distribution in JavaScript for Google Spreadsheet

戏子无情 Submitted on 2019-12-18 09:18:18
Question: Google Spreadsheets currently does not support the standard function TDIST, i.e. the Student's t-distribution. This function is critical for calculating p-values. It seems that this is related to the fact that no integral-using functions (AFAICT) are implemented either. However, Google Docs allows people to add and publish their own scripts, in JavaScript. So ideally we should have something like:

    function tdist(t_value, degrees_of_freedom, two_tailed [defaults true]) {...}

Anyone know of
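One way such a function could be built without a statistics library is to integrate the t density numerically. The sketch below does this in Python with Simpson's rule, mirroring the signature proposed in the question; the steps parameter and the helper t_pdf are assumptions added for illustration, and the same arithmetic should port to JavaScript given a log-gamma routine.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with `df` degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def tdist(t_value, degrees_of_freedom, two_tailed=True, steps=2000):
    """Tail probability beyond |t_value|, via Simpson's rule on [0, |t|].

    `steps` must be even; it is an illustrative accuracy knob.
    """
    t = abs(t_value)
    if t == 0:
        return 1.0 if two_tailed else 0.5
    h = t / steps
    total = t_pdf(0.0, degrees_of_freedom) + t_pdf(t, degrees_of_freedom)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2   # Simpson weights: 4 on odd, 2 on even nodes
        total += weight * t_pdf(i * h, degrees_of_freedom)
    central = total * h / 3          # P(0 <= T <= |t|)
    one_tail = 0.5 - central         # P(T > |t|)
    return 2 * one_tail if two_tailed else one_tail


print(tdist(1.0, 1))
print(tdist(2.228, 10))
```

Because the tail is obtained as 0.5 minus an integral, very small p-values lose relative precision; a production version would more likely use the regularized incomplete beta function.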

Getting the y-axis intercept and slope from a linear regression of multiple data and passing the intercept and slope values to a data frame

非 Y 不嫁゛ Submitted on 2019-12-18 08:49:34
Question: I have a data frame x1, which was generated with the following piece of code:

    x <- c(1:10)
    y <- x^3
    z <- y-20
    s <- z/3
    t <- s*6
    q <- s*y
    x1 <- cbind(x,y,z,s,t,q)
    x1 <- data.frame(x1)

I would like to extract the y-axis intercept and the slope of the linear regression fit for the data:

      x   y   z         s   t           q
    1 1   1 -19 -6.333333 -38   -6.333333
    2 2   8 -12 -4.000000 -24  -32.000000
    3 3  27   7  2.333333  14   63.000000
    4 4  64  44 14.666667  88  938.666667
    5 5 125 105 35.000000 210 4375.000000
    6 6 216 196 65.333333 392
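The question is cut off before it says which columns are regressed on which, so the sketch below assumes each of y, z and s is fitted against x by ordinary least squares and the results are collected as a list of records (a stand-in for a data frame). The function and key names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of ys on xs; returns (intercept, slope)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Rebuild the first columns of x1 from the question's recipe.
x = list(range(1, 11))
columns = {"y": [v ** 3 for v in x]}
columns["z"] = [v - 20 for v in columns["y"]]
columns["s"] = [v / 3 for v in columns["z"]]

# One record per response column (assumed layout of the result table).
results = []
for name, values in columns.items():
    intercept, slope = fit_line(x, values)
    results.append({"response": name, "intercept": intercept, "slope": slope})

for row in results:
    print(row)
```

Since z is y shifted by a constant, its fit should share the slope of y and differ only in the intercept, which is a quick sanity check on the output.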

R geom_tile ggplot2 what kind of stat is applied?

风格不统一 Submitted on 2019-12-18 06:54:47
Question: I used geom_tile() to plot 3 variables on the same graph, with

    tile_ruined_coop<-ggplot(data=df.1[sel1,])+
      geom_tile(aes(x=bonus, y=malus, fill=rf/300))+
      scale_fill_gradient(name="vr")+
      facet_grid(Seuil_out_coop_i ~ nb_coop_init)
    tile_ruined_coop

and I am pleased with the result! But what kind of statistical treatment is applied to fill? Is this a mean?

Answer 1: To plot the mean of the fill values, you should aggregate your values before plotting. The scale_colour_gradient(...) does not
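The aggregation step the answer recommends amounts to computing one mean per (x, y) cell before any tile is drawn. A minimal Python sketch of that step follows; the rows are made-up values, the column names bonus, malus and rf are borrowed from the question, and the facetting variables are left out for brevity.

```python
from collections import defaultdict

# Hypothetical rows: (bonus, malus, rf). Several rows share the same cell.
rows = [(1, 1, 300.0), (1, 1, 150.0), (1, 2, 60.0), (2, 1, 90.0), (2, 1, 30.0)]


def mean_fill_per_tile(rows, scale=300.0):
    """Average rf / scale over all rows that fall in the same (bonus, malus) cell."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for bonus, malus, rf in rows:
        key = (bonus, malus)
        sums[key] += rf / scale
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


tiles = mean_fill_per_tile(rows)
for cell, fill in sorted(tiles.items()):
    print(cell, fill)
```

After this step each cell has exactly one fill value, so the plot no longer depends on how overlapping tiles happen to be drawn.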

Summarize different Columns with different Functions

风格不统一 Submitted on 2019-12-18 05:08:13
Question: I have the following problem: in a data frame I have a lot of rows and columns, with the first column being the date. For each date I have more than one observation, and I want to summarize them. My df looks like this (date replaced by ID for ease of use):

    df:
    ID  Cash  Price  Weight  ...
    1   0.4   0      0
    1   0.2   0      82      ...
    1   0     1      0       ...
    1   0     3.2    80      ...
    2   0.3   1      70      ...
    ... ...   ...    ...     ...

I want to group them by the first column and then summarize all rows, BUT with different functions: The function Cash and Price
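The general pattern, grouping by a key and applying a different function to each column, can be sketched as below. The question is cut off before it names the functions, so the mapping used here (sum for Cash, mean for Price, max for Weight) is purely an illustrative assumption, as are the helper and variable names.

```python
from collections import defaultdict
from statistics import mean

# The visible rows from the question's table.
rows = [
    {"ID": 1, "Cash": 0.4, "Price": 0.0, "Weight": 0},
    {"ID": 1, "Cash": 0.2, "Price": 0.0, "Weight": 82},
    {"ID": 1, "Cash": 0.0, "Price": 1.0, "Weight": 0},
    {"ID": 1, "Cash": 0.0, "Price": 3.2, "Weight": 80},
    {"ID": 2, "Cash": 0.3, "Price": 1.0, "Weight": 70},
]

# Assumed column-to-function mapping; the source does not specify it.
summary_functions = {"Cash": sum, "Price": mean, "Weight": max}


def summarize(rows, key, functions):
    """Group rows by `key`, then apply each column's own function."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return {
        group_key: {col: fn([r[col] for r in members])
                    for col, fn in functions.items()}
        for group_key, members in groups.items()
    }


summary = summarize(rows, "ID", summary_functions)
for group_key, values in summary.items():
    print(group_key, values)
```

Swapping in other functions only requires editing the summary_functions mapping; the grouping logic stays the same.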