statistics | 易学教程

Python wilcoxon: unequal N

Question: R's wilcox.test can take vectors of different lengths, but wilcoxon from scipy.stats cannot: I get an "unequal N" error message.

    from scipy.stats import wilcoxon
    wilcoxon(range(10), range(12))

Is there a way to get R's behavior in Python?

Answer 1: According to the R docs: "Performs one- and two-sample Wilcoxon tests on vectors of data; the latter is also known as 'Mann-Whitney' test." So just use

    from scipy.stats import mannwhitneyu
    mannwhitneyu(range(10), range(12))
    # (50.0, 0.26494055917435472)

Source:
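To see why the Mann-Whitney test has no trouble with samples of different sizes, it may help to compute the U statistic by hand. The sketch below is a plain-Python illustration, not scipy's implementation, and the helper name `mann_whitney_u` is made up for this example. It compares every value in the first sample with every value in the second (ties count one half), so the two samples never need to be paired.

```python
def mann_whitney_u(x, y):
    """Return (U1, U2) for two samples that may differ in length.

    Hypothetical helper for illustration only; it computes the statistic
    but not a p-value.
    """
    x, y = list(x), list(y)
    u1 = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u1 += 1.0   # x value beats y value
            elif xi == yj:
                u1 += 0.5   # ties are split evenly
    u2 = len(x) * len(y) - u1  # U1 + U2 always equals n1 * n2
    return u1, u2


u1, u2 = mann_whitney_u(range(10), range(12))
print(min(u1, u2))  # 50.0, the statistic shown in the answer above
```

For real work, `scipy.stats.mannwhitneyu` also supplies the p-value; this sketch only shows where the statistic comes from.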

Which statistics is calculated faster in SAS, proc summary?

Question: I need a theoretical answer. Imagine that you have a table with 1.5 billion rows (the table is created as column-based with DB2-Blu). You are using SAS and will compute some statistics with Proc Summary, such as min/max/mean values, the standard deviation, and the 10th and 90th percentiles, within your peer groups. For instance, you have 30,000 peer groups with 50,000 values in each peer group (1.5 billion values in total). The other case you have 3 million peer-groups and also you have

GLM with autoregressive term to correct for serial correlation

Question: I have a stationary time series to which I want to fit a linear model with an autoregressive term to correct for serial correlation, i.e. using the formula A_t = c1*B_t + c2*C_t + u_t, where u_t = r*u_{t-1} + e_t (u_t is an AR(1) term to correct for serial correlation in the error terms). Does anyone know what to use in R to model this? Thanks, Karl

Answer 1: The GLMMarp package will fit these models. If you just want a linear model with Gaussian errors, you can do it with the arima() function where the

Compute similarity percentage OR Compute correlation between more than 2 objects

Question: Suppose I have four objects (a, b, c, d), and I ask five people to label them (category 1 or 2) according to their physical appearance or something else. The labels provided by the five people for these objects are:

    df <- data.frame(a = c(1,2,1,2,1), b = c(1,2,2,1,1), c = c(2,1,2,2,2), d = c(1,2,1,2,1))

In tabular format:

    ---------
    a b c d
    ---------
    1 1 2 1
    2 2 1 2
    1 2 2 1
    2 1 2 2
    1 1 2 1
    ---------

Now I want to calculate the percentage of times a group of objects were given the same label
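One way to read the question is sketched below in Python, using the same labels as the data frame above. It assumes that "given the same label" means every object in the group received an identical label from a given person; the function name `agreement_percentage` is made up for this illustration, and the original poster may have intended a different measure.

```python
from itertools import combinations

# Labels from the question: one list per object, one entry per person.
labels = {
    "a": [1, 2, 1, 2, 1],
    "b": [1, 2, 2, 1, 1],
    "c": [2, 1, 2, 2, 2],
    "d": [1, 2, 1, 2, 1],
}


def agreement_percentage(group):
    """Percentage of people who gave all objects in `group` the same label."""
    rows = zip(*(labels[name] for name in group))      # one tuple per person
    matches = [len(set(row)) == 1 for row in rows]     # True if all labels equal
    return 100.0 * sum(matches) / len(matches)


# Report every group of 2, 3, or 4 objects.
for size in (2, 3, 4):
    for group in combinations(sorted(labels), size):
        print(group, agreement_percentage(group))
```

With this definition, objects a and d agree for all five people, while a and b agree for three of the five.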

How to return significant matches in R corrplot?

Question: I would like to return the significant matches from the following result shown in Fig. 1:

    library("corrplot")
    M <- cor(mtcars)
    # http://www.sthda.com/english/wiki/visualize-correlation-matrix-using-correlogram
    cor.mtest <- function(mat, ...) {
      mat <- as.matrix(mat)
      n <- ncol(mat)
      p.mat <- matrix(NA, n, n)
      diag(p.mat) <- 0
      for (i in 1:(n - 1)) {
        for (j in (i + 1):n) {
          tmp <- cor.test(mat[, i], mat[, j], ...)
          p.mat[i, j] <- p.mat[j, i] <- tmp$p.value
        }
      }
      colnames(p.mat) <- rownames(p.mat) <-

circularly symmetric Gaussian variables using matlab

Question: Can anyone help me? I want to generate a matrix whose elements are zero-mean, unit-variance, independent and identically distributed (i.i.d.) circularly symmetric Gaussian variables using MATLAB. Does anyone know the code for this and how to do it?

Answer 1: It is easy to generate a matrix with zero-mean, unit-variance elements by using this command in MATLAB: normrnd(mu, sigma), where mu is the mean and sigma is the standard deviation. For more detail, run help normrnd in MATLAB.

Source: https:/
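Note that normrnd on its own yields real-valued samples, whereas "circularly symmetric" usually refers to complex Gaussian variables. A common construction draws the real and imaginary parts independently with variance 1/2 each, so that the complex entry has unit variance; in MATLAB this is often written as (randn(m,n) + 1i*randn(m,n))/sqrt(2). The Python sketch below illustrates the same idea with the standard library only; the function name `complex_gaussian_matrix` is made up for this example.

```python
import math
import random


def complex_gaussian_matrix(rows, cols, rng):
    """Matrix (list of lists) of i.i.d. zero-mean, unit-variance complex Gaussians.

    Real and imaginary parts are independent N(0, 1/2), so E[|z|^2] = 1.
    """
    scale = math.sqrt(0.5)  # standard deviation of each part
    return [
        [complex(rng.gauss(0.0, scale), rng.gauss(0.0, scale)) for _ in range(cols)]
        for _ in range(rows)
    ]


rng = random.Random(0)  # seeded only to make the sketch reproducible
H = complex_gaussian_matrix(100, 100, rng)

# Sanity check: the average power |z|^2 should be close to 1.
entries = [z for row in H for z in row]
mean_power = sum(abs(z) ** 2 for z in entries) / len(entries)
print(round(mean_power, 2))
```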

How to change labels (legends) in ggplot?

Question: My code is below. I want to change the labels in the ggplot legend, but R always gives me this error:

    Error in unit(tic_pos.c, "mm") : 'x' and 'units' must have length > 0

What should I do?

    ggplot(mat,aes(x=sales,col=type))+ geom_density()+labels("red_sold","blue_sold","yellow_sold")

Answer 1: Is mat$type a factor? If not, that will cause the error. Also, you can't use labels(...) this way. Since you did not provide any data, here's an example using the built-in mtcars dataset. ggplot(mtcars, aes(x=hp,color

convert data frame to time series in R

Question: I have monthly data for the last two and a half years. I want to convert my data frame to a time series, so that I have Start: 2015-01-01, End: 2017-06-01, Frequency: 1. I have tried

    ts(df[, -1], start = df[1, 1], end = df[29, 1])

But I get this really weird output:

    Time Series:
    Start = 16436
    End = 17287
    Frequency = 1

The data:

       date       inflow
    1  2015-01-01 6434
    2  2015-02-01 5595
    3  2015-03-01 3101
    4  2015-04-01 3475
    5  2015-05-01 6519
    6  2015-06-01 7251
    7  2015-07-01 4200
    8  2015-08-01 3622
    9  2015
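The odd-looking Start and End values have a simple explanation: R stores a Date internally as the number of days since 1970-01-01, and ts() received that raw number. The short Python check below, which uses only the standard library, confirms that 16436 and 17287 are day counts for dates in the data. For monthly data, the usual approach in R is to pass a year/month start and a frequency of 12, for example ts(df$inflow, start = c(2015, 1), frequency = 12); treat that call as a suggestion based on the column names shown in the question.

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)  # origin used for R's Date class


def days_since_epoch(d):
    """Number of days between 1970-01-01 and the given date."""
    return (d - EPOCH).days


print(days_since_epoch(date(2015, 1, 1)))  # 16436, the reported Start
print(EPOCH + timedelta(days=17287))       # 2017-05-01, the date behind the reported End
```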

iOS5, iOS4, … statistics? [duplicate]

Question: I'm looking for usage statistics for Apple's iOS, something like what http://www.w3schools.com/browsers/browsers_stats.asp offers for browser usage. Does anyone know a source?

Answer 1: You can go to http://marketshare.hitslink.com/, view mobile browsers by version, then infer the iOS version market share from the version of Mobile Safari being reported. But keep in mind that it's not a

doing t.test for columns for each row in data set

Question: I have a data set x which consists of 12 columns and 167 rows. The first column is the compound ID for each row. I want to run a t.test with 3 columns as one group and another 3 columns as the second group, separately for each row. My code is below, but it does not work.

    for (i in 1:nrow(x)) {
      function(i)c(compound=i, t.test(x[2:4],x[8:10], x[x$compound==i, ], alternative='two.sided',conf.level=0.95) )
    }
    print(c(compound=i,t.test(x[2:4],x[8:10],x[x$compound==i,], alternative='two.sided',conf
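The row-wise comparison described in the question can be sketched in Python as follows. This is only an illustration under stated assumptions: the rows are invented sample data, the helper `welch_t` is a made-up name, and it computes only the Welch t statistic and degrees of freedom (the unequal-variance test that R's t.test uses by default), not the p-value. Columns 2 to 4 and 8 to 10 in R's 1-based indexing correspond to the slices [1:4] and [7:10] here.

```python
import math
from statistics import mean, variance


def welch_t(group1, group2):
    """Welch's t statistic and degrees of freedom for two small samples."""
    n1, n2 = len(group1), len(group2)
    v1 = variance(group1) / n1   # squared standard error of mean 1
    v2 = variance(group2) / n2   # squared standard error of mean 2
    t = (mean(group1) - mean(group2)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical data: compound ID followed by 11 measurement columns.
rows = [
    ["cmpd1", 1.0, 2.0, 3.0, 0.0, 0.0, 0.0, 2.0, 4.0, 6.0, 0.0, 0.0],
    ["cmpd2", 5.0, 5.5, 6.0, 0.0, 0.0, 0.0, 5.0, 5.5, 6.0, 0.0, 0.0],
]

results = {}
for row in rows:
    compound = row[0]
    group1 = row[1:4]    # columns 2-4 of this row
    group2 = row[7:10]   # columns 8-10 of this row
    results[compound] = welch_t(group1, group2)
    print(compound, results[compound])
```

The key point is that each test uses the values from one row at a time, rather than whole columns. If a p-value is needed, `scipy.stats.ttest_ind` with `equal_var=False` performs the same Welch test.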