statistics

Obtaining absolute deviation from mean for two sets of scores

∥☆過路亽.° · Submitted on 2020-07-03 05:09:04
Question: To obtain the absolute deviation from the mean for two groups of scores, I usually need to write long code in R, such as the code shown below. I was wondering whether it might be possible in base R to somehow Vectorize() the mad() function so that the absolute-deviation-from-the-mean scores for each group of scores in the example below could be obtained with that vectorized version of mad(). Any other workable ideas are highly appreciated. set.seed(0) y = as.vector(unlist(mapply…
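Note that R's mad() is by default a scaled *median* absolute deviation, while the question asks for deviation from the *mean*. The per-group computation itself is short; a minimal sketch in Python with hypothetical example groups (the question's own data is truncated above):

```python
import numpy as np

# Hypothetical groups of scores standing in for the truncated example data.
g1 = np.array([2.0, 4.0, 6.0, 8.0])
g2 = np.array([1.0, 3.0, 5.0])

def mean_abs_dev(x):
    """Mean absolute deviation of x from its own mean
    (not R's mad(), which is a scaled median absolute deviation)."""
    x = np.asarray(x, dtype=float)
    return np.mean(np.abs(x - x.mean()))

devs = [mean_abs_dev(g) for g in (g1, g2)]
```

In R the analogous group-wise route would be tapply(y, group, function(x) mean(abs(x - mean(x)))), which avoids writing a long loop per group.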

How to add the spearman correlation p value along with correlation coefficient to ggpairs?

旧巷老猫 · Submitted on 2020-06-28 06:51:50
Question: I am constructing a ggpairs figure in R using the following code, where df is a data frame containing six continuous variables and one Group variable: ggpairs(df[,-1], columns = 1:ncol(df[,-1]), mapping = ggplot2::aes(colour = df$Group), legends = T, axisLabels = "show", upper = list(continuous = wrap("cor", method = "spearman", size = 2.5, hjust = 0.7))) + theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(), axis.line = element_line(colour = "black")). I am trying to add the p-value of…

parameterization of the negative binomial in scipy via mean and std

懵懂的女人 · Submitted on 2020-06-28 05:16:35
Question: I am trying to fit my data to a negative binomial distribution with the scipy package in Python; however, my validation seems to fail. These are my steps. I have some demand data described by the statistics: mu = 1.4, std = 1.59. I use the parameterization function below, taken from this post, to compute the two NB parameters: def convert_params(mu, theta): """Convert mean/dispersion parameterization of a negative binomial to the ones scipy supports. See https://en…
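Working directly from mean and standard deviation (rather than mean/dispersion as in the truncated convert_params above), the inversion to scipy's (n, p) parameterization can be sketched as:

```python
from scipy import stats

mu, std = 1.4, 1.59    # the summary statistics quoted in the question
var = std ** 2         # 2.5281, larger than mu, so the data are overdispersed

# scipy.stats.nbinom is parameterized by (n, p) with
#   mean = n*(1-p)/p  and  variance = n*(1-p)/p**2,
# which inverts to p = mu/var and n = mu**2/(var - mu).
# This is only valid when var > mu (overdispersion), as it is here.
p = mu / var
n = mu ** 2 / (var - mu)

dist = stats.nbinom(n, p)
recovered = (dist.mean(), dist.std())  # should reproduce mu and std
```

A natural validation step is exactly this round trip: construct the frozen distribution and check that its mean and standard deviation match the inputs.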

How to compute the Topological Overlap Measure [TOM] for a weighted adjacency matrix in Python?

萝らか妹 · Submitted on 2020-06-25 02:41:21
Question: I'm trying to calculate the weighted topological overlap for an adjacency matrix, but I cannot figure out how to do it correctly using numpy. The R function with the correct implementation is TOMsimilarity from WGCNA (https://www.rdocumentation.org/packages/WGCNA/versions/1.67/topics/TOMsimilarity). The formula for computing this (I think) is detailed in equation 4, which I believe I have reproduced correctly below. Does anyone know how to implement it so that it matches the WGCNA version? Yes, I…
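A numpy sketch of the standard weighted TOM formula (Zhang & Horvath style); this is an assumption about which variant "equation 4" refers to, so any result should be cross-checked against WGCNA::TOMsimilarity before being relied on:

```python
import numpy as np

def tom_similarity(A):
    """Weighted topological overlap for a symmetric adjacency matrix A
    with zero diagonal and entries in [0, 1]:
        TOM_ij = (L_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij)
    where L_ij = sum_u a_iu * a_uj and k_i = sum_u a_iu."""
    A = np.asarray(A, dtype=float)
    L = A @ A                       # shared-neighbor sums (diagonal of A is 0)
    k = A.sum(axis=1)               # node connectivities
    k_min = np.minimum.outer(k, k)  # pairwise min(k_i, k_j)
    tom = (L + A) / (k_min + 1.0 - A)
    np.fill_diagonal(tom, 1.0)      # convention: a node overlaps fully with itself
    return tom

A = np.array([[0.0, 0.5, 0.2],
              [0.5, 0.0, 0.4],
              [0.2, 0.4, 0.0]])
T = tom_similarity(A)
```

Because the diagonal of A is zero, the matrix product A @ A already excludes the u = i and u = j terms, so no explicit masking is needed.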

R generate all possible interaction variables

送分小仙女□ · Submitted on 2020-06-24 12:53:51
Question: I have a data frame with variables a, b, c, d: dat <- data.frame(a=runif(1e5), b=runif(1e5), c=runif(1e5), d=runif(1e5)). I would like to generate all possible two-way interaction terms between the columns, that is: ab, ac, ad, bc, bd, cd. In reality my data frame has over 100 columns, so I cannot code this manually. What is the most efficient way to do this (noting that I do not want both ab and ba)? Answer 1: What do you plan to do with all these interaction terms? There are several…
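In R itself, model.matrix(~ .^2, dat) expands all pairwise interactions without duplicates. The same combinatorial idea, sketched in Python on a hypothetical small frame:

```python
import pandas as pd
from itertools import combinations

df = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0], "c": [5.0, 6.0]})

# combinations() yields each unordered pair exactly once,
# so the product a*b appears but the redundant b*a does not.
inter = pd.DataFrame(
    {f"{u}:{v}": df[u] * df[v] for u, v in combinations(df.columns, 2)}
)
```

For 100+ columns this produces on the order of n*(n-1)/2 new columns (about 4,950 for n = 100), which is worth keeping in mind before materializing them all.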

Interpreting scipy.stats.entropy values

天涯浪子 · Submitted on 2020-06-24 07:47:33
Question: I am trying to use scipy.stats.entropy to estimate the Kullback–Leibler (KL) divergence between two distributions. More specifically, I would like to use the KL divergence as a metric for deciding how consistent two distributions are, but I cannot interpret the values it gives. For example, t1=numpy.random.normal(-2.5,0.1,1000); t2=numpy.random.normal(-2.5,0.1,1000); scipy.stats.entropy(t1,t2) returns 0.0015539217193737955. Then, with t1=numpy.random.normal(-2.5,0.1,1000) and t2=numpy.random.normal(2.5,0.1,1000), scipy.stats.entropy…
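A key point for interpreting these numbers: scipy.stats.entropy treats its arguments as probability vectors (it normalizes them), not as raw samples, so passing the samples directly does not compute the KL divergence between the underlying distributions. Binning both samples on one shared grid first gives interpretable values; the bin edges and smoothing epsilon below are assumptions:

```python
import numpy as np
from scipy.stats import entropy

rng = np.random.default_rng(0)
s1 = rng.normal(-2.5, 0.1, 10_000)
s2 = rng.normal(2.5, 0.1, 10_000)

# entropy(pk, qk) expects probability vectors, not samples:
# histogram both samples on a common grid, then compare the histograms.
bins = np.linspace(-3.5, 3.5, 201)
p, _ = np.histogram(s1, bins=bins)
q, _ = np.histogram(s2, bins=bins)

eps = 1e-12                           # smoothing so empty bins avoid log(0)
kl_far = entropy(p + eps, q + eps)    # well-separated distributions: large KL
kl_same = entropy(p + eps, p + eps)   # identical histograms: KL = 0
```

With this framing the values behave as expected: near-identical distributions give a KL close to zero, and well-separated ones give a large (and, with smoothing, finite) value; KL is also asymmetric, so entropy(p, q) and entropy(q, p) generally differ.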

How can we perform common set operations (union, intersection, minus) in MS Excel?

岁酱吖の · Submitted on 2020-06-22 12:11:47
Question: For example, I have an xls file where column A has a list of items with property A and column B has a list of items with property B. I need the following: column C, which is A union B (the unique items of A and B combined); column D, which is A intersection B (the items common to A and B); column E, which is A minus B (the items in A but not in B); and column F, which is B minus A (the items in B but not in A). Set operations on a list of elements seem easy with SQL or Python, but how can they be done in xls? Note: it should be an…
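One common spreadsheet route uses COUNTIF to test membership of each item in the other list; the ranges and cell addresses below are hypothetical (lists assumed in A2:A100 and B2:B100), and the UNIQUE/VSTACK union requires Excel 365:

```text
A minus B   (fill down column E): =IF(COUNTIF($B$2:$B$100, A2)=0, A2, "")
B minus A   (fill down column F): =IF(COUNTIF($A$2:$A$100, B2)=0, B2, "")
Intersection (fill down column D): =IF(COUNTIF($B$2:$B$100, A2)>0, A2, "")
Union (Excel 365, one cell):       =UNIQUE(VSTACK($A$2:$A$100, $B$2:$B$100))
```

In older Excel versions, the union can instead be built by stacking A under B in a helper column and applying Data ▸ Remove Duplicates.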

Pandas quantiles misbehaving by… getting smaller partway through a range of percentiles?

戏子无情 · Submitted on 2020-06-17 15:50:55
Question (short version): Running df2.groupby("EquipmentType").quantile([.1, .25, .5, .75, 0.9, 0.95, 0.99]) on a dataset sometimes gives me percentiles that appear to reset partway through my data. Why is this, and how can I avoid it? The full version of the code (but not the data) is at the end. Example output for one group: Loaders 0.10 57.731806, 0.25 394.004375, 0.50 0.288889, 0.75 7.201528, 0.90 51.015667, 0.95 83.949833, 0.99 123.148019. Full version: I'm working through a large dataset (on the order of 2,500,000 rows) of equipment failure data…
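One plausible cause of this symptom (an assumption, not a diagnosis of this specific dataset) is that the value column is not actually numeric, e.g. numbers stored as strings in an object-dtype column after a messy CSV load; another is a bug in an old pandas version's groupby().quantile() with a list of quantiles, fixed by upgrading. The dtype check and fix can be sketched as follows; the column name "DowntimeHours" and the values are hypothetical:

```python
import pandas as pd

# Hypothetical reproduction: numeric values stored as strings (object dtype).
df2 = pd.DataFrame({
    "EquipmentType": ["Loaders"] * 6,
    "DowntimeHours": ["57.7", "394.0", "0.29", "7.2", "51.0", "123.1"],
})

# Force a true float column before computing quantiles; bad cells become NaN.
df2["DowntimeHours"] = pd.to_numeric(df2["DowntimeHours"], errors="coerce")

q = df2.groupby("EquipmentType")["DowntimeHours"].quantile([0.25, 0.5, 0.75])
```

After the conversion, the quantiles within each group are necessarily non-decreasing in the requested percentiles, which is a cheap sanity check to add to a pipeline like this.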

What is the use of base_score in xgboost multiclass working?

断了今生、忘了曾经 · Submitted on 2020-06-17 09:33:40
Question: (A +100 reputation bounty is attached; jared_mamrot is looking for an answer from a reputable source: a reproducible example illustrating the application of base_score or base_margin to a multiclass XGBoost classification problem, softmax or softprob, using R.) I am trying to explore how XGBoost works for binary classification as well as for multi-class. In the binary case, I observed that base_score is…
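The arithmetic behind base_score can be sketched without running xgboost at all (this is a sketch of the commonly described behavior, not an xgboost run): for binary:logistic, base_score is given on the probability scale and becomes the initial raw margin logit(base_score) to which the trees' leaf outputs are added.

```python
import numpy as np

def logit(p):
    """Inverse of the sigmoid: the raw-margin value for probability p."""
    return np.log(p / (1.0 - p))

def softmax(z):
    """Numerically stable softmax over a vector of class margins."""
    e = np.exp(z - z.max())
    return e / e.sum()

margin0 = logit(0.5)   # default base_score=0.5 -> initial raw margin of 0

# In multiclass softprob, a single offset added identically to every class's
# margin cancels inside the softmax, so a uniform base leaves the class
# probabilities uniform; per-class starting points need base_margin instead.
probs = softmax(np.full(3, margin0))
```

This is why, with no trees and the default base_score, a binary model predicts 0.5 and a 3-class softprob model predicts 1/3 per class, and why base_margin (a per-row, per-class offset) is the tool for class-specific priors.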