statistics | 易学教程

How to print the name of the current row when using apply in R?

Question: For example, I have a matrix k:

    > k
      d e
    a 1 3
    b 2 4

I want to apply a function on k:

    > apply(k, MARGIN=1, function(p) {p+1})
      a b
    d 2 3
    e 4 5

However, I also want to print the row name of the row being processed, so that I know which row the function is being applied to at that moment. It might look like this:

    apply(k, MARGIN=1, function(p) {print(rowname(p)); p+1})

But I really don't know how to do that in R. Does anyone have any idea?

Answer 1: As far as I know you cannot do that with apply, but you could loop

Goodness of fit tests in SciPy

Question: I'm new to Python and coming from the R world. I'm trying to fit distributions to sample data using SciPy and having good success: I can make distribution.fit(data) return sane results. What I've been unable to do is compute the goodness-of-fit statistics I'm used to from the fitdistrplus package in R. Is there a common method in SciPy for comparing the "best fit" among a number of different distributions? I'm looking for something like the Kolmogorov-Smirnov test or Cramer-von Mises or
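A minimal sketch of one common approach, assuming NumPy and SciPy are installed: fit each candidate with its `fit` method, then score the fit with `scipy.stats.kstest`, passing the fitted parameters through `args`. The gamma-distributed sample, the candidate list and the helper name `rank_fits` are illustrative assumptions, not part of the question.

```python
import numpy as np
from scipy import stats

# Illustrative sample (assumption for the demo): 500 draws from a gamma distribution
rng = np.random.default_rng(0)
data = rng.gamma(shape=2.0, scale=1.5, size=500)

# Hypothetical set of candidate distributions, keyed by their scipy.stats names
candidates = {
    "norm": stats.norm,
    "gamma": stats.gamma,
    "lognorm": stats.lognorm,
    "expon": stats.expon,
}

def rank_fits(sample, dists):
    """Fit each distribution and return (name, KS statistic, p-value), best first."""
    results = []
    for name, dist in dists.items():
        params = dist.fit(sample)  # maximum-likelihood fit of shape(s), loc, scale
        stat, pvalue = stats.kstest(sample, name, args=params)
        results.append((name, stat, pvalue))
    # A smaller KS statistic D means the empirical CDF is closer to the fitted CDF
    return sorted(results, key=lambda r: r[1])

for name, stat, pvalue in rank_fits(data, candidates):
    print(f"{name:8s} D={stat:.4f} p={pvalue:.3g}")
```

Because the parameters are estimated from the same data that is tested, the KS p-values are optimistic; D is more useful here as a ranking between candidates than as a formal test. Recent SciPy versions also provide `scipy.stats.cramervonmises` with a similar `(rvs, cdf, args)` calling convention.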

Computing median in map reduce

Question: Can someone explain how the median/quantiles are computed in MapReduce? My understanding of DataFu's median is that the n mappers sort the data and send it to a single reducer, which is responsible for sorting all the data from the n mappers and finding the median (middle value). Is my understanding correct? If so, does this approach scale to massive amounts of data? I can clearly see that single reducer struggling with the final task. Thanks.

Answer 1: Trying to find the median (middle number)
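The single-reducer bottleneck shrinks when the values are discrete or can be bucketed: each mapper ships a compact value-to-count histogram instead of its raw rows, and the reducer merges those histograms and walks them in sorted order. The pure-Python simulation below is a sketch of that swapped-in technique under the assumption of a modest number of distinct values; it is not a description of DataFu's implementation, and the function names are hypothetical.

```python
from collections import Counter

def map_phase(partition):
    """Mapper: summarize one partition as a value -> count histogram."""
    return Counter(partition)

def reduce_phase(histograms):
    """Reducer: merge the small per-mapper histograms instead of the raw data."""
    total = Counter()
    for h in histograms:
        total.update(h)  # adds counts for matching values
    return total

def median_from_histogram(hist):
    """Walk the merged counts in sorted order to find the middle value(s)."""
    n = sum(hist.values())
    # 0-based ranks of the middle element(s); they are equal when n is odd
    lo_rank, hi_rank = (n - 1) // 2, n // 2
    seen = 0
    lo_val = hi_val = None
    for value in sorted(hist):
        seen += hist[value]
        if lo_val is None and seen > lo_rank:
            lo_val = value
        if seen > hi_rank:
            hi_val = value
            break
    return (lo_val + hi_val) / 2

# Three hypothetical mapper partitions
partitions = [[5, 1, 9, 3], [7, 3, 3], [8, 2, 6, 4]]
hist = reduce_phase(map_phase(p) for p in partitions)
print(median_from_histogram(hist))
```

Merging histograms is cheap when there are few distinct values; for continuous data you would bucket the values first, which yields an approximate median, or use a streaming quantile sketch.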

Is there a built-in KL divergence loss function in TensorFlow?

Question: I have two tensors, prob_a and prob_b, with shape [None, 1000], and I want to compute the KL divergence from prob_a to prob_b. Is there a built-in function for this in TensorFlow? I tried using tf.contrib.distributions.kl(prob_a, prob_b), but it gives:

    NotImplementedError: No KL(dist_a || dist_b) registered for dist_a type Tensor and dist_b type Tensor

If there is no built-in function, what would be a good workaround?

Answer 1: Assuming that your input tensors prob_a and prob_b are probability
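If each row of prob_a and prob_b is already a probability distribution, the workaround is just the definition KL(prob_a || prob_b) = sum(prob_a * log(prob_a / prob_b)) over the last axis. The sketch below writes it in NumPy so it runs without TensorFlow; the same three steps map onto `tf.clip_by_value`, `tf.math.log` and `tf.reduce_sum`. In TensorFlow 2.x, `tf.keras.losses.KLDivergence` computes this per-row quantity as well (check the documentation for your version). The `eps` clipping value and the helper name are assumptions.

```python
import numpy as np

def kl_divergence(prob_a, prob_b, eps=1e-12):
    """Row-wise KL(prob_a || prob_b) for arrays of shape [batch, num_classes].

    Each row is assumed to be a probability distribution (non-negative, sums to 1).
    Values are clipped to eps so that log(0) and division by zero cannot occur.
    """
    p = np.clip(np.asarray(prob_a, dtype=float), eps, 1.0)
    q = np.clip(np.asarray(prob_b, dtype=float), eps, 1.0)
    return np.sum(p * np.log(p / q), axis=-1)

# Two illustrative rows: identical distributions, then a skewed one against uniform
prob_a = np.array([[0.5, 0.5], [0.9, 0.1]])
prob_b = np.array([[0.5, 0.5], [0.5, 0.5]])
print(kl_divergence(prob_a, prob_b))
```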

predict.lm() in a loop. warning: prediction from a rank-deficient fit may be misleading

Question: This R code throws a warning:

    # Fit regression model to each cluster
    y <- list()
    length(y) <- k
    vars <- list()
    length(vars) <- k
    f <- list()
    length(f) <- k
    for (i in 1:k) {
      vars[[i]] <- names(corc[[i]][corc[[i]] != "1"])
      f[[i]] <- as.formula(paste("Death ~", paste(vars[[i]], collapse = "+")))
      y[[i]] <- lm(f[[i]], data = C1[[i]])
      # training set
      C1[[i]] <- cbind(C1[[i]], fitted(y[[i]]))
      C2[[i]] <- cbind(C2[[i]], predict(y[[i]], C2[[i]]))
      # test set
    }

I have a training data set (C1) and a test data set

Non-linear regression in C#

Question: I'm looking for a way to produce a non-linear (preferably quadratic) curve from a 2D data set, for predictive purposes. Right now I'm using my own implementation of ordinary least squares (OLS) to produce a linear trend, but my trends are much better suited to a curve model. The data I'm analysing is system load over time. Here's the equation that I'm using to produce my linear coefficients:

[equation image not preserved]

I've had a look at Math.NET Numerics and a few other libs, but they either provide interpolation
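One way to go from linear OLS to a quadratic trend is to keep ordinary least squares but add x^2 as an extra regressor, so the model becomes y = a + b*x + c*x^2 and the normal equations become a 3x3 system. The Python sketch below (the helper name `fit_quadratic` is hypothetical, and this is not any library's API) builds that system from power sums and solves it with Gaussian elimination; the plain loops port directly to C# arrays.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a + b*x + c*x^2 via the 3x3 normal equations."""
    # Power sums: s[k] = sum(x^k) for k = 0..4, t[k] = sum(y * x^k) for k = 0..2
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented matrix of the normal equations (X^T X) beta = X^T y
    m = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    # Gaussian elimination with partial pivoting
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    # Back substitution yields [a, b, c]
    beta = [0.0] * 3
    for i in range(2, -1, -1):
        beta[i] = (m[i][3] - sum(m[i][j] * beta[j] for j in range(i + 1, 3))) / m[i][i]
    return tuple(beta)

def predict(coeffs, x):
    """Evaluate the fitted quadratic at x."""
    a, b, c = coeffs
    return a + b * x + c * x * x

# Illustrative data lying exactly on y = 1 + 2x + 0.5x^2
xs = [0, 1, 2, 3, 4]
ys = [1.0, 3.5, 7.0, 11.5, 17.0]
coeffs = fit_quadratic(xs, ys)
print(coeffs, predict(coeffs, 5))
```

If x is a raw timestamp, subtracting its mean before fitting keeps the power sums from growing large enough to make the system ill-conditioned.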

How to add RMSE, slope, intercept, r^2 to R plot?

Question: How can I add RMSE, slope, intercept and r^2 to a plot using R? I have attached a script with sample data, which is in a similar format to my real dataset; unfortunately, I am at a standstill. Is there an easier way to add these statistics to the graph than to create an object from an equation and insert that into text()? Ideally, I would like the statistics to be displayed stacked on the graph. How can I accomplish this?

    ## Generate Sample Data
    x = c(2,4,6,8,9,4,5,7,8,9,10)
    y = c(4,7,6,5,8,9,5

How to implement R's p.adjust in Python

Question: I have a list of p-values and I would like to calculate the adjusted p-values for multiple comparisons, controlling the FDR. In R, I can use:

    pval <- read.csv("my_file.txt",header=F,sep="\t")
    pval <- pval[,1]
    FDR <- p.adjust(pval, method= "BH")
    print(length(pval[FDR<0.1]))
    write.table(cbind(pval, FDR),"pval_FDR.txt",row.names=F,sep="\t",quote=F )

How can I implement this code in Python? Here was my feeble attempt in Python with the help of Google:

    pvalue_list [2.26717873145e-10, 1.36209234286e-11 , 0
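A NumPy sketch of the Benjamini-Hochberg adjustment that follows the same steps as R's p.adjust(method = "BH"): sort the p-values, scale each by n/rank, take a running minimum from the largest p-value down so the adjusted values stay monotone, cap at 1, and restore the original order. The function name and the sample p-values are illustrative. If statsmodels is available, `statsmodels.stats.multitest.multipletests(pvals, method="fdr_bh")` returns the adjusted values as the second element of its result tuple.

```python
import numpy as np

def p_adjust_bh(pvalues):
    """Benjamini-Hochberg adjusted p-values, mirroring R's p.adjust(p, method="BH")."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)[::-1]          # indices of p from largest to smallest
    ranks = np.arange(n, 0, -1)          # ascending rank of each sorted value: n, ..., 1
    # Running minimum from the largest p-value down keeps the adjusted values monotone
    adjusted = np.minimum.accumulate(p[order] * n / ranks)
    result = np.empty(n)
    result[order] = np.minimum(adjusted, 1.0)  # cap at 1, restore original order
    return result

pvals = [0.01, 0.04, 0.03, 0.005]  # illustrative values, not the question's data
fdr = p_adjust_bh(pvals)
print(fdr)
print(int(np.sum(fdr < 0.1)))  # counterpart of length(pval[FDR<0.1]) in R
```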

Can scipy.stats identify and mask obvious outliers?

Question: With scipy.stats.linregress I am performing a simple linear regression on several sets of highly correlated x,y experimental data, and at first I visually inspect each x,y scatter plot for outliers. More generally (i.e. programmatically), is there a way to identify and mask outliers?

Answer 1: The statsmodels package has what you need. Look at this little code snippet and its output:

    # Imports #
    import statsmodels.api as smapi
    import statsmodels.graphics as smgraphics
    # Make data #
    x = range(30)
    y =
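`scipy.stats.linregress` itself does not flag outliers. As a lighter-weight alternative to the statsmodels route in the answer, here is a sketch that fits once, scores each residual with a robust z-score based on the median absolute deviation (MAD), masks points above a cutoff, and refits. The 3.5 cutoff and 0.6745 scaling constant follow a common rule of thumb for modified z-scores; they, the synthetic data and the helper name `mask_outliers` are assumptions to tune for real experimental data.

```python
import numpy as np
from scipy import stats

def mask_outliers(x, y, threshold=3.5):
    """Fit a line, flag points whose residual has a large robust z-score, refit without them.

    The MAD of the residuals is used instead of the standard deviation, so a single
    wild point does not inflate the spread estimate. Assumes the MAD is non-zero.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    first = stats.linregress(x, y)
    residuals = y - (first.intercept + first.slope * x)
    deviations = np.abs(residuals - np.median(residuals))
    mad = np.median(deviations)
    robust_z = 0.6745 * deviations / mad   # modified z-score
    keep = robust_z <= threshold           # boolean mask of inliers
    refit = stats.linregress(x[keep], y[keep])
    return keep, refit

# Synthetic, highly correlated data with one artificial outlier at index 10
rng = np.random.default_rng(1)
x = np.arange(30, dtype=float)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.3, size=30)
y[10] += 30.0
keep, refit = mask_outliers(x, y)
print(np.flatnonzero(~keep), refit.slope, refit.intercept)
```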

Maximum Likelihood Estimate pseudocode

Question: I need to code a maximum likelihood estimator to estimate the mean and variance of some toy data. I have a vector with 100 samples, created with numpy.random.randn(100). The data should follow a Gaussian distribution with zero mean and unit variance. I checked Wikipedia and some other sources, but I am a little confused since I don't have a statistics background. Is there any pseudocode for a maximum likelihood estimator? I get the intuition behind MLE, but I cannot figure out where to start coding.
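For a Gaussian the MLE has a closed form, so the "pseudocode" reduces to three steps: write the log-likelihood l(mu, var) = -(n/2) log(2*pi*var) - sum((x_i - mu)^2) / (2*var); setting dl/dmu = 0 gives mu_hat = the sample mean; setting dl/dvar = 0 gives var_hat = (1/n) sum((x_i - mu_hat)^2). The NumPy sketch below computes both and also exposes the log-likelihood, so you can check numerically that nudging either estimate lowers it. The function names are illustrative. Note that var_hat divides by n rather than n - 1, so it is slightly biased.

```python
import numpy as np

def gaussian_log_likelihood(data, mu, var):
    """Log-likelihood of i.i.d. samples under N(mu, var)."""
    n = data.size
    return -0.5 * n * np.log(2 * np.pi * var) - np.sum((data - mu) ** 2) / (2 * var)

def gaussian_mle(data):
    """Closed-form MLE: the sample mean and the 1/n (biased) sample variance."""
    mu_hat = np.mean(data)
    var_hat = np.mean((data - mu_hat) ** 2)
    return mu_hat, var_hat

data = np.random.randn(100)  # zero-mean, unit-variance toy data as in the question
mu_hat, var_hat = gaussian_mle(data)
print(mu_hat, var_hat)

# Sanity check: moving away from the estimates should not increase the likelihood
best = gaussian_log_likelihood(data, mu_hat, var_hat)
print(best >= gaussian_log_likelihood(data, mu_hat + 0.1, var_hat))
```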