statistics

C++ discrete distribution sampling with frequently changing probabilities

Submitted by 一笑奈何 on 2020-01-03 14:15:10
Problem: I need to sample from a discrete distribution built from weights {w1, w2, w3, ...}, and thus a probability distribution {p1, p2, p3, ...}, where pi = wi / (w1 + w2 + ...). Some of the wi change very frequently, but only a small proportion of all the wi. The distribution then has to be renormalised every time that happens, so I believe the alias method would not work efficiently, because the whole table would need to be rebuilt from scratch on every change. The method I …
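
One structure that avoids rebuilding the whole table is a Fenwick (binary indexed) tree over the weights: updating one weight and drawing one sample both cost O(log n), and no explicit renormalisation is needed because sampling is done against the current total. Below is a minimal sketch in Python for brevity (the class name `FenwickSampler` is mine; the same structure ports directly to C++):

```python
import random

class FenwickSampler:
    """Sample indices in proportion to weights; single-weight updates
    and sampling both cost O(log n), so frequent small changes are cheap."""

    def __init__(self, weights):
        self.n = len(weights)
        self.tree = [0.0] * (self.n + 1)   # Fenwick tree of weights
        self.w = [0.0] * self.n
        for i, w in enumerate(weights):
            self.update(i, w)

    def update(self, i, new_weight):
        """Change the weight of element i in O(log n)."""
        delta = new_weight - self.w[i]
        self.w[i] = new_weight
        j = i + 1
        while j <= self.n:
            self.tree[j] += delta
            j += j & (-j)

    def total(self):
        """Current sum of all weights (prefix sum over the whole tree)."""
        s, j = 0.0, self.n
        while j > 0:
            s += self.tree[j]
            j -= j & (-j)
        return s

    def sample(self, rng=random):
        """Draw a 0-based index with probability w[i] / total()."""
        r = rng.random() * self.total()
        idx, bit = 0, 1
        while bit * 2 <= self.n:
            bit *= 2
        # Descend: find the largest idx whose prefix sum is <= r.
        while bit:
            nxt = idx + bit
            if nxt <= self.n and self.tree[nxt] <= r:
                r -= self.tree[nxt]
                idx = nxt
            bit //= 2
        return idx
```

The key point is that a single weight change touches only O(log n) tree nodes, instead of the O(n) rebuild an alias table would need.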

R function that returns a string literal

Submitted by 浪子不回头ぞ on 2020-01-03 10:48:23
Problem: I have a vector: c(1,2,3). Calling print() on this value gives [1] 1 2 3. Is there a function that takes a vector and gives the string "c(1,2,3)"?

Answer 1: You can use deparse: R> x <- c(1, 2, 3) R> deparse(x) [1] "c(1, 2, 3)" R> class(deparse(x)) [1] "character"

Answer 2: Use dput: a <- c(1, 2, 3); dput(a)

Answer 3: I've never heard of such a function. Perhaps you should quickly write one yourself? toString <- function(a){ output <- "c("; for(i in 1:(length(a)-1)){ output <- paste(output, a[i], ",", sep= …
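
For comparison, the behaviour asked for (turning a value back into source text that would recreate it) is easy to imitate in Python; `deparse_vector` below is a hypothetical helper written for illustration, not part of any library:

```python
def deparse_vector(values):
    """Render a Python sequence the way R's deparse() renders a numeric
    vector: as the source text 'c(...)' that would recreate it."""
    return "c(" + ", ".join(str(v) for v in values) + ")"
```

For example, `deparse_vector([1, 2, 3])` yields the string `"c(1, 2, 3)"`, mirroring what deparse() returns in R.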

Ruby: Using rand() in code but writing tests to verify probabilities

Submitted by 天涯浪子 on 2020-01-03 08:50:14
Problem: I have some code which delivers things based on weighted randomness: things with more weight are more likely to be chosen. Now, being a good Rubyist, I of course want to cover all this code with tests, and I want to test that things are fetched with the correct probabilities. So how do I test this? Writing tests for something that should be random makes it very hard to compare actual vs. expected. A few ideas I have, and why they won't work well: stub Kernel.rand in my tests to …
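
A common way out is to seed the RNG and make a statistical assertion rather than an exact one: draw many samples, then run a chi-square goodness-of-fit test against the expected counts. A sketch in Python for illustration (the helper names are mine; 13.82 is the chi-square critical value for 2 degrees of freedom at alpha = 0.001, so a correct sampler fails this check only about 0.1% of the time):

```python
import random

def weighted_choice(weights, rng):
    """Pick an index with probability proportional to its weight."""
    r = rng.random() * sum(weights)
    for i, w in enumerate(weights):
        r -= w
        if r < 0:
            return i
    return len(weights) - 1

def chi_square_stat(counts, expected):
    """Pearson's chi-square statistic: sum of (observed-expected)^2/expected."""
    return sum((c - e) ** 2 / e for c, e in zip(counts, expected))

# Test strategy: seed the RNG so the test is reproducible, draw many
# samples, then check the statistic against a critical value instead
# of asserting exact counts.
rng = random.Random(42)
weights = [1, 2, 7]
n = 10_000
counts = [0, 0, 0]
for _ in range(n):
    counts[weighted_choice(weights, rng)] += 1
expected = [n * w / sum(weights) for w in weights]
stat = chi_square_stat(counts, expected)
# df = 2; critical value at alpha = 0.001 is about 13.82
assert stat < 13.82
```

The same idea works in Ruby: seed `Random.new(42)`, count outcomes, and assert the chi-square statistic stays under the critical value.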

Detect significant changes in a data-set that gradually changes

Submitted by 谁说胖子不能爱 on 2020-01-03 05:13:23
Problem: I have a list of data in Python that represents the amount of a resource used per minute. I want to count how many times it changes significantly within that data set. What I mean by a significant change is a bit different from what I've read so far. For example, with a dataset like [10,15,17,20,30,40,50,70,80,60,40,20], I say a significant change happens when the data doubles or halves with respect to the previous "normal". For example, since the list starts with 10, that is our starting …
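
One reading of that rule is: a change counts when the value at least doubles or at most halves relative to the current baseline, and the baseline then resets to the new value. That interpretation (and the helper name) are my assumptions, since the question is cut off before the full definition; a sketch:

```python
def count_significant_changes(series):
    """Count points where the value at least doubles or at most halves
    relative to the current baseline ('previous normal'); the baseline
    resets to the new value after each significant change."""
    if not series:
        return 0
    baseline = series[0]
    changes = 0
    for x in series[1:]:
        if x >= 2 * baseline or x <= baseline / 2:
            changes += 1
            baseline = x   # the new value becomes the new "normal"
    return changes
```

On the question's example list this counts the jumps at 20, 40, 80 and the drops at 40 and 20, i.e. 5 significant changes under this interpretation.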

How to use different scaling approaches in Weka

Submitted by 陌路散爱 on 2020-01-03 04:54:11
Problem: I am using logistic regression on my data in Weka. Now I want to try different scaling approaches to improve my results, such as min/max, zero mean/unit variance, length, etc. Is there any option in Weka for scaling?

Answer 1: Weka includes filters for this kind of preprocessing: weka.filters.unsupervised.attribute.Normalize and weka.filters.unsupervised.attribute.Standardize. In Java: Instances train_data = ... Instances test_data = ... Standardize filter = new Standardize(); filter.setInputFormat …
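
The two filters named in the answer correspond to the two standard formulas. A language-neutral sketch in Python of what they compute (the function names are mine, and Weka's exact implementation details, e.g. sample vs. population variance, may differ):

```python
def min_max_scale(xs, lo=0.0, hi=1.0):
    """Rescale values linearly into [lo, hi]
    (min/max scaling, what Weka's Normalize filter does for 0..1)."""
    mn, mx = min(xs), max(xs)
    if mx == mn:                      # constant column: no spread to scale
        return [lo for _ in xs]
    return [lo + (hi - lo) * (x - mn) / (mx - mn) for x in xs]

def standardize(xs):
    """Shift and scale to zero mean and unit variance
    (z-scores, the idea behind Weka's Standardize filter)."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n   # population variance
    sd = var ** 0.5
    return [(x - mean) / sd for x in xs] if sd else [0.0] * n
```

After `standardize`, the column has mean 0 and variance 1; after `min_max_scale`, it spans exactly [0, 1].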

How to compute percentiles from frequency table?

Submitted by 南楼画角 on 2020-01-03 04:52:04
Problem: I have a CSV file:

    fr  id
    1   10000152
    1   10000212
    1   10000847
    1   10001018
    2   10001052
    2   10001246
    14  10001908
    ...

This is a frequency table, where id is an integer variable and fr is the number of occurrences of the given value. The file is sorted ascending by value. I would like to compute percentiles (i.e. 90%, 80%, 70%, ..., 10%) of the variable. I have done this in pure Python, similar to this pseudocode: bucket = sum(fr)/10.0; percentile = 1; sum = 0; for (current_fr, current_id) in zip(fr, id): sum = sum + current_fr; if (sum …
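
The pseudocode's idea can be completed as a small function: walk the sorted (frequency, value) pairs, accumulating frequency, and record the value where the running total first crosses each percentile threshold. The function name and the "smallest value reaching the threshold" definition are my assumptions; other percentile definitions interpolate between values instead:

```python
def percentiles_from_freq(pairs, probs):
    """pairs: (frequency, value) rows sorted ascending by value.
    For each p in probs, return the smallest value whose cumulative
    frequency reaches p * total (a 'lower' percentile definition)."""
    total = sum(fr for fr, _ in pairs)
    out = {}
    for p in probs:
        threshold = p * total
        cum = 0
        for fr, val in pairs:
            cum += fr
            if cum >= threshold:
                out[p] = val
                break
    return out
```

For example, with rows `[(1, 10), (2, 20), (7, 30)]` the cumulative frequencies are 1, 3, 10 out of 10, so the 10th percentile is 10, the 30th is 20, and the 90th is 30.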

Changing outliers for NA in all columns in a dataset in R

Submitted by 风流意气都作罢 on 2020-01-03 03:17:08
Problem: I'm a beginner with R and can't manage to replace outliers with NA in ALL columns of a dataset. I succeeded in changing one column at a time with dataset$column[dataset$column %in% boxplot.stats(dataset$column)$out] <- NA, but I have 21 columns whose outliers need to become NA. How would you do that? How would you do it for a column range? For specific columns?

Answer 1: You can use apply over the columns. Example: set.seed(1); x = matrix(rnorm(20), ncol = 2); x[2, 1] = 100; x[4, 2] = 200; apply(x, …
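
The same per-column idea can be sketched outside R as well; here it is in Python using the common 1.5 × IQR boxplot rule, with None standing in for NA. Note that R's boxplot.stats uses hinges rather than these quantiles, so the exact cutoffs can differ slightly, and `mask_outliers` is a hypothetical helper name:

```python
import statistics

def mask_outliers(columns):
    """For each column (a list of numbers), replace values outside
    [Q1 - 1.5*IQR, Q3 + 1.5*IQR] with None -- the per-column analogue
    of applying boxplot.stats()$out over the columns with apply()."""
    cleaned = []
    for col in columns:
        q1, _, q3 = statistics.quantiles(col, n=4)  # quartile cut points
        iqr = q3 - q1
        lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
        cleaned.append([x if lo <= x <= hi else None for x in col])
    return cleaned
```

Applying the rule column by column, rather than to the whole matrix at once, matches what `apply(x, 2, ...)` does in the R answer.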

BTYD (Buy 'Till You Die). Walkthrough. pnbd.EstimateParameters()

Submitted by …衆ロ難τιáo~ on 2020-01-03 01:56:08
Problem: This is the first time I am working with the BTYD procedure, and I am getting errors when running the parameter estimation; the error message is below. I have been following the BTYD walkthrough. Does anybody know how to fix this? The sample data set works fine, but when I upload my own file in the same format it doesn't, even though there are no missing or empty rows/values. Help would be greatly appreciated! end.of.cal.period <- as.Date("2013-08-18") elog.cal <- elog[which(elog$date …

Matlab - Standard Deviation of Cartesian Points

Submitted by 独自空忆成欢 on 2020-01-02 23:47:03
Problem: I have an array of Cartesian points (column 1 is the x values and column 2 the y values), like so:

    308 522
    307 523
    307 523
    307 523
    307 523
    307 523
    306 523

How would I go about getting a standard deviation of the points? It would be measured against the mean, which would be a straight line. The points do not lie exactly on that line, so the standard deviation describes how wavy or "off-base" the line segment is. I really appreciate the help.

Answer 1: If you are certain the xy data …
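
The usual approach is to fit the best straight line through the points by least squares and then take the standard deviation of the vertical residuals. A sketch in Python (`residual_std` is my name for it; if you want perpendicular distances to the line instead, total least squares / PCA is the alternative):

```python
def residual_std(points):
    """Fit y = a*x + b by least squares, then return the standard
    deviation of the residuals: how far the points wander from the
    best-fit straight line (measured vertically)."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    denom = n * sxx - sx * sx          # zero only if all x are equal
    a = (n * sxy - sx * sy) / denom    # slope
    b = (sy - a * sx) / n              # intercept
    residuals = [y - (a * x + b) for x, y in points]
    mean_r = sum(residuals) / n        # ~0 by construction
    return (sum((r - mean_r) ** 2 for r in residuals) / n) ** 0.5
```

Points that lie exactly on a line give 0; the wavier the segment, the larger the result. In Matlab the equivalent is `p = polyfit(x, y, 1); std(y - polyval(p, x))`.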