statistics

C++ discrete distribution sampling with frequently changing probabilities

Submitted by 一笑奈何 on 2020-01-03 14:15:10
Problem: I need to sample from a discrete distribution built from weights {w1, w2, w3, ...}, and thus a probability distribution {p1, p2, p3, ...}, where pi = wi / (w1 + w2 + ...). Some of the wi change very frequently, but only a small proportion of all the wi. The distribution then has to be renormalised every time that happens, so I believe the alias method would not work efficiently, because the whole table would need to be rebuilt from scratch on every change. The method I …
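
One structure that avoids rebuilding the whole table is a Fenwick (binary indexed) tree over the weights: updating one weight and drawing one sample both cost O(log n), and no explicit renormalisation is needed because sampling is done against the current total. Below is a minimal sketch in Python for brevity (the class name `FenwickSampler` is mine; the same structure ports directly to C++):

```python
import random

class FenwickSampler:
    """Sample indices in proportion to weights; single-weight updates
    and sampling both cost O(log n), so frequent small changes are cheap."""

    def __init__(self, weights):
        self.n = len(weights)
        self.tree = [0.0] * (self.n + 1)   # Fenwick tree of weights
        self.w = [0.0] * self.n
        for i, w in enumerate(weights):
            self.update(i, w)

    def update(self, i, new_weight):
        """Change the weight of element i in O(log n)."""
        delta = new_weight - self.w[i]
        self.w[i] = new_weight
        j = i + 1
        while j <= self.n:
            self.tree[j] += delta
            j += j & (-j)

    def total(self):
        """Current sum of all weights (prefix sum over the whole tree)."""
        s, j = 0.0, self.n
        while j > 0:
            s += self.tree[j]
            j -= j & (-j)
        return s

    def sample(self, rng=random):
        """Draw a 0-based index with probability w[i] / total()."""
        r = rng.random() * self.total()
        idx, bit = 0, 1
        while bit * 2 <= self.n:
            bit *= 2
        # Descend: find the largest idx whose prefix sum is <= r.
        while bit:
            nxt = idx + bit
            if nxt <= self.n and self.tree[nxt] <= r:
                r -= self.tree[nxt]
                idx = nxt
            bit //= 2
        return idx
```

The key point is that a single weight change touches only O(log n) tree nodes, instead of the O(n) rebuild an alias table would need.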

R function that returns a string literal

Submitted by 浪子不回头ぞ on 2020-01-03 10:48:23
Problem: I have a vector: c(1,2,3). Calling print() on this value gives [1] 1 2 3. Is there a function that takes a vector and gives the string "c(1,2,3)"?

Answer 1: You can use deparse: R> x <- c(1, 2, 3) R> deparse(x) [1] "c(1, 2, 3)" R> class(deparse(x)) [1] "character"

Answer 2: Use dput: a <- c(1, 2, 3); dput(a)

Answer 3: I've never heard of such a function. Perhaps you should quickly write one yourself? toString <- function(a){ output <- "c("; for(i in 1:(length(a)-1)){ output <- paste(output, a[i], ",", sep= …
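
For comparison, the behaviour asked for (turning a value back into source text that would recreate it) is easy to imitate in Python; `deparse_vector` below is a hypothetical helper written for illustration, not part of any library:

```python
def deparse_vector(values):
    """Render a Python sequence the way R's deparse() renders a numeric
    vector: as the source text 'c(...)' that would recreate it."""
    return "c(" + ", ".join(str(v) for v in values) + ")"
```

For example, `deparse_vector([1, 2, 3])` yields the string `"c(1, 2, 3)"`, mirroring what deparse() returns in R.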

Ruby: Using rand() in code but writing tests to verify probabilities

Submitted by 天涯浪子 on 2020-01-03 08:50:14
Problem: I have some code which delivers things based on weighted randomness: things with more weight are more likely to be chosen. Now, being a good Rubyist, I of course want to cover all this code with tests, and I want to test that things are fetched with the correct probabilities. So how do I test this? Writing tests for something that should be random makes it very hard to compare actual vs. expected. A few ideas I have, and why they won't work well: stub Kernel.rand in my tests to …
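
A common way out is to seed the RNG and make a statistical assertion rather than an exact one: draw many samples, then run a chi-square goodness-of-fit test against the expected counts. A sketch in Python for illustration (the helper names are mine; 13.82 is the chi-square critical value for 2 degrees of freedom at alpha = 0.001, so a correct sampler fails this check only about 0.1% of the time):

```python
import random

def weighted_choice(weights, rng):
    """Pick an index with probability proportional to its weight."""
    r = rng.random() * sum(weights)
    for i, w in enumerate(weights):
        r -= w
        if r < 0:
            return i
    return len(weights) - 1

def chi_square_stat(counts, expected):
    """Pearson's chi-square statistic: sum of (observed-expected)^2/expected."""
    return sum((c - e) ** 2 / e for c, e in zip(counts, expected))

# Test strategy: seed the RNG so the test is reproducible, draw many
# samples, then check the statistic against a critical value instead
# of asserting exact counts.
rng = random.Random(42)
weights = [1, 2, 7]
n = 10_000
counts = [0, 0, 0]
for _ in range(n):
    counts[weighted_choice(weights, rng)] += 1
expected = [n * w / sum(weights) for w in weights]
stat = chi_square_stat(counts, expected)
# df = 2; critical value at alpha = 0.001 is about 13.82
assert stat < 13.82
```

The same idea works in Ruby: seed `Random.new(42)`, count outcomes, and assert the chi-square statistic stays under the critical value.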

Detect significant changes in a data-set that gradually changes

Submitted by 谁说胖子不能爱 on 2020-01-03 05:13:23
Problem: I have a list of data in Python that represents the amount of a resource used per minute. I want to count how many times it changes significantly within that data set. What I mean by a significant change is a bit different from what I've read so far. For example, with a dataset like [10,15,17,20,30,40,50,70,80,60,40,20], I say a significant change happens when the data doubles or halves with respect to the previous "normal". For example, since the list starts with 10, that is our starting …
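
One reading of that rule is: a change counts when the value at least doubles or at most halves relative to the current baseline, and the baseline then resets to the new value. That interpretation (and the helper name) are my assumptions, since the question is cut off before the full definition; a sketch:

```python
def count_significant_changes(series):
    """Count points where the value at least doubles or at most halves
    relative to the current baseline ('previous normal'); the baseline
    resets to the new value after each significant change."""
    if not series:
        return 0
    baseline = series[0]
    changes = 0
    for x in series[1:]:
        if x >= 2 * baseline or x <= baseline / 2:
            changes += 1
            baseline = x   # the new value becomes the new "normal"
    return changes
```

On the question's example list this counts the jumps at 20, 40, 80 and the drops at 40 and 20, i.e. 5 significant changes under this interpretation.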

How to use different scaling approaches in Weka

Submitted by 陌路散爱 on 2020-01-03 04:54:11
Problem: I am using logistic regression on my data in Weka. Now I want to try different scaling approaches to improve my results, such as min/max, zero mean/unit variance, length, etc. Is there any option in Weka for scaling?

Answer 1: Weka includes filters for this kind of preprocessing: weka.filters.unsupervised.attribute.Normalize and weka.filters.unsupervised.attribute.Standardize. In Java: Instances train_data = ... Instances test_data = ... Standardize filter = new Standardize(); filter.setInputFormat …
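
The two filters named in the answer correspond to the two standard formulas. A language-neutral sketch in Python of what they compute (the function names are mine, and Weka's exact implementation details, e.g. sample vs. population variance, may differ):

```python
def min_max_scale(xs, lo=0.0, hi=1.0):
    """Rescale values linearly into [lo, hi]
    (min/max scaling, what Weka's Normalize filter does for 0..1)."""
    mn, mx = min(xs), max(xs)
    if mx == mn:                      # constant column: no spread to scale
        return [lo for _ in xs]
    return [lo + (hi - lo) * (x - mn) / (mx - mn) for x in xs]

def standardize(xs):
    """Shift and scale to zero mean and unit variance
    (z-scores, the idea behind Weka's Standardize filter)."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n   # population variance
    sd = var ** 0.5
    return [(x - mean) / sd for x in xs] if sd else [0.0] * n
```

After `standardize`, the column has mean 0 and variance 1; after `min_max_scale`, it spans exactly [0, 1].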

How to compute percentiles from frequency table?

Submitted by 南楼画角 on 2020-01-03 04:52:04
Problem: I have a CSV file:

    fr  id
    1   10000152
    1   10000212
    1   10000847
    1   10001018
    2   10001052
    2   10001246
    14  10001908
    ...

This is a frequency table, where id is an integer variable and fr is the number of occurrences of the given value. The file is sorted ascending by value. I would like to compute percentiles (i.e. 90%, 80%, 70%, ..., 10%) of the variable. I have done this in pure Python, similar to this pseudocode: bucket = sum(fr)/10.0; percentile = 1; sum = 0; for (current_fr, current_id) in zip(fr, id): sum = sum + current_fr; if (sum …
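
The pseudocode's idea can be completed as a small function: walk the sorted (frequency, value) pairs, accumulating frequency, and record the value where the running total first crosses each percentile threshold. The function name and the "smallest value reaching the threshold" definition are my assumptions; other percentile definitions interpolate between values instead:

```python
def percentiles_from_freq(pairs, probs):
    """pairs: (frequency, value) rows sorted ascending by value.
    For each p in probs, return the smallest value whose cumulative
    frequency reaches p * total (a 'lower' percentile definition)."""
    total = sum(fr for fr, _ in pairs)
    out = {}
    for p in probs:
        threshold = p * total
        cum = 0
        for fr, val in pairs:
            cum += fr
            if cum >= threshold:
                out[p] = val
                break
    return out
```

For example, with rows `[(1, 10), (2, 20), (7, 30)]` the cumulative frequencies are 1, 3, 10 out of 10, so the 10th percentile is 10, the 30th is 20, and the 90th is 30.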

Changing outliers for NA in all columns in a dataset in R

Submitted by 风流意气都作罢 on 2020-01-03 03:17:08
Problem: I'm a beginner with R and can't manage to replace outliers with NA in ALL columns of a dataset. I succeeded in changing one column at a time with dataset$column[dataset$column %in% boxplot.stats(dataset$column)$out] <- NA, but I have 21 columns whose outliers need to become NA. How would you do that? How would you do it for a column range? For specific columns?

Answer 1: You can use apply over the columns. Example: set.seed(1); x = matrix(rnorm(20), ncol = 2); x[2, 1] = 100; x[4, 2] = 200; apply(x, …
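
The same per-column idea can be sketched outside R as well; here it is in Python using the common 1.5 × IQR boxplot rule, with None standing in for NA. Note that R's boxplot.stats uses hinges rather than these quantiles, so the exact cutoffs can differ slightly, and `mask_outliers` is a hypothetical helper name:

```python
import statistics

def mask_outliers(columns):
    """For each column (a list of numbers), replace values outside
    [Q1 - 1.5*IQR, Q3 + 1.5*IQR] with None -- the per-column analogue
    of applying boxplot.stats()$out over the columns with apply()."""
    cleaned = []
    for col in columns:
        q1, _, q3 = statistics.quantiles(col, n=4)  # quartile cut points
        iqr = q3 - q1
        lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
        cleaned.append([x if lo <= x <= hi else None for x in col])
    return cleaned
```

Applying the rule column by column, rather than to the whole matrix at once, matches what `apply(x, 2, ...)` does in the R answer.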

BTYD (Buy 'Till You Die). Walkthrough. pnbd.EstimateParameters()

Submitted by …衆ロ難τιáo~ on 2020-01-03 01:56:08
Problem: This is the first time I am working with the BTYD procedure, and I am getting errors when running the parameter estimation; the error message is below. I have been following the BTYD walkthrough. Does anybody know how to fix this? The sample data set works fine, but when I upload my own file in the same format it doesn't, even though there are no missing or empty rows/values. Help would be greatly appreciated! end.of.cal.period <- as.Date("2013-08-18") elog.cal <- elog[which(elog$date …

Matlab - Standard Deviation of Cartesian Points

Submitted by 独自空忆成欢 on 2020-01-02 23:47:03
Problem: I have an array of Cartesian points (column 1 is the x values and column 2 the y values), like so:

    308 522
    307 523
    307 523
    307 523
    307 523
    307 523
    306 523

How would I go about getting a standard deviation of the points? It would be measured against the mean, which would be a straight line. The points do not lie exactly on that line, so the standard deviation describes how wavy or "off-base" the line segment is. I really appreciate the help.

Answer 1: If you are certain the xy data …
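
The usual approach is to fit the best straight line through the points by least squares and then take the standard deviation of the vertical residuals. A sketch in Python (`residual_std` is my name for it; if you want perpendicular distances to the line instead, total least squares / PCA is the alternative):

```python
def residual_std(points):
    """Fit y = a*x + b by least squares, then return the standard
    deviation of the residuals: how far the points wander from the
    best-fit straight line (measured vertically)."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    denom = n * sxx - sx * sx          # zero only if all x are equal
    a = (n * sxy - sx * sy) / denom    # slope
    b = (sy - a * sx) / n              # intercept
    residuals = [y - (a * x + b) for x, y in points]
    mean_r = sum(residuals) / n        # ~0 by construction
    return (sum((r - mean_r) ** 2 for r in residuals) / n) ** 0.5
```

Points that lie exactly on a line give 0; the wavier the segment, the larger the result. In Matlab the equivalent is `p = polyfit(x, y, 1); std(y - polyval(p, x))`.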