statistics

How can I predict memory usage and time based on historical values

我是研究僧i submitted on 2019-12-31 04:04:07
Question: A maths problem really, I think... I have some historical data for some spreadsheet outputs, along with the number of rows and columns of each. What I'd like to do is use this data to predict the peak memory usage and time taken based on the (known) row and column counts. So if no historical data exists, there will be no prediction. One or two historical values will give very inaccurate predictions, but I hope that given a wide enough variety of historical values, a reasonably accurate prediction could be made?
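A minimal sketch of one way this is usually done, assuming the historical records are simply (rows, columns, peak memory, time) tuples and that a linear model in rows, columns and rows*columns is good enough; the records and numbers below are invented for illustration:

```python
import numpy as np

# Hypothetical historical records: rows, cols, peak memory (MB), time (s)
history = np.array([
    [1000,  10,  12.0,  0.8],
    [5000,  10,  55.0,  3.9],
    [1000,  50,  48.0,  3.5],
    [20000, 20, 410.0, 31.0],
])

def fit_predictor(history, target_col):
    """Least-squares fit of target ~ 1 + rows + cols + rows*cols."""
    rows, cols = history[:, 0], history[:, 1]
    X = np.column_stack([np.ones_like(rows), rows, cols, rows * cols])
    coef, *_ = np.linalg.lstsq(X, history[:, target_col], rcond=None)
    return coef

def predict(coef, rows, cols):
    return coef @ np.array([1.0, rows, cols, rows * cols])

mem_coef  = fit_predictor(history, 2)
time_coef = fit_predictor(history, 3)
print(predict(mem_coef, 10000, 30), predict(time_coef, 10000, 30))
```

With only one or two historical points the fit is badly underdetermined, which matches the asker's expectation that early predictions will be poor and only become reasonable as the history grows.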

Exporting ggvis plot with grouped data

爷,独闯天下 submitted on 2019-12-31 03:48:05
Question: I've run into a problem when exporting a ggvis plot (using vg2png). A simple case works nicely:

library(ggvis)
mtcars %>% ggvis(x = ~hp, y = ~mpg) %>% export_png()

However, if I try to export grouped data, I get the following error:

mtcars %>% ggvis(x = ~hp, y = ~mpg) %>% group_by(cyl) %>% export_png()

/usr/local/lib/node_modules/vega/vega.js:4799
  var tx = vg.data[def.type]();
  ^
TypeError: Property 'treefacet' of object #<Object> is not a function
    at vg.parse.transform (/usr/local/lib

Missing values in MS Excel LINEST, TREND, LOGEST and GROWTH functions

删除回忆录丶 submitted on 2019-12-31 01:47:16
Question: I'm using the GROWTH function (or LINEST, TREND or LOGEST; they all cause the same trouble) in Excel 2003. The problem is that if some data is missing, the function refuses to give a result. You can download the file here. Is there any workaround? I'm looking for an easy and elegant solution. I don't want the obvious workaround of getting rid of the missing value - that would mean deleting the column, which would also damage the graph and cause problems in my other tables where I have
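This is not an Excel answer, only a sketch of the same kind of fit done outside Excel: GROWTH fits an exponential trend y = b * m^x, and the equivalent fit can simply ignore the missing points by masking them out. The data below is invented.

```python
import numpy as np

# Invented series with one missing value (NaN)
x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([2.1, 4.0, np.nan, 16.5, 31.0, 64.2])

# GROWTH-style fit: y = b * m**x  <=>  log(y) = log(b) + x*log(m)
mask = ~np.isnan(y)                                  # keep only observed points
slope, intercept = np.polyfit(x[mask], np.log(y[mask]), 1)
b, m = np.exp(intercept), np.exp(slope)

# Predict over the full range, including the missing position
y_hat = b * m**x
print(b, m, y_hat)
```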

How to calculate sample and population variances in Matlab?

偶尔善良 submitted on 2019-12-30 18:29:28
Question: I have a vector a:

a = [86 100 41 93 75 61 76 92 88 97]

and I want to calculate the std and mean by myself:

>> mean(a)
ans = 80.9000
>> std(a)^2
ans = 335.2111

But when I do it like this I get the wrong variance:

>> avg = mean(a)
avg = 80.9000
>> var = sum(a.^2)/length(a) - avg^2
var = 301.6900

What am I missing here? Why is sum(a.^2)/length(a) - avg^2 not equal to std(a)^2?

Answer 1: Try this:

var = sum(a.^2)/(length(a)-1) - (length(a))*mean(a)^2/(length(a)-1)
var = 335.2111

var is computed as the (unbiased) sample,
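For reference, the same distinction in Python/numpy: ddof=0 gives the population variance (divide by n, the asker's manual formula), while ddof=1 gives the unbiased sample variance that MATLAB's var and std use by default.

```python
import numpy as np

a = np.array([86, 100, 41, 93, 75, 61, 76, 92, 88, 97])

pop_var    = np.var(a)           # divide by n   -> 301.69
sample_var = np.var(a, ddof=1)   # divide by n-1 -> 335.2111
print(pop_var, sample_var)
```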

Analytics+Statistics for offline Apps?

ぃ、小莉子 submitted on 2019-12-30 11:22:34
Question: As the title suggests, I need to track various user events - such as clicks, swipes, time spent on a page, etc. - in various iOS/Android/Windows apps. These apps are based on responsive HTML/CSS/JS and have a simple OS-specific container. All data, such as images and videos, is self-contained in the apps. A characteristic of these apps is that sales staff will use iPads/Surface/Android tablets to demonstrate features of products, spreadsheets, infomercials, etc. to possible

Why did PCA reduce the performance of Logistic Regression?

牧云@^-^@ submitted on 2019-12-30 07:18:08
Question: I performed logistic regression on a binary classification problem with data of 50000 x 370 dimensions. I got an accuracy of about 90%. But when I did PCA + logistic regression on the data, my accuracy dropped to 10%, and I was very shocked to see this result. Can anybody explain what could have gone wrong? Answer 1: There is no guarantee that PCA will help, or that it will not harm the learning process. In particular, if you use PCA to reduce the number of dimensions, you are removing information from your data, thus everything
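A minimal sketch of how such a comparison is typically set up with scikit-learn; the dataset, the number of components and the solver settings below are placeholders, not the asker's actual setup.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Placeholder data standing in for the asker's 50000 x 370 matrix
X, y = make_classification(n_samples=5000, n_features=370,
                           n_informative=40, random_state=0)

plain    = make_pipeline(StandardScaler(),
                         LogisticRegression(max_iter=1000))
with_pca = make_pipeline(StandardScaler(), PCA(n_components=50),
                         LogisticRegression(max_iter=1000))

print("without PCA:", cross_val_score(plain, X, y, cv=3).mean())
print("with PCA:   ", cross_val_score(with_pca, X, y, cv=3).mean())
```

Fitting the scaler and PCA inside the pipeline, and cross-validating both variants the same way, is what keeps the with/without-PCA numbers comparable.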

2x4 Lattice Barchart minimally in R?

陌路散爱 submitted on 2019-12-30 07:11:27
Question: I have two data files from two different measurement sessions: ECG and B.ECG. Each data file contains male and female subjects. I want to make a 2-column x 4-row lattice barchart in R with minimal code; the following is a draft of the layout. I can do a 2x2 barchart (see the code below), but there must be a more minimal way than manually adding more and more lines to the end of the code, which is difficult to control.

ECG             B.ECG
female          female
Sinus Arr/AHB
Digoxin arr
Furosemide arr
ECG             B.ECG
male            male
Sinus Arr/AHB

How can I loop through variables in SPSS? I want to avoid code duplication

余生颓废 submitted on 2019-12-30 05:11:08
Question: Is there a "native" SPSS way to loop through some variable names? All I want to do is take a list of variables (that I define) and run the same procedure for each of them. Pseudo-code - not really a good example, but it gets the point across:

for i in varlist['a','b','c'] do
  FREQUENCIES VARIABLES=varlist[i] / ORDER=ANALYSIS.
end

I've noticed that people seem to just use the R or Python SPSS plugins to achieve this basic array functionality, but I don't know how soon I can get those configured (if ever) on
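For reference, this is roughly what that loop looks like through the Python plugin the asker mentions - a sketch, assuming the plugin is installed and that a, b and c stand in for real variable names in the active dataset:

```python
# Runs inside SPSS syntax between BEGIN PROGRAM. and END PROGRAM.
import spss

# Hypothetical list of variables to run FREQUENCIES on
for name in ['a', 'b', 'c']:
    spss.Submit("FREQUENCIES VARIABLES=%s /ORDER=ANALYSIS." % name)
```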

python scipy.stats.powerlaw negative exponent

空扰寡人 submitted on 2019-12-30 02:28:07
Question: I want to supply a negative exponent to the scipy.stats.powerlaw routine, e.g. a = -1.5, in order to draw random samples:

""" powerlaw.pdf(x, a) = a * x**(a-1) """
from scipy.stats import powerlaw
R = powerlaw.rvs(a, size=100)

Why is a > 0 required? How can I supply a negative a in order to generate the random samples, and how can I supply a normalization coefficient/transform, i.e. PDF(x, C, a) = C * x**a? The documentation is here: http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats
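One standard way around this (not a scipy.stats.powerlaw feature) is inverse-transform sampling: a power law with a negative exponent is only normalizable on a bounded range such as [xmin, xmax], and on that range the inverse CDF has a closed form. A sketch, with xmin and xmax chosen arbitrarily:

```python
import numpy as np

def powerlaw_rvs(a, xmin, xmax, size, seed=None):
    """Draw samples with pdf proportional to x**a on [xmin, xmax], a != -1."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(size=size)
    lo, hi = xmin**(a + 1), xmax**(a + 1)
    # Invert the CDF: F(x) = (x**(a+1) - lo) / (hi - lo)
    return (lo + u * (hi - lo))**(1.0 / (a + 1))

samples = powerlaw_rvs(a=-1.5, xmin=1.0, xmax=100.0, size=100)
print(samples.min(), samples.max())
```

The normalization coefficient the asker mentions is then C = (a + 1) / (xmax**(a+1) - xmin**(a+1)) on that interval.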