statistics

How can I predict memory usage and time based on historical values

我是研究僧i submitted on 2019-12-31 04:04:07
Question: A maths problem really, I think... I have some historical data for some spreadsheet outputs, along with the number of rows and columns of each. What I'd like to do is use this data to predict the peak memory usage and time taken based on the (known) row and column counts. So if no historical data exists, there will be no prediction. One or two historical values will give very inaccurate predictions, but I hope that given a wide enough variety of historical values, a reasonably accurate prediction could be made?
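A minimal sketch of one way this is usually done, assuming the historical records are simply (rows, columns, peak memory, time) tuples and that a linear model in rows, columns and rows*columns is good enough; the records and numbers below are invented for illustration:

```python
import numpy as np

# Hypothetical historical records: rows, cols, peak memory (MB), time (s)
history = np.array([
    [1000,  10,  12.0,  0.8],
    [5000,  10,  55.0,  3.9],
    [1000,  50,  48.0,  3.5],
    [20000, 20, 410.0, 31.0],
])

def fit_predictor(history, target_col):
    """Least-squares fit of target ~ 1 + rows + cols + rows*cols."""
    rows, cols = history[:, 0], history[:, 1]
    X = np.column_stack([np.ones_like(rows), rows, cols, rows * cols])
    coef, *_ = np.linalg.lstsq(X, history[:, target_col], rcond=None)
    return coef

def predict(coef, rows, cols):
    return coef @ np.array([1.0, rows, cols, rows * cols])

mem_coef  = fit_predictor(history, 2)
time_coef = fit_predictor(history, 3)
print(predict(mem_coef, 10000, 30), predict(time_coef, 10000, 30))
```

With only one or two historical points the fit is badly underdetermined, which matches the asker's expectation that early predictions will be poor and only become reasonable as the history grows.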

Exporting ggvis plot with grouped data

爷,独闯天下 submitted on 2019-12-31 03:48:05
Question: I've run into a problem when exporting a ggvis plot (using vg2png). A simple case works nicely:

library(ggvis)
mtcars %>% ggvis(x = ~hp, y = ~mpg) %>% export_png()

However, if I try to export grouped data, I get the following error:

mtcars %>% ggvis(x = ~hp, y = ~mpg) %>% group_by(cyl) %>% export_png()

/usr/local/lib/node_modules/vega/vega.js:4799
  var tx = vg.data[def.type]();
  ^
TypeError: Property 'treefacet' of object #<Object> is not a function
    at vg.parse.transform (/usr/local/lib

Missing values in MS Excel LINEST, TREND, LOGEST and GROWTH functions

删除回忆录丶 submitted on 2019-12-31 01:47:16
Question: I'm using the GROWTH function (or LINEST, TREND or LOGEST; they all cause the same trouble) in Excel 2003. The problem is that if some data is missing, the function refuses to give a result. You can download the file here. Is there any workaround? I'm looking for an easy and elegant solution. I don't want the obvious workaround of getting rid of the missing value - that would mean deleting the column, which would also damage the graph and cause problems in my other tables where I have
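This is not an Excel answer, only a sketch of the same kind of fit done outside Excel: GROWTH fits an exponential trend y = b * m^x, and the equivalent fit can simply ignore the missing points by masking them out. The data below is invented.

```python
import numpy as np

# Invented series with one missing value (NaN)
x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([2.1, 4.0, np.nan, 16.5, 31.0, 64.2])

# GROWTH-style fit: y = b * m**x  <=>  log(y) = log(b) + x*log(m)
mask = ~np.isnan(y)                                  # keep only observed points
slope, intercept = np.polyfit(x[mask], np.log(y[mask]), 1)
b, m = np.exp(intercept), np.exp(slope)

# Predict over the full range, including the missing position
y_hat = b * m**x
print(b, m, y_hat)
```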

How to calculate sample and population variances in Matlab?

偶尔善良 submitted on 2019-12-30 18:29:28
Question: I have a vector a:

a = [86 100 41 93 75 61 76 92 88 97]

and I want to calculate the std and mean by myself:

>> mean(a)
ans = 80.9000
>> std(a)^2
ans = 335.2111

But when I do it like this I get the wrong variance:

>> avg = mean(a)
avg = 80.9000
>> var = sum(a.^2)/length(a) - avg^2
var = 301.6900

What am I missing here? Why is sum(a.^2)/length(a) - avg^2 not equal to std(a)^2?

Answer 1: Try this:

var = sum(a.^2)/(length(a)-1) - (length(a))*mean(a)^2/(length(a)-1)
var = 335.2111

var is computed as the (unbiased) sample,
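For reference, the same distinction in Python/numpy: ddof=0 gives the population variance (divide by n, the asker's manual formula), while ddof=1 gives the unbiased sample variance that MATLAB's var and std use by default.

```python
import numpy as np

a = np.array([86, 100, 41, 93, 75, 61, 76, 92, 88, 97])

pop_var    = np.var(a)           # divide by n   -> 301.69
sample_var = np.var(a, ddof=1)   # divide by n-1 -> 335.2111
print(pop_var, sample_var)
```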

Analytics+Statistics for offline Apps?

ぃ、小莉子 submitted on 2019-12-30 11:22:34
Question: As the title suggests, I need to track various user events - such as clicks, swipes, time spent on a page, etc. - in various iOS/Android/Windows apps. These apps are based on responsive HTML/CSS/JS and have a simple OS-specific container. All data, such as images and videos, is self-contained in the apps. A characteristic of these apps is that sales staff will use iPads/Surface/Android tablets to demonstrate features of products, spreadsheets, infomercials, etc. to possible

Why did PCA reduce the performance of Logistic Regression?

牧云@^-^@ submitted on 2019-12-30 07:18:08
Question: I performed logistic regression on a binary classification problem with data of 50000 x 370 dimensions. I got an accuracy of about 90%. But when I did PCA + logistic regression on the data, my accuracy dropped to 10%, and I was very shocked to see this result. Can anybody explain what could have gone wrong? Answer 1: There is no guarantee that PCA will help, or that it will not harm the learning process. In particular, if you use PCA to reduce the number of dimensions, you are removing information from your data, thus everything
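A minimal sketch of how such a comparison is typically set up with scikit-learn; the dataset, the number of components and the solver settings below are placeholders, not the asker's actual setup.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Placeholder data standing in for the asker's 50000 x 370 matrix
X, y = make_classification(n_samples=5000, n_features=370,
                           n_informative=40, random_state=0)

plain    = make_pipeline(StandardScaler(),
                         LogisticRegression(max_iter=1000))
with_pca = make_pipeline(StandardScaler(), PCA(n_components=50),
                         LogisticRegression(max_iter=1000))

print("without PCA:", cross_val_score(plain, X, y, cv=3).mean())
print("with PCA:   ", cross_val_score(with_pca, X, y, cv=3).mean())
```

Fitting the scaler and PCA inside the pipeline, and cross-validating both variants the same way, is what keeps the with/without-PCA numbers comparable.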

2x4 Lattice Barchart minimally in R?

陌路散爱 submitted on 2019-12-30 07:11:27
Question: I have two data files from two different measurement sessions: ECG and B.ECG. Each data file contains male and female subjects. I want to make a 2-column x 4-row lattice barchart in R with minimal code; the following is a draft of the layout. I can do a 2x2 barchart (see the code below), but there must be a more minimal way than manually adding more and more lines to the end of the code, which is difficult to control.

ECG             B.ECG
female          female
Sinus Arr/AHB
Digoxin arr
Furosemide arr
ECG             B.ECG
male            male
Sinus Arr/AHB

How can I loop through variables in SPSS? I want to avoid code duplication

余生颓废 submitted on 2019-12-30 05:11:08
Question: Is there a "native" SPSS way to loop through some variable names? All I want to do is take a list of variables (that I define) and run the same procedure for each of them. Pseudo-code - not really a good example, but it gets the point across:

for i in varlist['a','b','c'] do
  FREQUENCIES VARIABLES=varlist[i] / ORDER=ANALYSIS.
end

I've noticed that people seem to just use the R or Python SPSS plugins to achieve this basic array functionality, but I don't know how soon I can get those configured (if ever) on
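For reference, this is roughly what that loop looks like through the Python plugin the asker mentions - a sketch, assuming the plugin is installed and that a, b and c stand in for real variable names in the active dataset:

```python
# Runs inside SPSS syntax between BEGIN PROGRAM. and END PROGRAM.
import spss

# Hypothetical list of variables to run FREQUENCIES on
for name in ['a', 'b', 'c']:
    spss.Submit("FREQUENCIES VARIABLES=%s /ORDER=ANALYSIS." % name)
```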

python scipy.stats.powerlaw negative exponent

空扰寡人 submitted on 2019-12-30 02:28:07
Question: I want to supply a negative exponent to the scipy.stats.powerlaw routine, e.g. a = -1.5, in order to draw random samples:

""" powerlaw.pdf(x, a) = a * x**(a-1) """
from scipy.stats import powerlaw
R = powerlaw.rvs(a, size=100)

Why is a > 0 required? How can I supply a negative a in order to generate the random samples, and how can I supply a normalization coefficient/transform, i.e. PDF(x, C, a) = C * x**a? The documentation is here: http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats
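One standard way around this (not a scipy.stats.powerlaw feature) is inverse-transform sampling: a power law with a negative exponent is only normalizable on a bounded range such as [xmin, xmax], and on that range the inverse CDF has a closed form. A sketch, with xmin and xmax chosen arbitrarily:

```python
import numpy as np

def powerlaw_rvs(a, xmin, xmax, size, seed=None):
    """Draw samples with pdf proportional to x**a on [xmin, xmax], a != -1."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(size=size)
    lo, hi = xmin**(a + 1), xmax**(a + 1)
    # Invert the CDF: F(x) = (x**(a+1) - lo) / (hi - lo)
    return (lo + u * (hi - lo))**(1.0 / (a + 1))

samples = powerlaw_rvs(a=-1.5, xmin=1.0, xmax=100.0, size=100)
print(samples.min(), samples.max())
```

The normalization coefficient the asker mentions is then C = (a + 1) / (xmax**(a+1) - xmin**(a+1)) on that interval.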