statistics

Weighted Mean

╄→гoц情女王★ submitted on 2019-12-21 22:49:16
Question: I have an existing web app that allows users to "rate" items based on their difficulty (0 through 15). Currently, I'm simply taking the average of each user's opinion and presenting the average straight from MySQL. However, it's becoming clear to me (and my users) that weighting the numbers would be more appropriate. Oddly enough, a few hours of Googling hasn't turned up much. I did find two articles that showed site-wide rating systems based on "Bayesian filters" (which I partially
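One common way to weight ratings like these is a Bayesian (shrunken) average, which pulls items with few votes toward the site-wide mean. The following is only a minimal sketch of that idea, not the method from the articles the asker mentions; the function name, `site_mean`, and `prior_weight` are hypothetical names chosen for illustration.

```python
def bayesian_average(ratings, site_mean, prior_weight=5.0):
    """Shrink an item's mean rating toward the site-wide mean.

    prior_weight behaves like a number of "phantom" votes cast at
    site_mean. It is a hypothetical tuning knob, not a value taken
    from the question.
    """
    n = len(ratings)
    return (prior_weight * site_mean + sum(ratings)) / (prior_weight + n)


# One enthusiastic vote barely moves the score away from the site mean...
few_votes = bayesian_average([15], site_mean=7.5)
# ...while many consistent votes dominate the prior.
many_votes = bayesian_average([15] * 100, site_mean=7.5)
```

With no ratings at all the function returns `site_mean`, so new items start at the site average rather than at zero.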

Does an expected value command exist in R

前提是你 submitted on 2019-12-21 21:19:53
Question: I'm sure there must be a straightforward command for this, but I've searched and can't find one. How do I get the expected value from a vector? Here are the values:
    y <- c(0.05, 0.01, -0.1)
And their probabilities:
    p <- c(0.2, 0.7, 0.1)
I can get E(Y) by doing
    sum(y*p)
but I think there is probably a command for it; I just can't find it. Thanks!
Answer 1: You can use weighted.mean:
    weighted.mean(y, p)
    # [1] 0.007
Answer 2: Here's another option:
    > c(y %*% p)
    [1] 0.007
Source: https://stackoverflow
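For readers working outside R, the same expected value can be sketched in plain Python. This is an illustrative equivalent of `sum(y*p)`, not part of the original answers; the helper name `expected_value` is an assumption.

```python
import math

y = [0.05, 0.01, -0.1]
p = [0.2, 0.7, 0.1]


def expected_value(values, probs):
    """Return sum(v * q) after checking that probs is a valid distribution."""
    if len(values) != len(probs):
        raise ValueError("values and probs must have the same length")
    if not math.isclose(sum(probs), 1.0):
        raise ValueError("probabilities must sum to 1")
    return sum(v * q for v, q in zip(values, probs))


ev = expected_value(y, p)  # approximately 0.007, matching the R output
```

Unlike `weighted.mean`, this sketch refuses weights that do not sum to 1 instead of normalising them.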

Javascript equivalent for Inverse normal function ? eg Excel's NORMSINV() or NORMINV()?

China☆狼群 submitted on 2019-12-21 21:08:59
Question: I'm trying to convert something from my Excel spreadsheets into JavaScript and came across the NORMSINV() function in my spreadsheets. NormSInv() is nicely documented at http://office.microsoft.com/en-us/excel-help/normsinv-HP005209195.aspx. Basically it's of the form Z = NormSInv(probability): if you give it a probability (say 0.90), it gives you the Z value for a standard normal distribution (Z ≈ 1.28). I could encode the entire transformation table as per http://en.wikipedia.org/wiki
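To make the behaviour of NORMSINV() and NORMINV() concrete, here is a small sketch using Python's standard-library `statistics.NormalDist` (Python 3.8+). It only illustrates what the inverse normal CDF returns; it is not a JavaScript port, and the wrapper names `norm_s_inv` and `norm_inv` are hypothetical.

```python
from statistics import NormalDist


def norm_s_inv(probability):
    """Inverse CDF of the standard normal, analogous to Excel's NORMSINV()."""
    return NormalDist().inv_cdf(probability)


def norm_inv(probability, mean, standard_dev):
    """Inverse CDF of a general normal, analogous to Excel's NORMINV()."""
    return NormalDist(mu=mean, sigma=standard_dev).inv_cdf(probability)


z = norm_s_inv(0.90)  # about 1.2816
```

The general form is just a shift and scale of the standard one: `norm_inv(p, m, s)` equals `m + s * norm_s_inv(p)`.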

How can I compute the probability at a point given a normal distribution in Perl?

人走茶凉 submitted on 2019-12-21 20:33:43
Question: Is there a package in Perl that lets you compute the height of a probability distribution at a given point? For example, this can be done in R this way:
    > dnorm(0, mean=4, sd=10)
    [1] 0.03682701
That is, the density at the point x=0 under a normal distribution with mean=4 and sd=10 is 0.0368. I looked at Statistics::Distribution but it doesn't provide that function.
Answer 1: Why not something along these lines (I am writing in R, but it could be done in perl with Statistics:
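The quantity `dnorm` returns is the normal density, which has a closed form, so it can be computed in any language without a statistics package. The sketch below is written in Python rather than Perl, purely as an illustration; `normal_pdf` is a hypothetical helper name, and the standard-library `statistics.NormalDist` call is included as a cross-check.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) at x: exp(-z^2 / 2) / (sd * sqrt(2 * pi))."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


by_formula = normal_pdf(0, mean=4, sd=10)        # about 0.03682701
by_library = NormalDist(mu=4, sigma=10).pdf(0)   # same value via the stdlib
```

The formula uses only `exp`, `sqrt`, and `pi`, so it should translate to Perl almost line for line.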

How to plot a histogram with a custom distribution?

戏子无情 submitted on 2019-12-21 20:25:24
Question: In an old statistics textbook, I found a table of a distribution of ages for a country's population:
             Percent of
    Age      population
    --------------------
    0-5           8
    5-14         18
    14-18         8
    18-21         5
    21-25         6
    25-35        12
    35-45        11
    45-55        11
    55-65         9
    65-75         6
    75-85         4
I wanted to plot this distribution as a histogram in R, with the age ranges as breaks and the percent of population as the density, but there didn't seem to be a straightforward way to do it. R's hist() function wants you to supply the individual data points,
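Because the age ranges have unequal widths, a histogram of this table needs bar heights equal to percent divided by bin width, so that each bar's area (not its height) represents the percentage. The sketch below computes those heights in plain Python from the table as given; it does not draw the plot, and the variable names are assumptions made for illustration.

```python
# Bin edges and percentages copied from the table in the question.
breaks = [0, 5, 14, 18, 21, 25, 35, 45, 55, 65, 75, 85]
percent = [8, 18, 8, 5, 6, 12, 11, 11, 9, 6, 4]

# Width of each age range, in years.
widths = [hi - lo for lo, hi in zip(breaks, breaks[1:])]

# Bar height = percent per year of age, so that height * width = percent.
density = [pct / w for pct, w in zip(percent, widths)]

for lo, hi, d in zip(breaks, breaks[1:], density):
    print(f"{lo:>2}-{hi:<2}  {d:.2f} percent per year")
```

These heights can then be passed to any bar-plotting routine that accepts explicit left edges and widths. Note that the percentages in the table as given sum to 98, not 100.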

Logistic regression returns error but runs okay on reduced dataset

浪尽此生 submitted on 2019-12-21 20:17:12
Question: I would appreciate your input on this a lot! I am working on a logistic regression, but it is not working for some reason:
    mod1 <- glm(survive ~ reLDM2 + yr + yr2 + reLDM2:yr + reLDM2:yr2 + NestAge0,
                family = binomial(link = logexp(NSSH1$exposure)),
                data = NSSH1, control = list(maxit = 50))
When I run the same model with less data it works! But with the complete dataset I get an error and warning messages:
    Error: inner loop 1; cannot correct step size
    In addition: Warning messages:
    1: step size truncated due to

Unexpected standard errors with weighted least squares in Python Pandas

假如想象 submitted on 2019-12-21 19:47:18
Question: In the code for the main OLS class in Python Pandas, I am looking for help to clarify what conventions are used for the standard error and t-stats reported when weighted OLS is performed. Here's my example data set, with some imports to use Pandas and to use scikits.statsmodels WLS directly:
    import pandas as pd
    import numpy as np
    from statsmodels.regression.linear_model import WLS
    # Make some random data.
    np.random.seed(42)
    df = pd.DataFrame(np.random.randn(10, 3), columns=['a', 'b', 'weights'])
    #

Advice on calculating a function to describe upper bound of data

痞子三分冷 submitted on 2019-12-21 19:12:16
Question: I have a scatter plot of a dataset and I am interested in calculating the upper bound of the data. I don't know if this is a standard statistical approach, so what I was considering doing was splitting the X-axis data into small ranges, calculating the max for each range, and then trying to identify a function to describe these points. Is there a function already in R to do this? If it's relevant, there are 92611 points.
Answer 1: You might like to look into quantile regression, which is available
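The binning approach the asker describes can be sketched in a few lines. This is a plain-Python illustration of that idea only (it is not quantile regression, and it is written in Python rather than R); the function name `binned_upper_bound` and the choice of equal-width bins are assumptions.

```python
def binned_upper_bound(xs, ys, n_bins):
    """Return (bin_center, max_y) for each non-empty equal-width bin of x."""
    lo, hi = min(xs), max(xs)
    # Fall back to a width of 1.0 when all x values are identical.
    width = (hi - lo) / n_bins or 1.0
    maxima = {}
    for x, y in zip(xs, ys):
        # Clamp so that x == hi lands in the last bin instead of past it.
        i = min(int((x - lo) / width), n_bins - 1)
        if i not in maxima or y > maxima[i]:
            maxima[i] = y
    return [(lo + (i + 0.5) * width, maxima[i]) for i in sorted(maxima)]


envelope = binned_upper_bound([0, 1, 2, 3], [5, 1, 2, 7], n_bins=2)
# envelope == [(0.75, 5), (2.25, 7)]
```

A curve can then be fitted through the returned points. Per-bin maxima are sensitive to single outliers, which is one reason a high quantile is often used in place of the maximum.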
