statistics

Regression of a Data Frame with multiple factor groupings

烂漫一生 submitted on 2019-12-25 05:32:00
Question: I am working on a regression script. I have a data.frame with roughly 130 columns, and I need to regress one column (let's call it the X column) against each of the other ~100 numeric columns. Before the regression is calculated, I need to group the data by 4 factors: myDat$Recipe, myDat$Step, myDat$Stage, and myDat$Prod, while still keeping the other ~100 columns and row data attached for the regression. Then I need to do a regression of each column ~ X column and print out the R^2
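A minimal sketch of the grouped regression described in this question, written in plain Python rather than R. The column names `x` and `col_a`, the sample rows, and the helpers `r_squared` and `grouped_r_squared` are hypothetical; only the four grouping factors come from the question. For a regression with a single predictor, R^2 equals the squared Pearson correlation, which is what the helper computes.

```python
from collections import defaultdict

def r_squared(xs, ys):
    """R^2 of the simple linear regression ys ~ xs (squared Pearson correlation)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return float("nan")  # a constant column has no defined R^2
    return (sxy * sxy) / (sxx * syy)

def grouped_r_squared(rows, factors, x_col, y_cols):
    """Split rows by the factor columns, then compute R^2 of each y column ~ x_col."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[f] for f in factors)].append(row)
    result = {}
    for key, members in groups.items():
        xs = [r[x_col] for r in members]
        result[key] = {c: r_squared(xs, [r[c] for r in members]) for c in y_cols}
    return result

FACTORS = ["Recipe", "Step", "Stage", "Prod"]  # grouping factors from the question

# Made-up rows: two groups, one response column.
rows = [
    {"Recipe": "R1", "Step": 1, "Stage": "a", "Prod": "P", "x": 1.0, "col_a": 2.0},
    {"Recipe": "R1", "Step": 1, "Stage": "a", "Prod": "P", "x": 2.0, "col_a": 4.0},
    {"Recipe": "R1", "Step": 1, "Stage": "a", "Prod": "P", "x": 3.0, "col_a": 6.0},
    {"Recipe": "R2", "Step": 1, "Stage": "a", "Prod": "P", "x": 1.0, "col_a": 1.0},
    {"Recipe": "R2", "Step": 1, "Stage": "a", "Prod": "P", "x": 2.0, "col_a": 3.0},
    {"Recipe": "R2", "Step": 1, "Stage": "a", "Prod": "P", "x": 3.0, "col_a": 2.0},
    {"Recipe": "R2", "Step": 1, "Stage": "a", "Prod": "P", "x": 4.0, "col_a": 4.0},
]

for key, scores in grouped_r_squared(rows, FACTORS, "x", ["col_a"]).items():
    print(key, scores)
```

Each group keeps all of its rows and columns, so any number of response columns can be listed in `y_cols` without regrouping.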

Generation of uniformly distributed random noise

青春壹個敷衍的年華 submitted on 2019-12-25 04:44:22
Question: I've been working on generating Perlin noise for a map generator of mine. The problem I've run into is that the random noise is not distributed uniformly, and looks more like a normal distribution of sorts. Given two integers X and Y, and a seed value, I do the following: Use MurmurHash2 to generate a random number in (-1, 1). This is uniformly distributed. Interpolate points between integer values with cubic interpolation. Values now fall in the range (-2.25, 2.25) because the interpolation can
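One way to see why interpolated noise stops being uniform: an interpolated value is a weighted combination of several independent uniform samples, and such combinations pile up near the centre of the range. The sketch below uses the simplest case, the linear midpoint of two neighbours, rather than the cubic interpolation from the question, and Python's `random` module in place of MurmurHash2. Both substitutions are assumptions made to keep the example short.

```python
import random

def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def histogram(values, bins=8, lo=-1.0, hi=1.0):
    """Count how many values fall into each of `bins` equal-width bins on [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        index = min(int((v - lo) / width), bins - 1)  # clamp v == hi into the last bin
        counts[index] += 1
    return counts

rng = random.Random(42)  # fixed seed so the run is repeatable
N = 100_000

# Stand-in for the hashed lattice values: uniform on (-1, 1).
lattice = [rng.uniform(-1.0, 1.0) for _ in range(N)]
neighbour = [rng.uniform(-1.0, 1.0) for _ in range(N)]

# Value halfway between two lattice points under linear interpolation.
midpoints = [(a + b) / 2.0 for a, b in zip(lattice, neighbour)]

print("lattice variance  ", variance(lattice))    # theory: 1/3
print("midpoint variance ", variance(midpoints))  # theory: 1/6
print("lattice histogram ", histogram(lattice))
print("midpoint histogram", histogram(midpoints))
```

The lattice histogram is roughly flat, while the midpoint histogram is triangular. Combining more samples, as cubic interpolation does, moves the shape further toward a bell curve.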

Occurrence prediction

自古美人都是妖i submitted on 2019-12-25 04:32:32
Question: I'd like to know what method is best suited for predicting event occurrences. For example, given a set of data from 5 years of malaria infection occurrences and several other factors that affect the occurrences, I'd like to predict malaria infection occurrences for the next five years. What I thought of doing was to derive a kind of occurrence factor using fuzzy logic rules, and then average the occurrences with the occurrence factor to get the first predicted occurrence, and then average all

Log likelihood function for GDA (Gaussian Discriminant Analysis)

浪子不回头ぞ submitted on 2019-12-25 04:29:09
Question: I am having trouble understanding the likelihood function for GDA given in Andrew Ng's CS229 notes. $\ell(\phi, \mu_0, \mu_1, \Sigma) = \log \prod_{i=1}^{m} p(x^{(i)} \mid y^{(i)}; \mu_0, \mu_1, \Sigma)\, p(y^{(i)}; \phi)$ The link is http://cs229.stanford.edu/notes/cs229-notes2.pdf, page 5. For linear regression the function was $\prod_{i=1}^{m} p(y^{(i)} \mid x^{(i)}; \theta)$, which made sense to me. Why is there a change here, with the likelihood given by $p(x^{(i)} \mid y^{(i)})$ multiplied by $p(y^{(i)}; \phi)$? Thanks in advance. Answer 1: The starting formula on page 5
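GDA is a generative model: it specifies p(y) and p(x|y), so the quantity being maximised is the joint likelihood p(x, y) = p(x|y) p(y), whereas linear regression models only the conditional p(y|x). A minimal one-dimensional sketch of that joint log-likelihood follows. The scalar variance `sigma2` stands in for Σ, and the sample points and function names are made up for illustration.

```python
import math

def gaussian_logpdf(x, mu, sigma2):
    """Log density of a 1-D normal with mean mu and variance sigma2."""
    return -0.5 * math.log(2.0 * math.pi * sigma2) - (x - mu) ** 2 / (2.0 * sigma2)

def gda_log_likelihood(xs, ys, phi, mu0, mu1, sigma2):
    """Sum over i of log p(x_i | y_i; mu0, mu1, sigma2) + log p(y_i; phi)."""
    total = 0.0
    for x, y in zip(xs, ys):
        mu = mu1 if y == 1 else mu0
        log_p_x_given_y = gaussian_logpdf(x, mu, sigma2)      # class-conditional term
        log_p_y = math.log(phi) if y == 1 else math.log(1.0 - phi)  # Bernoulli prior term
        total += log_p_x_given_y + log_p_y
    return total

def fit_gda(xs, ys):
    """Closed-form maximum-likelihood estimates for the 1-D, shared-variance case."""
    m = len(xs)
    ones = [x for x, y in zip(xs, ys) if y == 1]
    zeros = [x for x, y in zip(xs, ys) if y == 0]
    phi = len(ones) / m
    mu0 = sum(zeros) / len(zeros)
    mu1 = sum(ones) / len(ones)
    sigma2 = sum((x - (mu1 if y == 1 else mu0)) ** 2 for x, y in zip(xs, ys)) / m
    return phi, mu0, mu1, sigma2

# Made-up data: class 0 clusters near 0, class 1 near 2.
xs = [0.2, 0.5, 1.9, 2.4, 0.1, 2.2]
ys = [0, 0, 1, 1, 0, 1]

phi, mu0, mu1, sigma2 = fit_gda(xs, ys)
print("MLE:", phi, mu0, mu1, sigma2)
print("log-likelihood at MLE:", gda_log_likelihood(xs, ys, phi, mu0, mu1, sigma2))
```

Both factors contribute to the sum: the Gaussian term ties the means and variance to the inputs, and the Bernoulli term ties φ to the class frequencies.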

R won't reference/can't find a compiled loaded C Code

▼魔方 西西 submitted on 2019-12-25 04:18:11
Question: I've created a new robust HoltWinters function (based on the stats::HoltWinters method) in R, per "Robust Forecasting with Exponential and Holt-Winters Smoothing" by Sarah Gelper, Roland Fried, and Christophe Croux (September 26, 2008). Why? Well... why not! But I digress... The core of the stats::HoltWinters method is a C routine called C_HoltWinters, which I've modified to be robust (see below): #include <stdlib.h> #include <string.h> // memcpy #include <math.h> #include <R.h> #include "ts.h"

Generalized additive models for calibration

六眼飞鱼酱① submitted on 2019-12-25 03:41:23
Question: I work on calibration of probabilities. I'm using a probability mapping approach called generalized additive models. The algorithm I wrote is: probMapping = function(x, y, datax, datay) { if(length(x) < length(y))stop("train smaller than test") if(length(datax) < length(datay))stop("train smaller than test") datax$prob = x # trainset: data and raw probabilities datay$prob = y # testset: data and raw probabilities prob_map = gam(Target ~ prob, data = datax, familiy = binomial, trace = TRUE)

Aggregating 10 minute data to hourly mean with the hourly.apply function fails

不羁岁月 submitted on 2019-12-25 02:53:39
Question: I have a file with date/time data and the values measured at each date and time. The values were measured every ten minutes over the course of one month, and I eventually want to do a time series analysis. Before that, however, I want to aggregate the 10-minute intervals to hourly intervals by calculating the mean measurement of every 60 minutes. Here is a sample of my data (a total of 4319 observations):
    Date/Time              Value
    2013-01-01 00:00:00    31,439999
    2013-01-01 00:10:00    33,439999
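A minimal sketch of the aggregation this question asks for, in plain Python rather than R. The first two rows are the sample values from the question; the remaining rows and the function name `hourly_means` are made up. Note that the values use a decimal comma, which has to be converted before the text can be parsed as a number.

```python
from collections import defaultdict
from datetime import datetime

raw = [
    ("2013-01-01 00:00:00", "31,439999"),  # from the question
    ("2013-01-01 00:10:00", "33,439999"),  # from the question
    ("2013-01-01 00:20:00", "32,000000"),  # made up
    ("2013-01-01 01:00:00", "30,000000"),  # made up
    ("2013-01-01 01:10:00", "34,000000"),  # made up
]

def hourly_means(records):
    """Group (timestamp, value) text pairs by hour and return the mean per hour."""
    buckets = defaultdict(list)
    for stamp, value in records:
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        hour = when.replace(minute=0, second=0)  # truncate to the start of the hour
        buckets[hour].append(float(value.replace(",", ".")))  # decimal comma -> point
    return {hour: sum(vals) / len(vals) for hour, vals in sorted(buckets.items())}

for hour, mean in hourly_means(raw).items():
    print(hour, round(mean, 6))
```

Truncating each timestamp to the hour assigns 00:00 through 00:50 to the 00:00 bucket, so each full hour averages six readings.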

Error in vcov.default(mod) : there is no vcov() method for models of class list (changing from type I to type III Sum of Squares)

◇◆丶佛笑我妖孽 submitted on 2019-12-24 19:16:47
Question: I am trying to get an ANOVA table for my split-split plot design that uses type III sums of squares instead of type I. This is what I have done so far:
    > attach(Data)
    > library(car)
    > options(contrasts = c("contr.sum", "contr.poly"))
    > mod <- aov(Response ~ A*B*C + Error(Block/A/B/C))
    > Anova(mod, type='III')
    Error in vcov.default(mod) : there is no vcov() method for models of class aovlist, listof
I don't understand why I keep getting this error message, or what to do about it. Any help

Unexpected behavior in scipy isf

心不动则不痛 submitted on 2019-12-24 16:43:17
Question: I am using scipy's stats module to try to determine the values of a distribution at which the upper-tail probability reaches some small value, but I am getting some very unrealistic results. For example: I fit a beta distribution to an array of squared normalized correlation coefficients from a signal matching operation (the correlation coefficient is always between -1 and 1, so its square is between 0 and 1). Using import scipy.stats, numpy as np; bd = scipy.stats.beta.fit(np.square(data), floc=0, fscale=1)

R squared and adjusted R squared with one predictor

梦想与她 submitted on 2019-12-24 15:43:02
Question: Using the following to estimate the coefficient of determination in MATLAB:
    load hospital
    y = hospital.BloodPressure(:,1);
    X = double(hospital(:,2:5));
    X2 = X(:,3);
    mdl = fitlm(X2,y);
    Estimated Coefficients:
                       Estimate    SE          tStat     pValue
                       ________    ________    ______    __________
        (Intercept)    116.72      3.9389      29.633    1.0298e-50
        x1             0.039357    0.025208    1.5613    0.12168
    Number of observations: 100, Error degrees of freedom: 98
    Root Mean Squared Error: 6.66
    R-squared: 0.0243, Adjusted R-Squared 0.0143
    F-statistic vs.
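The two figures in this output are linked by the standard adjustment formula, adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - p - 1), where n is the number of observations and p the number of predictors excluding the intercept. A small Python sketch, using the n = 100 and R^2 = 0.0243 reported above with p = 1, reproduces the adjusted value to the printed precision. The function name is hypothetical.

```python
def adjusted_r_squared(r2, n_obs, n_predictors):
    """Adjusted R^2 for a linear model with an intercept.

    n_predictors counts the slope terms only, so the error degrees of
    freedom are n_obs - n_predictors - 1.
    """
    error_df = n_obs - n_predictors - 1
    return 1.0 - (1.0 - r2) * (n_obs - 1) / error_df

# Values taken from the fitlm output: 100 observations, one predictor.
adj = adjusted_r_squared(0.0243, n_obs=100, n_predictors=1)
print(round(adj, 4))  # 0.0143
```

With a single predictor the penalty factor is (n - 1)/(n - 2), so the adjusted value is always slightly below R^2 and can even become negative when R^2 is close to zero.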