statistics | 易学教程

Regression of a Data Frame with multiple factor groupings

Question: I am working on a regression script. I have a data.frame with roughly 130 columns, and I need to regress one column (let's call it the X column) against each of the other ~100 numeric columns. Before the regression is calculated, I need to group the data by 4 factors: myDat$Recipe, myDat$Step, myDat$Stage, and myDat$Prod, while still keeping the other ~100 columns and row data attached for the regression. Then I need to do a regression of each column ~ X column and print out the R^2
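The grouping-then-regression workflow described above can be sketched in plain Python. This is only an illustration under assumptions, not the asker's R script: the rows are hypothetical toy data, the helper names `r_squared` and `grouped_r_squared` are made up for this sketch, and each regression is a simple one-predictor least-squares fit, for which R^2 equals the squared Pearson correlation.

```python
from collections import defaultdict


def r_squared(x, y):
    """R^2 of the simple least-squares fit y ~ x (squared Pearson correlation)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return float("nan")  # a constant column has no defined R^2
    return sxy ** 2 / (sxx * syy)


def grouped_r_squared(rows, group_keys, x_col, y_cols):
    """Split rows by the grouping factors, then fit each y column against x_col."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in group_keys)].append(row)
    result = {}
    for key, members in groups.items():
        x = [r[x_col] for r in members]
        result[key] = {c: r_squared(x, [r[c] for r in members]) for c in y_cols}
    return result


# Hypothetical toy data: two groups, one X column, two other numeric columns.
rows = [
    {"Recipe": "A", "Step": 1, "Stage": "s1", "Prod": "p", "X": 1.0, "V1": 2.0, "V2": 5.0},
    {"Recipe": "A", "Step": 1, "Stage": "s1", "Prod": "p", "X": 2.0, "V1": 4.0, "V2": 3.0},
    {"Recipe": "A", "Step": 1, "Stage": "s1", "Prod": "p", "X": 3.0, "V1": 6.0, "V2": 4.0},
    {"Recipe": "B", "Step": 1, "Stage": "s1", "Prod": "p", "X": 1.0, "V1": 3.0, "V2": 1.0},
    {"Recipe": "B", "Step": 1, "Stage": "s1", "Prod": "p", "X": 2.0, "V1": 2.0, "V2": 3.0},
    {"Recipe": "B", "Step": 1, "Stage": "s1", "Prod": "p", "X": 3.0, "V1": 1.0, "V2": 2.0},
]

table = grouped_r_squared(rows, ["Recipe", "Step", "Stage", "Prod"], "X", ["V1", "V2"])
for key, scores in sorted(table.items()):
    print(key, scores)
```

Keeping whole rows inside each group is what keeps the remaining columns attached to their grouping factors, so every column can be fitted against X within the same subset.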

Generation of uniformly distributed random noise

Question: I've been working on generating Perlin noise for a map generator of mine. The problem I've run into is that the random noise is not uniformly distributed; it looks more like a normal distribution of sorts. Given two integers X and Y, and a seed value, I do the following:
1. Use MurmurHash2 to generate a random number in (-1, 1). This is uniformly distributed.
2. Interpolate points between integer values with cubic interpolation. Values now fall in the range (-2.25, 2.25) because the interpolation can
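A small experiment can show why interpolating between uniform lattice values concentrates the output around zero. This is a sketch under stated assumptions, not the asker's generator: Python's `random.Random` stands in for MurmurHash2, and linear interpolation at the midpoint stands in for cubic interpolation, since the averaging effect is already visible in the simpler case.

```python
import random


def midpoint_samples(n, seed=0):
    """Draw pairs of uniform lattice values and interpolate halfway between them."""
    rng = random.Random(seed)  # stand-in for the hash-based generator
    raw, mid = [], []
    for _ in range(n):
        a = rng.uniform(-1.0, 1.0)
        b = rng.uniform(-1.0, 1.0)
        raw.append(a)
        mid.append(0.5 * (a + b))  # linear interpolation at t = 0.5
    return raw, mid


def variance(values):
    """Population variance of a list of floats."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


raw, mid = midpoint_samples(100_000)
print("variance of raw lattice values:  ", variance(raw))  # close to 1/3
print("variance of interpolated values: ", variance(mid))  # close to 1/6
```

The average of two independent uniform values follows a triangular distribution, so values away from the lattice points cluster near the centre of the range even though the lattice values themselves are uniform.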

Occurrence prediction

Question: I'd like to know which method is best suited to predicting event occurrences. For example, given 5 years of data on malaria infection occurrences and several other factors that affect them, I'd like to predict malaria infection occurrences for the next five years. What I thought of doing was to derive a kind of occurrence factor using fuzzy logic rules, and then average the occurrences with the occurrence factor to get the first predicted occurrence, and then average all

Log likelihood function for GDA (Gaussian Discriminant Analysis)

Question: I am having trouble understanding the likelihood function for GDA given in Andrew Ng's CS229 notes: $\ell(\phi,\mu_0,\mu_1,\Sigma) = \log \prod_{i=1}^{m} p(x^{(i)} \mid y^{(i)}; \mu_0,\mu_1,\Sigma)\, p(y^{(i)}; \phi)$. The link is http://cs229.stanford.edu/notes/cs229-notes2.pdf, page 5. For linear regression the function was $\prod_{i=1}^{m} p(y^{(i)} \mid x^{(i)}; \theta)$, which made sense to me. Why is there a change here, saying it is given by $p(x^{(i)} \mid y^{(i)})$ multiplied by $p(y^{(i)}; \phi)$? Thanks in advance. Answer 1: The starting formula on page 5
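For reference, the factorization the question asks about can be written out step by step. This is a sketch that assumes the usual generative setup of GDA, in which $y$ is Bernoulli($\phi$) and $x$ given $y$ is Gaussian with mean $\mu_y$ and shared covariance $\Sigma$; under that assumption the quantity being maximized is the joint likelihood of $(x, y)$, which the chain rule splits into the two factors in the formula.

```latex
\begin{align*}
\ell(\phi,\mu_0,\mu_1,\Sigma)
  &= \log \prod_{i=1}^{m} p\bigl(x^{(i)}, y^{(i)}; \phi,\mu_0,\mu_1,\Sigma\bigr) \\
  &= \log \prod_{i=1}^{m} p\bigl(x^{(i)} \mid y^{(i)}; \mu_0,\mu_1,\Sigma\bigr)\,
       p\bigl(y^{(i)}; \phi\bigr) \\
  &= \sum_{i=1}^{m} \log p\bigl(x^{(i)} \mid y^{(i)}; \mu_0,\mu_1,\Sigma\bigr)
     + \sum_{i=1}^{m} \log p\bigl(y^{(i)}; \phi\bigr)
\end{align*}
```

Linear regression, by contrast, models only the conditional $p(y^{(i)} \mid x^{(i)}; \theta)$ and places no distribution on $x$, which is why no second factor appears there.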

R won't reference/can't find a compiled loaded C Code

Question: I've created a new robust HoltWinters function in R (based on the stats::HoltWinters method), following "Robust Forecasting with Exponential and Holt-Winters Smoothing" by Sarah Gelper, Roland Fried, and Christophe Croux (September 26, 2008). Why? Well... why not! But I digress... The core of the stats::HoltWinters method is a C routine called C_HoltWinters, which I've modified to be robust (see below):
    #include <stdlib.h>
    #include <string.h> // memcpy
    #include <math.h>
    #include <R.h>
    #include "ts.h"

Generalized additive models for calibration

Question: I work on calibration of probabilities. I'm using a probability mapping approach called generalized additive models. The algorithm I wrote is:
    probMapping = function(x, y, datax, datay) {
      if (length(x) < length(y)) stop("train smaller than test")
      if (length(datax) < length(datay)) stop("train smaller than test")
      datax$prob = x  # trainset: data and raw probabilities
      datay$prob = y  # testset: data and raw probabilities
      prob_map = gam(Target ~ prob, data = datax, familiy = binomial, trace = TRUE)

Aggregating 10 minute data to hourly mean with the hourly.apply function fails

Question: I have a file with date/time data and the values measured at each date and time. The values were measured every ten minutes over the course of one month, and I eventually want to do a time series analysis. Before that, however, I want to aggregate the 10-minute intervals to hourly intervals by calculating the mean measurement over every 60 minutes. Here is a sample of my data (a total of 4319 observations):
    Date/Time              Value
    2013-01-01 00:00:00    31,439999
    2013-01-01 00:10:00    33,439999
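The aggregation itself can be sketched in plain Python. This illustrates the technique only and is not the hourly.apply call from the question: the function name `hourly_means` is hypothetical, the third sample row is invented for the example, and the values are assumed to use a decimal comma (as in the sample), which has to be converted before any numeric mean can be computed.

```python
from collections import defaultdict
from datetime import datetime


def hourly_means(lines):
    """Average 'YYYY-MM-DD HH:MM:SS value' records within each clock hour."""
    sums = defaultdict(lambda: [0.0, 0])  # hour -> [running total, count]
    for line in lines:
        stamp_text, value_text = line.rsplit(None, 1)
        stamp = datetime.strptime(stamp_text, "%Y-%m-%d %H:%M:%S")
        hour = stamp.replace(minute=0, second=0)  # truncate to the start of the hour
        value = float(value_text.replace(",", "."))  # decimal comma -> decimal point
        bucket = sums[hour]
        bucket[0] += value
        bucket[1] += 1
    return {hour: total / count for hour, (total, count) in sorted(sums.items())}


sample = [
    "2013-01-01 00:00:00 31,439999",
    "2013-01-01 00:10:00 33,439999",
    "2013-01-01 01:00:00 30,000000",  # hypothetical row, not from the question
]

means = hourly_means(sample)
for hour, mean in means.items():
    print(hour, mean)
```

Truncating each timestamp to the start of its hour gives a grouping key that works even when some 10-minute readings are missing.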

Error in vcov.default(mod) : there is no vcov() method for models of class list (changing from type I to type III Sum of Squares)

Question: I am trying to get an ANOVA table for my split-split plot design that uses type III sums of squares instead of type I. This is what I have done so far:
    > attach(Data)
    > library(car)
    > options(contrasts = c("contr.sum", "contr.poly"))
    > mod <- aov(Response ~ A*B*C + Error(Block/A/B/C))
    > Anova(mod, type='III')
    Error in vcov.default(mod) : there is no vcov() method for models of class aovlist, listof
I don't understand why I keep getting this error message, or what to do about it. Any help

Unexpected behavior in scipy isf

Question: I am using scipy's stats module to determine the values of a distribution at which the upper tail probability reaches some small value, but I am getting some very unrealistic results. For example, I fit a beta distribution to an array of the squares of normalized correlation coefficients from a signal matching operation (a correlation coefficient is always between -1 and 1, so its square is between 0 and 1). Using
    import scipy.stats, numpy as np
    bd = scipy.stats.beta.fit(np.square(data), floc=0, fscale=1)

R squared and adjusted R squared with one predictor

Question: Using the following to estimate the coefficient of determination in MATLAB:
    load hospital
    y = hospital.BloodPressure(:,1);
    X = double(hospital(:,2:5));
    X2 = X(:,3);
    mdl = fitlm(X2,y);
Estimated Coefficients:
                   Estimate    SE          tStat     pValue
    (Intercept)    116.72      3.9389      29.633    1.0298e-50
    x1             0.039357    0.025208    1.5613    0.12168
Number of observations: 100, Error degrees of freedom: 98
Root Mean Squared Error: 6.66
R-squared: 0.0243, Adjusted R-Squared 0.0143
F-statistic vs.
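The two figures in the output above can be related with the standard adjusted R-squared formula, 1 - (1 - R^2)(n - 1)/(n - p - 1). The snippet below is a minimal check rather than a reproduction of what fitlm does internally: the function name is hypothetical, and the inputs are the rounded values printed in the question (R^2 = 0.0243, n = 100 observations, p = 1 predictor, so n - p - 1 = 98 matches the reported error degrees of freedom).

```python
def adjusted_r_squared(r2, n_obs, n_predictors):
    """Adjusted R^2 from R^2, the sample size, and the number of predictors."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


adj = adjusted_r_squared(0.0243, 100, 1)
print(round(adj, 4))  # 0.0143, the Adjusted R-Squared value in the output above
```

Even with a single predictor the adjustment factor (n - 1)/(n - p - 1) = 99/98 is greater than 1, so the adjusted value is slightly smaller than the plain R-squared.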