statistics | 易学教程

Workflow for statistical analysis and report writing

Question: Does anyone have any wisdom on workflows for data analysis related to custom report writing? The use case is basically this:
1. A client commissions a report that uses data analysis, e.g. a population estimate and related maps for a water district.
2. The analyst downloads some data, munges the data and saves the result (e.g. adding a column for population per unit, or subsetting the data based on district boundaries).
3. The analyst analyzes the data created in (2), gets close to her goal, but sees
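One way to picture the download, munge and save steps is as separate small functions, so each stage can be rerun on its own. The sketch below is only an illustration in Python: the column names (`district`, `population`, `housing_units`), the sample values and the helper names are all hypothetical and do not come from the question.

```python
import csv
import io

# Hypothetical raw data; column names and values are made up for illustration.
RAW_CSV = """district,population,housing_units
North,1200,400
South,900,300
East,0,0
"""

def load(text):
    """Read the downloaded data into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))

def munge(rows, districts):
    """Keep the districts of interest and add a population-per-unit column."""
    out = []
    for row in rows:
        if row["district"] not in districts:
            continue
        units = int(row["housing_units"])
        pop = int(row["population"])
        out.append({**row, "pop_per_unit": pop / units if units else None})
    return out

def save(rows):
    """Write the munged rows back out as CSV text so a later step can reload them."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

munged = munge(load(RAW_CSV), districts={"North", "South"})
saved = save(munged)
```

Keeping the saved intermediate result separate from the raw download means the analysis step can be repeated without redoing the munging.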

R bar chart colours for groups of bars

Question: Fairly new to R, so sorry if this is a dumb question. I want to plot a bar chart of a lot of data, maybe 100 bars. I want to use colours and spacing to highlight the "groups", so I might have the first 10 bars in blue, a small gap, the next 20 in red, a small gap, and so on. I can plot the data fine, but how can I do the colouring and gaps in this way?
Answer 1: This can be done quite easily with ggplot2, as shown in the links provided by @Arun. With base graphics, to set the space between bars you can use
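Whatever plotting call is used, the grouping usually comes down to building one colour and one spacing value per bar. The following Python sketch shows only that bookkeeping; the function name `group_layout` and its default gap sizes are assumptions made for illustration, and it does not reproduce the truncated answer above.

```python
def group_layout(group_sizes, group_colours, bar_gap=0.1, group_gap=1.0):
    """Return one colour and one 'space before this bar' value per bar."""
    colours, spaces = [], []
    for g, (size, colour) in enumerate(zip(group_sizes, group_colours)):
        for i in range(size):
            colours.append(colour)
            # A wider gap precedes the first bar of every group after the first.
            spaces.append(group_gap if (i == 0 and g > 0) else bar_gap)
    return colours, spaces

# First 10 bars blue, then a gap, then 20 bars red, as in the question.
colours, spaces = group_layout([10, 20], ["blue", "red"])
```

The two lists can then be handed to a plotting function that accepts per-bar colours and per-bar spacing.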

Convert character vector to numeric vector in R for value assignment?

Question: I have:
    z = data.frame(x1=a, x2=b, x3=c, etc)
I am trying to do:
    for (i in 1:10) {
      paste(c('N'),i,sep="") -> paste(c('z$x'),i,sep="")
    }
Problems:
- paste(c('z$x'),i,sep="") yields "z$x1", "z$x1" instead of calling the actual values. I need the expression to be evaluated. I tried as.numeric and eval. Neither seemed to work.
- paste(c('N'),i,sep="") yields "N1", "N2". I need the expression to be used merely as a name. If I try to assign it a value such as paste(c('N'),5,sep="") -> 5, i.e. "N5" -> 5

Gaussian Mixture Model in MATLAB - Calculation of the Empirical Variance Covariance Matrix

Question: I am having issues reconciling some basic theoretical results on Gaussian mixtures with the output of the commands gmdistribution and random in MATLAB. Consider a mixture of two independent 3-variate normal distributions with weights 1/2, 1/2. The first distribution, A, is characterised by a mean and variance-covariance matrix equal to
    muA=[-1.4 3.2 -1.9]; %mean vector
    rhoA=-0.5; %correlation among components in A
    sigmaA=[1 rhoA rhoA; rhoA 1 rhoA; rhoA rhoA 1]; %variance-covariance matrix of A
The
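For reference, the standard theoretical mean and covariance of a finite mixture can be written down as follows. This is a general identity, not the asker's derivation; the symbols $w_k$, $\mu_k$, $\Sigma_k$ and the label $B$ for the second component are notation assumed here, since the second component is cut off in the snippet.

```latex
% Mixture with weights w_k, component means \mu_k and covariances \Sigma_k
\mu = \sum_{k} w_k \, \mu_k ,
\qquad
\Sigma = \sum_{k} w_k \left[ \Sigma_k + (\mu_k - \mu)(\mu_k - \mu)^{\top} \right]

% Two components A and B with equal weights w_A = w_B = 1/2
\mu = \tfrac{1}{2}\,(\mu_A + \mu_B) ,
\qquad
\Sigma = \tfrac{1}{2}\,(\Sigma_A + \Sigma_B)
       + \tfrac{1}{4}\,(\mu_A - \mu_B)(\mu_A - \mu_B)^{\top}
```

The empirical covariance of a large sample drawn from the mixture should approach $\Sigma$, which includes the between-component term and is therefore generally not just the average of the component covariances.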

Python - Statistical distribution

Question: I'm quite new to the Python world. Also, I'm not a statistician. I need to implement mathematical models developed by mathematicians in a programming language, and I've chosen Python after some research. I'm comfortable with programming as such (PHP/HTML/JavaScript). I have a column of values that I've extracted from a MySQL database and need to calculate the following:
1) Normal distribution of it. (I don't have the sigma & mu values. These need to be calculated too
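A minimal sketch of estimating mu and sigma from a column of values and building a normal distribution from them, using only Python's standard `statistics` module. The `values` list is hypothetical and stands in for the database column, and the choice of the sample standard deviation (n - 1 denominator) is an assumption.

```python
import statistics

# Hypothetical values standing in for the column pulled from the database.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mu = statistics.fmean(values)      # sample mean
sigma = statistics.stdev(values)   # sample standard deviation (n - 1 denominator)

# Normal distribution with the estimated parameters.
dist = statistics.NormalDist(mu, sigma)

density_at_mean = dist.pdf(mu)     # height of the bell curve at its centre
prob_below_4 = dist.cdf(4.0)       # P(X <= 4) under the fitted normal
```

`statistics.NormalDist.from_samples(values)` performs the same two estimation steps in one call.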

How to use conditional statement and return value for a function in R?

Question: I have to create a function ans(x) which returns the value 2*abs(x) if x is negative, and the value x otherwise. What command could I use? Thanks
Answer 1:
    ans <- function(x){
      ifelse(x < 0, 2*abs(x), x)
    }
will do.
    > ans(2)
    [1] 2
    > ans(-2)
    [1] 4
Explanation: We can use the built-in base R function ifelse(). The logic is pretty simple:
    ifelse(condition, output if condition is TRUE, output if condition is FALSE)
Therefore, ifelse(x < 0, 2*abs(x), x) will do the following: evaluate whether value
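For comparison, the same rule can be sketched in Python with a conditional expression. This is a Python analogue, not R code, and the element-wise helper `ans_each` is a hypothetical name added to mimic a call on a whole vector.

```python
def ans(x):
    """Return 2*abs(x) when x is negative, and x unchanged otherwise."""
    return 2 * abs(x) if x < 0 else x

def ans_each(values):
    """Apply the same rule to every element of a list."""
    return [ans(v) for v in values]
```

Calling `ans(2)` and `ans(-2)` gives 2 and 4, matching the R session shown in the answer.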

Entropy and Information Gain

Question: Simple question, I hope. If I have a set of data like this:
    Classification  attribute-1  attribute-2
    Correct         dog          dog
    Correct         dog          dog
    Wrong           dog          cat
    Correct         cat          cat
    Wrong           cat          dog
    Wrong           cat          dog
Then what is the information gain of attribute-2 relative to attribute-1? I've computed the entropy of the whole data set:
    -(3/6)log2(3/6)-(3/6)log2(3/6)=1
Then I'm stuck! I think you need to calculate entropies of attribute-1 and attribute-2 too? Then use these three calculations in an information gain
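The calculation can be sketched in Python as below. It assumes the usual decision-tree definition, where the information gain of an attribute is the entropy of the Classification column minus the weighted entropy of Classification within each value of that attribute; the question's phrase "relative to attribute-1" may mean something else, so treat this reading as an assumption. The function names are illustrative.

```python
from collections import Counter
from math import log2

# Rows from the question: (classification, attribute-1, attribute-2).
rows = [
    ("Correct", "dog", "dog"),
    ("Correct", "dog", "dog"),
    ("Wrong", "dog", "cat"),
    ("Correct", "cat", "cat"),
    ("Wrong", "cat", "dog"),
    ("Wrong", "cat", "dog"),
]

def entropy(labels):
    """Shannon entropy in bits of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(rows, attr_index, label_index=0):
    """Entropy of the labels minus the weighted entropy after splitting on one attribute."""
    labels = [r[label_index] for r in rows]
    groups = {}
    for r in rows:
        groups.setdefault(r[attr_index], []).append(r[label_index])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

base = entropy([r[0] for r in rows])
gain_attr1 = information_gain(rows, 1)
gain_attr2 = information_gain(rows, 2)
```

On these six rows the class entropy is 1 bit, as computed in the question; under the definition above, the gain is about 0.082 bits for attribute-1 and 0 (up to floating-point rounding) for attribute-2.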

Testing Skewness in Time Series data using R but getting “Error: NCOL(x) == 1 is not TRUE”

Question: I am using the Dow Jones dataset and I am trying to test skewness. So far this is the code:
    library(tseries)
    library(zoo)
    library(reshape2)
    library(fBasics)
    dow = read.table('dow_jones_index.data', header=T, sep=',')
    # create time series
    dow <- read.table('dow_jones_index.data', header=T, sep=',', stringsAsFactors = FALSE)
    # delete $ symbol and coerce to numeric
    dow$close <- as.numeric(sub("\\$", "",dow$close))
    tmp <- dcast(dow, date~stock, value.var = "close")
    #tmp[,-1] means it's removing
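The error text in the title says that a check for exactly one column failed, which suggests the statistic should be computed one series at a time. A rough Python sketch of that idea follows; it uses the moment-based skewness formula, and the stock symbols and prices are hypothetical, not taken from the Dow Jones file.

```python
def skewness(xs):
    """Moment-based sample skewness g1 = m3 / m2**1.5 for one series."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

# Hypothetical wide table: one list of closing prices per stock symbol.
closes = {
    "AAA": [10.0, 11.0, 12.0, 13.0, 14.0],  # symmetric series
    "BBB": [1.0, 1.0, 1.0, 1.0, 6.0],       # long right tail
}

# Compute the statistic for one column at a time rather than the whole table.
skew_by_stock = {stock: skewness(series) for stock, series in closes.items()}
```

Other skewness definitions apply small-sample corrections, so values from a statistics package may differ slightly from this formula.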

Fitting data to a probability distribution, maybe skew normal?

Question: I am trying to fit my data to some kind of probability distribution, so that I can then generate random numbers based on the distribution. Below is what the data points look like, with the x-axis being the data values and the y-axis the probabilities.
[Figure: data plot]
They look like they would fit a skew normal distribution, with mean around 10^-4. The plot's data is actually binned from an original data set. I tried using the scipy.stats library to fit a skew normal to the original data, but the fit does
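If the end goal is only to generate random numbers that follow the binned data, one alternative to a parametric fit is to sample directly from the histogram. The sketch below does that with Python's standard `random` module; it is a different technique from the skew normal fit mentioned in the question, and the bin edges and probabilities are hypothetical placeholders.

```python
import random

# Hypothetical histogram: bin edges and the probability mass in each bin.
bin_edges = [0.0, 1e-4, 2e-4, 3e-4, 4e-4]
bin_probs = [0.1, 0.5, 0.3, 0.1]

def sample_from_histogram(edges, probs, n, rng):
    """Pick a bin in proportion to its probability, then a uniform point inside it."""
    bins = rng.choices(range(len(probs)), weights=probs, k=n)
    return [rng.uniform(edges[b], edges[b + 1]) for b in bins]

rng = random.Random(0)  # fixed seed so the run is repeatable
samples = sample_from_histogram(bin_edges, bin_probs, 1000, rng)
```

This reproduces the binned shape but treats values within a bin as uniform, so it cannot smooth or extrapolate the way a fitted distribution would.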

Writing a proper normal log-likelihood in R

Question: I have a problem regarding the following model [model formula missing in source], where I want to make inference on μ and tau; u is a known vector and x is the data vector. The log-likelihood is [log-likelihood formula missing in source]. I have a problem writing a log-likelihood in R.
    x <- c(3.3569,1.9247,3.6156,1.8446,2.2196,6.8194,2.0820,4.1293,0.3609,2.6197)
    mu <- seq(0,10,length=1000)
    normal.lik1<-function(theta,x){
      u <- c(1,3,0.5,0.2,2,1.7,0.4,1.2,1.1,0.7)
      mu<-theta[1]
      tau<-theta[2]
      n<-length(x)
      logl <- sapply(c(mu,tau),function(mu,tau){logl<- -0.5*n*log(2*pi) -0
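Because the model formula is not available here, the sketch below shows only the plain i.i.d. normal log-likelihood in Python, as a reference point for the general shape of such a function. It is not the asker's model: it ignores the known vector u and uses a variance parameter `sigma2` in place of tau, both of which are assumptions made for illustration.

```python
from math import log, pi

# Data vector copied from the question.
x = [3.3569, 1.9247, 3.6156, 1.8446, 2.2196, 6.8194, 2.0820, 4.1293, 0.3609, 2.6197]

def normal_loglik(mu, sigma2, data):
    """Log-likelihood of i.i.d. N(mu, sigma2) observations (plain model, no u vector)."""
    n = len(data)
    ss = sum((xi - mu) ** 2 for xi in data)
    return -0.5 * n * log(2 * pi) - 0.5 * n * log(sigma2) - ss / (2 * sigma2)

# Closed-form maximum-likelihood estimates for this plain model.
mu_hat = sum(x) / len(x)
sigma2_hat = sum((xi - mu_hat) ** 2 for xi in x) / len(x)
best = normal_loglik(mu_hat, sigma2_hat, x)
```

The function takes scalar parameters and returns a single number, so it can be evaluated over a grid of mu values or passed to a numerical optimiser.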