statistics

Workflow for statistical analysis and report writing

余生长醉 submitted on 2019-12-16 22:20:50
Question: Does anyone have any wisdom on workflows for data analysis related to custom report writing? The use case is basically this:
1. A client commissions a report that uses data analysis, e.g. a population estimate and related maps for a water district.
2. The analyst downloads some data, munges it and saves the result (e.g. adding a column for population per unit, or subsetting the data based on district boundaries).
3. The analyst analyzes the data created in (2), gets close to her goal, but sees
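One common way to organise steps 2 and 3 is to make each stage a separate function that writes its output to disk. Later stages can then be rerun without repeating earlier ones. This is a sketch of that idea, not a prescribed method. In the Python below, `munge`, `cached_stage`, the column names and the file name `munged.csv` are all hypothetical, and the `raw` rows stand in for the downloaded data.

```python
import csv
import os
import tempfile
from pathlib import Path


def munge(rows):
    """Stage 2 (hypothetical): add a population-per-unit column to each row."""
    out = []
    for row in rows:
        row = dict(row)  # copy so the raw data is left untouched
        row["pop_per_unit"] = float(row["population"]) / float(row["units"])
        out.append(row)
    return out


def cached_stage(path, build):
    """Run `build` only if `path` does not exist yet; always return the saved rows."""
    path = Path(path)
    if not path.exists():
        rows = build()
        with path.open("w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(rows[0]))
            writer.writeheader()
            writer.writerows(rows)
    # Reading back from disk means every run sees exactly what was saved.
    with path.open(newline="") as f:
        return list(csv.DictReader(f))


# Usage with stand-in "downloaded" data.
raw = [{"district": "A", "population": "1200", "units": "400"}]
calls = []


def build():
    calls.append(1)  # record how often the expensive step actually runs
    return munge(raw)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "munged.csv")
    first = cached_stage(target, build)   # builds and saves
    second = cached_stage(target, build)  # reuses the saved file
```

Stage 3 (the analysis) would then read only the saved file. To redo stage 2 after changing it, delete that file.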

R bar chart colours for groups of bars

妖精的绣舞 submitted on 2019-12-16 18:07:50
Question: I'm fairly new to R, so sorry if this is a dumb question. I want to plot a bar chart of a lot of data (maybe 100 bars). I want to use colours and spacing to highlight the "groups", so I might have the first 10 bars in blue, a small gap, the next 20 in red, a small gap, and so on. I can plot the data fine, but how can I do the colouring and gaps in this way? Answer 1: This can be done quite easily with ggplot2, as shown in the links provided by @Arun. With base graphics, to set the space between bars you can use
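The layout part of the question (coloured groups separated by gaps) comes down to computing one x-position and one colour per bar. Here is a minimal Python sketch of that arithmetic; `grouped_bar_layout` and its parameters are hypothetical names. The resulting lists could be passed to a plotting call such as matplotlib's `bar`, which accepts a list of per-bar colours.

```python
def grouped_bar_layout(group_sizes, colours, bar_width=1.0, gap=1.0):
    """Return (x_positions, bar_colours) for consecutive groups of bars.

    Bars within a group are adjacent; an extra `gap` separates groups.
    """
    if len(group_sizes) != len(colours):
        raise ValueError("need one colour per group")
    xs, cols = [], []
    x = 0.0
    for size, colour in zip(group_sizes, colours):
        for _ in range(size):
            xs.append(x)
            cols.append(colour)
            x += bar_width
        x += gap  # leave a space before the next group
    return xs, cols


# First 10 bars blue, a gap, then the next 20 red (as in the question).
xs, cols = grouped_bar_layout([10, 20], ["blue", "red"])
```

With matplotlib this would be `plt.bar(xs, heights, width=1.0, color=cols)`, where `heights` is your data.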

Convert character vector to numeric vector in R for value assignment?

拈花ヽ惹草 submitted on 2019-12-14 04:10:12
Question: I have:
z = data.frame(x1=a, x2=b, x3=c, etc)
I am trying to do:
for (i in 1:10) {
  paste(c('N'),i,sep="") -> paste(c('z$x'),i,sep="")
}
Problems:
paste(c('z$x'),i,sep="") yields "z$x1", "z$x2" instead of calling the actual values. I need the expression to be evaluated. I tried as.numeric and eval; neither seemed to work.
paste(c('N'),i,sep="") yields "N1", "N2". I need the expression to be used merely as a name. If I try to assign it a value such as paste(c('N'),5,sep="") -> 5, i.e. "N5" -> 5
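The question builds a name as a string and then uses it to look something up or to store something. That is usually handled with a keyed container rather than with separate variables. In R the usual tools are `z[[paste0("x", i)]]` for the lookup and a named list for the results. This Python sketch shows the same pattern with a dict; `z` here is a hypothetical stand-in for the data frame.

```python
# Hypothetical stand-in for the data frame z: column name -> column values.
z = {"x1": [1.0, 2.0], "x2": [3.0, 4.0], "x3": [5.0, 6.0]}

# Look up each column by its constructed name ("x1", "x2", ...) and store it
# under another constructed name ("N1", "N2", ...) as a dict key, instead of
# trying to assign to a string.
N = {}
for i in range(1, len(z) + 1):
    N[f"N{i}"] = z[f"x{i}"]
```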

Gaussian Mixture Model in MATLAB - Calculation of the Empirical Variance Covariance Matrix

爱⌒轻易说出口 submitted on 2019-12-14 04:04:56
Question: I am having trouble reconciling some basic theoretical results on Gaussian mixtures with the output of the MATLAB commands gmdistribution and random. Consider a mixture of two independent 3-variate normal distributions with weights 1/2, 1/2. The first distribution, A, is characterised by the following mean vector and variance-covariance matrix:
muA=[-1.4 3.2 -1.9]; %mean vector
rhoA=-0.5; %correlation among components in A
sigmaA=[1 rhoA rhoA; rhoA 1 rhoA; rhoA rhoA 1]; %variance-covariance matrix of A
The
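These are the standard moments of a mixture with weights $w_k$, component means $\mu_k$ and component covariances $\Sigma_k$. They are the theoretical values to compare against the sample covariance of draws from `random`. The last line specialises to two components with weights 1/2, 1/2, which is the question's setting. $\mu_B$ and $\Sigma_B$ stand for the second component, whose parameters are cut off in the excerpt. The mixture covariance is not just the average of the component covariances: the spread of the component means adds a term.

```latex
\begin{aligned}
\mathbb{E}[X] &= \mu = \sum_k w_k \mu_k \\
\operatorname{Cov}[X] &= \sum_k w_k \left(\Sigma_k + \mu_k \mu_k^{\top}\right) - \mu \mu^{\top}
  = \sum_k w_k \Sigma_k + \sum_k w_k (\mu_k - \mu)(\mu_k - \mu)^{\top} \\
\operatorname{Cov}[X]\big|_{w_A = w_B = 1/2} &= \tfrac{1}{2}\left(\Sigma_A + \Sigma_B\right)
  + \tfrac{1}{4}(\mu_A - \mu_B)(\mu_A - \mu_B)^{\top}
\end{aligned}
```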

Python - Statistical distribution

霸气de小男生 submitted on 2019-12-14 03:35:32
Question: I'm quite new to the Python world. Also, I'm not a statistician. I need to implement mathematical models developed by mathematicians in a computer programming language, and I've chosen Python after some research. I'm comfortable with programming as such (PHP/HTML/JavaScript). I have a column of values that I've extracted from a MySQL database, and I need to calculate the following:
1) The normal distribution of it. (I don't have the sigma and mu values. These need to be calculated too
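For item 1), the mean and standard deviation can be estimated directly from the column. Here is a minimal sketch using only the standard library's `statistics` module (Python 3.8+ for `NormalDist`). `values` is a hypothetical placeholder for the column fetched from MySQL. `stdev` gives the sample standard deviation (dividing by n - 1); `pstdev` gives the population version.

```python
from statistics import NormalDist, mean, stdev

values = [4.2, 5.1, 3.9, 4.8, 5.0]  # hypothetical placeholder for the MySQL column

mu = mean(values)      # estimate of mu
sigma = stdev(values)  # estimate of sigma (sample standard deviation, n - 1)
dist = NormalDist(mu, sigma)

print(mu, sigma)
print(dist.pdf(4.5))   # density of the fitted normal at 4.5
print(dist.cdf(4.5))   # P(X <= 4.5) under the fitted normal
```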

How to use conditional statement and return value for a function in R?

孤街醉人 submitted on 2019-12-14 03:31:42
Question: I have to create a function, ans(x), which returns the value 2*abs(x) if x is negative, and the value x otherwise. What command could I use? Thanks.
Answer 1:
ans <- function(x){
  ifelse(x < 0, 2*abs(x), x)
}
will do.
> ans(2)
[1] 2
> ans(-2)
[1] 4
Explanation: We can use the built-in base R function ifelse(). The logic is simple: ifelse(condition, output if condition is TRUE, output if condition is FALSE). Therefore, ifelse(x < 0, 2*abs(x), x) will do the following: evaluate whether value
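For comparison, here is the same logic in Python, the language used for the other sketches here. R's `ifelse()` works element-wise on vectors, which is why the answer uses it. For a single number, R's plain `if (x < 0) 2*abs(x) else x` would also work. The sketch separates the scalar rule from the element-wise version; `ans_vec` is a hypothetical helper name.

```python
def ans(x):
    """Return 2*abs(x) if x is negative, and x otherwise."""
    return 2 * abs(x) if x < 0 else x


def ans_vec(xs):
    """Apply ans() to each element, mirroring how ifelse() works on a vector."""
    return [ans(x) for x in xs]


print(ans(2))               # 2
print(ans(-2))              # 4
print(ans_vec([-1, 0, 3]))  # [2, 0, 3]
```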

Entropy and Information Gain

感情迁移 submitted on 2019-12-14 02:17:35
Question: Simple question, I hope. If I have a set of data like this:
Classification  attribute-1  attribute-2
Correct         dog          dog
Correct         dog          dog
Wrong           dog          cat
Correct         cat          cat
Wrong           cat          dog
Wrong           cat          dog
Then what is the information gain of attribute-2 relative to attribute-1? I've computed the entropy of the whole data set: -(3/6)log2(3/6) - (3/6)log2(3/6) = 1. Then I'm stuck! I think you need to calculate the entropies of attribute-1 and attribute-2 too? Then use these three calculations in an information gain
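This sketch assumes the question means the usual information gain of each attribute with respect to the Classification column. That interpretation is an assumption. The gain is the entropy of the labels minus the size-weighted entropy of the labels within each attribute value. It uses the six rows from the question.

```python
from collections import Counter
from math import log2

# The six rows from the question: (classification, attribute-1, attribute-2).
data = [
    ("Correct", "dog", "dog"),
    ("Correct", "dog", "dog"),
    ("Wrong", "dog", "cat"),
    ("Correct", "cat", "cat"),
    ("Wrong", "cat", "dog"),
    ("Wrong", "cat", "dog"),
]


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def info_gain(rows, attr, target=0):
    """Entropy of the target column minus the weighted entropy after splitting on `attr`."""
    n = len(rows)
    remainder = 0.0
    for value in set(row[attr] for row in rows):
        subset = [row[target] for row in rows if row[attr] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy([row[target] for row in rows]) - remainder


print(entropy([row[0] for row in data]))  # 1.0, matching the question
print(info_gain(data, 1))  # about 0.0817
print(info_gain(data, 2))  # about 0 (up to floating-point rounding)
```

For attribute-1, each value ("dog", "cat") holds a 2:1 split of labels with entropy of about 0.918, so the gain is about 1 - 0.918 = 0.082. For attribute-2, both subsets are 50/50, so splitting on it gains nothing.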

Testing Skewness in Time Series data using R but getting “Error: NCOL(x) == 1 is not TRUE”

别来无恙 submitted on 2019-12-13 21:45:58
Question: I am using the Dow Jones dataset and I am trying to test skewness. So far this is the code:
library(tseries)
library(zoo)
library(reshape2)
library(fBasics)
dow = read.table('dow_jones_index.data', header=T, sep=',')
# create time series
dow <- read.table('dow_jones_index.data', header=T, sep=',', stringsAsFactors = FALSE)
# delete $ symbol and coerce to numeric
dow$close <- as.numeric(sub("\\$", "",dow$close))
tmp <- dcast(dow, date~stock, value.var = "close")
#tmp[,-1] means it's removing
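The message "NCOL(x) == 1 is not TRUE" suggests that a function expecting a single series was given a table with several columns. That is an inference, since the excerpt does not show which call failed. A likely fix is to compute the statistic column by column; in R, for example, you could apply it with `sapply` over the columns of `tmp[,-1]`. This Python sketch illustrates the per-column pattern with the moment-based sample skewness g1 = m3 / m2^1.5. The table `wide` is hypothetical, not the Dow Jones data.

```python
def skewness(xs):
    """Moment-based sample skewness g1 = m3 / m2**1.5 of ONE series."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


# Hypothetical wide table, shaped like the dcast() output: one column per stock.
wide = {
    "stock_a": [1.0, 2.0, 3.0, 4.0],
    "stock_b": [0.0, 0.0, 0.0, 3.0],
}

# Compute the statistic column by column instead of on the whole table at once.
skew_by_stock = {name: skewness(col) for name, col in wide.items()}
print(skew_by_stock)
```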

Fitting data to a probability distribution, maybe skew normal?

拟墨画扇 submitted on 2019-12-13 20:30:37
Question: I am trying to fit my data to some kind of probability distribution, so I can then generate random numbers based on the distribution. Below is what the data points look like, with the x-axis being the data values and the y-axis the probabilities.
[Figure: Data plot]
They look like they would fit a skew normal distribution, with a mean around 10^-4. The plot's data is actually binned from an original data set. I tried using the scipy.stats library to fit a skew normal to the original data, but the fit does
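The stated end goal is only to generate random numbers that follow the binned data. For that, one alternative to fitting a skew normal is to sample directly from the binned (histogram) distribution. This is a different technique from the parametric fit the question attempts. `scipy.stats.skewnorm` does provide `fit` and `rvs` if a parametric model is still wanted. In this minimal sketch, `sample_from_bins` is a hypothetical helper, and the edges and probabilities are placeholders rather than the question's data.

```python
import random


def sample_from_bins(edges, probs, k, rng=None):
    """Draw k values from a binned (histogram) distribution.

    Picks bin i with probability proportional to probs[i], then draws a
    uniform value inside [edges[i], edges[i + 1]].
    """
    if len(edges) != len(probs) + 1:
        raise ValueError("need exactly one more edge than probabilities")
    rng = rng if rng is not None else random.Random()
    chosen = rng.choices(range(len(probs)), weights=probs, k=k)
    return [rng.uniform(edges[i], edges[i + 1]) for i in chosen]


# Hypothetical bin edges and probabilities (NOT the question's data).
edges = [0.0, 1e-4, 2e-4, 3e-4]
probs = [0.2, 0.5, 0.3]
draws = sample_from_bins(edges, probs, 1000, random.Random(0))
```

Passing a seeded `random.Random` makes the draws reproducible. The uniform step within each bin assumes values are spread evenly inside a bin, which is reasonable only if the bins are narrow.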

Writing a proper normal log-likelihood in R

落爺英雄遲暮 submitted on 2019-12-13 20:27:38
Question: I have a problem with the following model, in which I want to make inference on μ and tau; u is a known vector and x is the data vector. The log-likelihood is [equation not shown]. I am having trouble writing this log-likelihood in R.
x <- c(3.3569,1.9247,3.6156,1.8446,2.2196,6.8194,2.0820,4.1293,0.3609,2.6197)
mu <- seq(0,10,length=1000)
normal.lik1<-function(theta,x){
  u <- c(1,3,0.5,0.2,2,1.7,0.4,1.2,1.1,0.7)
  mu<-theta[1]
  tau<-theta[2]
  n<-length(x)
  logl <- sapply(c(mu,tau),function(mu,tau){logl<- -0.5*n*log(2*pi) -0
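Here is a sketch of a plain normal log-likelihood with mean μ and precision tau (variance 1/tau), written as one vectorised sum rather than an `sapply` over `c(mu, tau)`. It uses the question's data vector `x`. The known vector `u` is deliberately left out, because the model formula showing how `u` enters is not preserved in the excerpt; treat that as an assumption. In R, the same quantity is `sum(dnorm(x, mu, 1/sqrt(tau), log = TRUE))`.

```python
import math

# Data vector from the question.
x = [3.3569, 1.9247, 3.6156, 1.8446, 2.2196, 6.8194, 2.0820, 4.1293, 0.3609, 2.6197]


def normal_loglik(theta, x):
    """Log-likelihood of i.i.d. N(mu, 1/tau) data; tau is the precision (1/variance).

    Assumption: the known vector u from the question is NOT used, because the
    model formula involving u is not preserved in the excerpt.
    """
    mu, tau = theta
    n = len(x)
    ss = sum((xi - mu) ** 2 for xi in x)  # one scalar sum, no sapply-style loop
    return -0.5 * n * math.log(2 * math.pi) + 0.5 * n * math.log(tau) - 0.5 * tau * ss


# Profile over the same grid as seq(0, 10, length = 1000), holding tau = 1 (arbitrary).
mu_grid = [10 * i / 999 for i in range(1000)]
logl = [normal_loglik((mu, 1.0), x) for mu in mu_grid]
best_mu = mu_grid[max(range(len(logl)), key=logl.__getitem__)]
print(best_mu)  # grid point closest to the sample mean
```

For fixed tau the log-likelihood is a downward parabola in μ centred on the sample mean. The grid maximum is therefore simply the grid point nearest the mean.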