statistics

R: How to remove quotation marks in a vector of strings, but maintain vector format as to call each individual value?

我怕爱的太早我们不能终老 submitted on 2020-01-07 08:00:23
Question: I want to create a vector of names that act as variable names so I can then use them later on in a loop.
years=1950:2012
for(i in 1:length(years)) { varname[i]=paste("mydata",years[i],sep="") }
This gives:
> [1] "mydata1950" "mydata1951" "mydata1952" "mydata1953" "mydata1954" "mydata1955" "mydata1956" "mydata1957" "mydata1958"
[10] "mydata1959" "mydata1960" "mydata1961" "mydata1962" "mydata1963" "mydata1964" "mydata1965" "mydata1966" "mydata1967"
[19] "mydata1968" "mydata1969" "mydata1970"

SQL - How to find optimal performance numbers for query

寵の児 submitted on 2020-01-07 04:20:35
Question: First time here, so forgive me for any faux pas. I have a question about the limitations of SQL, as I am new to the language and what I need is, I believe, rather complex. Is it possible to automate finding the optimal data for a specific query? For example, say I have the following columns:
1) Vehicle type (Text), e.g. car, bike, bus
2) Number of passengers (Numeric), e.g. 0-7
3) Was in an accident (Boolean), e.g. t or f
From here, I would like to get percentages. So if I were to select only cars with
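One way to read the question is as a grouped aggregation: compute the accident percentage for every combination of vehicle type and passenger count, then sort the groups. Below is a minimal sketch using Python's built-in sqlite3 module. The table name `trips`, the column names, and the sample rows are all made up for illustration; the asker's real schema and their notion of "optimal" may differ.

```python
import sqlite3

def accident_rates(rows):
    """Return (vehicle_type, passengers, n, pct_accident) per group,
    sorted with the highest accident percentage first."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: accident is stored as 0/1.
    conn.execute(
        "CREATE TABLE trips (vehicle_type TEXT, passengers INTEGER, accident INTEGER)"
    )
    conn.executemany("INSERT INTO trips VALUES (?, ?, ?)", rows)
    query = """
        SELECT vehicle_type,
               passengers,
               COUNT(*) AS n,
               100.0 * SUM(accident) / COUNT(*) AS pct_accident
        FROM trips
        GROUP BY vehicle_type, passengers
        ORDER BY pct_accident DESC, vehicle_type, passengers
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

# Made-up sample data: (vehicle_type, passengers, accident)
sample = [
    ("car", 1, 1), ("car", 1, 0), ("car", 4, 0), ("car", 4, 0),
    ("bike", 0, 1), ("bus", 7, 0),
]
rates = accident_rates(sample)
for row in rates:
    print(row)
```

Returning the group size `n` next to the percentage makes it easier to spot groups whose percentage rests on only one or two rows.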

statistics with large amount of data in C++ or Scilab or Octave or R

≡放荡痞女 submitted on 2020-01-07 04:20:10
Question: I recently needed to calculate the mean and standard deviation of a large number (about 800,000,000) of doubles. Considering that a double takes 8 bytes, reading all the doubles into RAM would take about 6 GB. I think I can use a divide-and-conquer approach with C++ or other high-level languages, but that seems tedious. Is there a way to do this all at once with high-level languages like R, Scilab or Octave? Thanks.
Answer 1: It sounds like you could use R-Grid or Hadoop to good
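Independently of the tools named above, the mean and standard deviation can be computed in a single pass without holding the data in memory. The sketch below uses Welford's online algorithm in plain Python. The class and function names are hypothetical, and in practice the values would be streamed from a file in chunks rather than from a small list.

```python
import math

class RunningStats:
    """Welford's online algorithm: one pass over the data, O(1) memory."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def push(self, x):
        """Fold one value into the running mean and squared deviations."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self, ddof=1):
        """Standard deviation; ddof=1 is the sample estimate, ddof=0 the population one."""
        if self.n <= ddof:
            raise ValueError("not enough values")
        return math.sqrt(self.m2 / (self.n - ddof))

def stream_stats(values):
    """Consume any iterable (for example a generator reading a file) value by value."""
    stats = RunningStats()
    for x in values:
        stats.push(x)
    return stats

# Tiny illustrative input; a real run would iterate over the 800 million doubles.
stats = stream_stats(float(x) for x in [2, 4, 4, 4, 5, 5, 7, 9])
print(stats.mean, stats.std(ddof=0))
```

Compared with accumulating the sum and the sum of squares, this update is less prone to cancellation error when the values are large relative to their spread.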

MiniBatchKMeans gives different centroids after subsequent iterations

巧了我就是萌 submitted on 2020-01-07 02:54:53
Question: I am using the MiniBatchKMeans model from the sklearn.cluster module in Anaconda. I am clustering a data set that contains approximately 75,000 points. It looks something like this:
data = np.array([8,3,1,17,5,21,1,7,1,26,323,16,2334,4,2,67,30,2936,2,16,12,28,1,4,190...])
I fit the data using the process below.
from sklearn.cluster import MiniBatchKMeans
kmeans = MiniBatchKMeans(batch_size=100)
kmeans.fit(data.reshape(-1,1))
This is all well and good, and I proceed to find the centroids of the

How to get MSE of ARIMA model in SAS?

自古美人都是妖i submitted on 2020-01-06 20:10:41
Question: I am comparing two models, one with exponential smoothing and one with ARIMA. For this specific assignment, it's enough that I compare the MSE of the two models. So how do I compute the MSE of the ARIMA procedure? This is the last assignment on this grueling course, so help would be greatly appreciated!
Answer 1: proc arima does not specifically output the MSE, but proc model does. You can recreate the ARIMA model using proc model and the %AR and %MA macros.
proc model data=have; endo y; id date; y =
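Whatever procedure produces the forecasts, the comparison itself only needs the actual values and each model's fitted values. The following is a minimal Python sketch, not SAS; the function name and all numbers are made up for illustration. It divides by the number of observations, whereas a statistical package may divide by the residual degrees of freedom instead.

```python
def mean_squared_error(actual, predicted):
    """Average of squared differences between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Made-up series and fitted values from two hypothetical models.
actual = [10.0, 12.0, 13.0, 15.0]
fitted_smoothing = [11.0, 12.0, 12.0, 16.0]
fitted_arima = [10.0, 13.0, 13.0, 17.0]

mse_smoothing = mean_squared_error(actual, fitted_smoothing)
mse_arima = mean_squared_error(actual, fitted_arima)
print(mse_smoothing, mse_arima)
```

For a fair comparison, both models should be scored on the same observations and with the same divisor.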

regarding the failure of stepwise variable selection in lm

爱⌒轻易说出口 submitted on 2020-01-06 17:29:45
Question: I built a regression model using all the variables at first.
full.model<-lm(y~as.matrix(x))
Then I tried to use stepwise variable selection.
reduce.model<-step(full.model,direction="backward")
The result is shown below; it looks like it does not do anything. What is the problem here? I also include the details of full.model below.
> reduce.model<-step(full.model,direction="backward")
Start: AIC=-121.19
y ~ as.matrix(x)
Df Sum of Sq RSS AIC
<none> 1.1 -121.19
-

How to implement a function of a random variable in PyMC which could be sampled by MCMC Metropolis?

徘徊边缘 submitted on 2020-01-06 15:05:54
Question: If you have a random variable $X$ and a function $f$, you can define $y=f(X)$ as a new random variable with a probability density function as follows: $p(y)=\left|(f^{-1})'(y)\right|\,p(x)$, where $x=f^{-1}(y)$. For details see here. Now I have defined a random variable alpha, with an exponential distribution, in the following code. I want to add log(alpha) to my model as a new random variable. How should I implement it in my model? I already made an attempt, but it seems that it is wrong, and the reason has been pointed out in
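For the specific case in the question, the change of variables can be worked out by hand and checked numerically. If alpha is exponential with rate $\lambda$ and $y=\log(\alpha)$, then $f^{-1}(y)=e^{y}$ and the density of $y$ is $\lambda e^{y} e^{-\lambda e^{y}}$. The sketch below uses only the Python standard library, not PyMC; the function names and the default rate of 1.0 are assumptions made for illustration.

```python
import math
import random

def log_alpha_pdf(y, rate=1.0):
    """Density of y = log(alpha) for alpha ~ Exponential(rate).
    Change of variables: p_alpha(exp(y)) * |d exp(y) / dy|."""
    x = math.exp(y)
    return rate * math.exp(-rate * x) * x

def log_alpha_cdf(y, rate=1.0):
    """P(log(alpha) <= y) = P(alpha <= exp(y))."""
    return 1.0 - math.exp(-rate * math.exp(y))

def empirical_cdf_at(y, rate=1.0, n=100_000, seed=0):
    """Monte Carlo estimate of P(log(alpha) <= y) from simulated draws."""
    rng = random.Random(seed)
    hits = sum(math.log(rng.expovariate(rate)) <= y for _ in range(n))
    return hits / n

# Compare the simulated and analytic CDF at one point.
print(empirical_cdf_at(0.0), log_alpha_cdf(0.0))
```

If the two printed values agree closely, the derived density matches what transforming samples of alpha produces.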
