statistics

Goodness-of-fit for fixed effect logit model using 'bife' package

99封情书 submitted on 2020-01-24 04:19:04
Question: I am using the 'bife' package to fit a fixed-effects logit model in R. However, I cannot compute any goodness-of-fit measure for the model's overall fit from the output below. I would appreciate knowing how to measure goodness of fit given this limited information. I would prefer a chi-square test but cannot find a way to implement one either.
---------------------------------------------------------------
Fixed effects logit model with analytical bias-correction Estimated

How to determine the date-and-time that a Linux process was started?

心已入冬 submitted on 2020-01-23 21:12:19
Question: If I look at /proc/6945/stat, I get a series of numbers, one of which is the number of CPU-centiseconds for which the process has been running. But I'm running these processes on heavily loaded boxes, and what I'm interested in is the clock time at which the job will finish, for which I need to know the clock time at which it started. The timestamps on the files in /proc/6945 look to be in the right sort of range, but I can't find a particular file that consistently has the right clock time on it. As
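A minimal Python sketch of one way to recover a start time from /proc, offered as an assumption-laden illustration rather than the thread's answer. It relies on field 22 (`starttime`) of /proc/<pid>/stat being the start time in clock ticks since boot, and on the `btime` line of /proc/stat giving the boot time in seconds since the epoch. The helper names (`parse_starttime_ticks`, `start_epoch`, `read_boot_epoch`, `process_start_time`) are hypothetical.

```python
import os
import time


def parse_starttime_ticks(stat_line: str) -> int:
    """Return field 22 (starttime, in clock ticks since boot) of a stat line."""
    # Field 2 (the command name) is wrapped in parentheses and may itself
    # contain spaces or ')', so split on the LAST ')' before tokenising.
    after_comm = stat_line.rsplit(")", 1)[1].split()
    # after_comm[0] is field 3 (state), so field 22 sits at index 19.
    return int(after_comm[19])


def start_epoch(stat_line: str, boot_epoch: int, ticks_per_second: int) -> float:
    """Convert the tick count to seconds since the Unix epoch."""
    return boot_epoch + parse_starttime_ticks(stat_line) / ticks_per_second


def read_boot_epoch() -> int:
    """Read the boot time (seconds since the epoch) from /proc/stat."""
    with open("/proc/stat") as f:
        for line in f:
            if line.startswith("btime "):
                return int(line.split()[1])
    raise RuntimeError("btime not found in /proc/stat")


def process_start_time(pid: int) -> str:
    """Return the local wall-clock start time of a process (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        stat_line = f.read()
    ticks_per_second = os.sysconf("SC_CLK_TCK")
    epoch = start_epoch(stat_line, read_boot_epoch(), ticks_per_second)
    return time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(epoch))
```

On a Linux box, `process_start_time(6945)` would return the start time of the process from the question, provided it is still running.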

xtsum command for R?

▼魔方 西西 submitted on 2020-01-23 13:11:25
Question: We're working on panel data, and there is a command in Stata, xtsum, that gives you the within and between variance for the variables in the data set. Is there a similar command for R that produces clean output?
Answer 1: I have used a little function to do it. The function XTSUM takes three inputs:
data -- the dataset
varname -- the variable to xtsum
unit -- the identifier for the within dimension
library(rlang)
library(dplyr)
XTSUM <- function(data, varname, unit) { varname <- enquo(varname) loc
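The overall/between/within decomposition that the question asks about can be illustrated with a short sketch. It is written in Python with only the standard library, so it is not the R function from the answer, and it assumes sample standard deviations (n - 1 denominators) and the usual convention of adding the grand mean back to the within deviations. The function name `xtsum` and the dictionary-based row format are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def xtsum(rows, value_key, unit_key):
    """Overall, between-unit and within-unit standard deviations of one variable."""
    # Group the observations by panel unit.
    by_unit = defaultdict(list)
    for row in rows:
        by_unit[row[unit_key]].append(row[value_key])

    values = [v for vs in by_unit.values() for v in vs]
    grand_mean = mean(values)
    unit_means = {u: mean(vs) for u, vs in by_unit.items()}

    # Within deviations: each value minus its unit mean, plus the grand mean.
    within = [v - unit_means[u] + grand_mean
              for u, vs in by_unit.items() for v in vs]

    return {
        "overall_sd": stdev(values),
        "between_sd": stdev(list(unit_means.values())),
        "within_sd": stdev(within),
    }
```

The sketch needs at least two units and at least two observations overall, since a sample standard deviation is undefined for a single point.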

Ranking Contest Results of Images with 5-Star Ratings

老子叫甜甜 submitted on 2020-01-23 09:29:07
Question: I run a calendar photo contest that uses a 5-star rating system, which ranks the images according to their average rating. However, I would like to factor in the total number of votes a photo receives to get a more accurate ranking. For example, I do not want an image with one 5-star vote (average rating: 5) to be ranked above an image with ten 5-star votes and one 4-star vote (average rating: 4.9). I know this topic has been raised before, but I can't seem to find a straightforward answer to apply to my
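One common way to factor in vote counts is a damped ("Bayesian") average, which pulls photos with few votes towards a prior mean. The sketch below is an illustration of that idea, not the answer from the original thread; the prior mean of 3.0 and the prior weight of 5 pseudo-votes are arbitrary assumptions that a contest organiser would need to tune.

```python
def bayesian_average(ratings, prior_mean=3.0, prior_weight=5):
    """Average rating shrunk towards prior_mean.

    prior_weight acts like that many pseudo-votes at prior_mean, so a photo
    needs real votes to move away from the prior. Both defaults are
    illustrative assumptions.
    """
    total = prior_mean * prior_weight + sum(ratings)
    return total / (prior_weight + len(ratings))


# The two photos from the question.
single_vote = [5]
many_votes = [5] * 10 + [4]

ranked = sorted(
    {"single_vote": single_vote, "many_votes": many_votes}.items(),
    key=lambda item: bayesian_average(item[1]),
    reverse=True,
)
```

With these assumed settings, the photo with eleven votes scores 69/16 = 4.3125 and the photo with a single vote scores 20/6 ≈ 3.33, so the well-supported photo ranks first.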

Convert igraph object to a data frame in R

不羁的心 submitted on 2020-01-23 04:30:27
Question: I'm working with the igraph library and I need to run some statistical analysis on the network. I'm computing several variables using igraph and then want to use those indicators as the dependent variables in a few regressions, with the vertex attributes as the independent variables in the model. I'm able to load the data and run the igraph analysis, but I'm having trouble turning the igraph object back into a data frame. I don't really need the edges to be preserved, just each vertex to be

Empirical CDF in Python similar to MATLAB's

為{幸葍}努か submitted on 2020-01-23 02:40:49
Question: I have some code in MATLAB that I would like to rewrite in Python. It's a simple program that computes a distribution and plots it on a log-log scale. The problem I ran into is with computing the CDF. Here is the MATLAB code:
for D = 1:10
    delta = D / 10;
    for k = 1:n
        N_delta = poissrnd(delta^-alpha,1);
        Y_k_delta = ( (1 - randn(N_delta)) / (delta.^alpha) ).^(-1/alpha);
        Y_k_delta = Y_k_delta(Y_k_delta > delta);
        X(k) = sum(Y_k_delta);
        %disp(X(k))
    end
    [f,x] = ecdf(X);
    plot(log(x), log(1-f))
    hold on
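A minimal sketch of an empirical CDF in plain Python, assuming the goal is only the step heights at each distinct sample value. The function name `ecdf` mirrors the MATLAB call for readability but is a hypothetical helper, and MATLAB's output may additionally include a leading point with f = 0, which this sketch does not reproduce.

```python
from collections import Counter


def ecdf(sample):
    """Return (xs, fs): sorted distinct values and the fraction of the sample <= each."""
    n = len(sample)
    counts = Counter(sample)
    xs = sorted(counts)
    fs = []
    cumulative = 0
    for x in xs:
        # Ties are handled by adding the full multiplicity at once.
        cumulative += counts[x]
        fs.append(cumulative / n)
    return xs, fs
```

For a log-log survival plot as in the MATLAB snippet, note that the last point always has 1 - f equal to 0, so it has to be dropped before taking logarithms.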

Need an R package for piecewise linear regression?

有些话、适合烂在心里 submitted on 2020-01-23 01:18:45
Question: Is anybody aware of a package for "piecewise linear regression"?
Answer 1: You might also want to check out the breakpoints function in the strucchange package. I've used it when I've had an unknown number of breakpoints. It's easy to use and has good documentation.
Answer 2: Check out the segmented package.
Answer 3: There's a function called piecewise.linear in the SiZer package. Searching RSeek.org is often a good place to start for instances like this where you want to know if something exists already

Generating means from a bivariate Gaussian distribution

倖福魔咒の submitted on 2020-01-23 01:11:05
Question: I am reading Elements of Statistical Learning (ESLII), and in chapter 2 they use a Gaussian mixture data set to illustrate some learning algorithms. To generate this data set, they first generate 10 means from a bivariate Gaussian distribution N((1,0)', I). I am not sure what they mean. How can you generate 10 means from a bivariate distribution having mean (1,0)?
Answer 1: Each of the means generated from the bivariate Gaussian distribution is simply a single point sampled in exactly the
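The sampling step described in the answer can be sketched in a few lines of Python. Because the covariance is the identity matrix, the two coordinates are independent unit-variance normals, so the standard library's `random.gauss` is enough. The function name `sample_means` and the fixed seed are assumptions made for illustration.

```python
import random


def sample_means(k=10, center=(1.0, 0.0), seed=0):
    """Draw k points from N(center, I).

    With identity covariance each coordinate is an independent normal with
    standard deviation 1, centred on the matching coordinate of `center`.
    """
    rng = random.Random(seed)
    return [tuple(rng.gauss(c, 1.0) for c in center) for _ in range(k)]


# Ten 2-D points; each one then serves as the mean of one mixture component.
means = sample_means()
```

Each returned tuple is an ordinary random draw; it is called a "mean" only because it is subsequently used as the centre of its own Gaussian component when the observations are generated.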

Johansen cointegration test in Python

你离开我真会死。 submitted on 2020-01-22 14:50:49
Question: I can't find any reference to functionality for performing the Johansen cointegration test in any Python module dealing with statistics and time series analysis (pandas and statsmodels). Does anybody know if there's some code around that can perform such a test for cointegration among time series? Thanks for your help, Maruizio
Answer 1: statsmodels doesn't have a Johansen cointegration test, and I have never seen it in any other Python package either. statsmodels has VAR and structural VAR, but no VECM

How to compute summary statistics on a Cassandra table with Spark DataFrame?

强颜欢笑 submitted on 2020-01-22 03:58:12
Question: I'm trying to get the min, max, and mean of some Cassandra/Spark data, but I need to do it in Java.
import org.apache.spark.sql.DataFrame;
import static org.apache.spark.sql.functions.*;
DataFrame df = sqlContext.read()
    .format("org.apache.spark.sql.cassandra")
    .option("table", "someTable")
    .option("keyspace", "someKeyspace")
    .load();
df.groupBy(col("keyColumn"))
    .agg(min("valueColumn"), max("valueColumn"), avg("valueColumn"))
    .show();
EDITED to show working version: Make sure to put " around the