statistics | 易学教程

Relative frequency in R by factor

Question: I would like to get a table of the top 10 absolute and relative frequencies of one variable within each level of another factor variable. I have a data frame with 3 columns: the 1st is a factor variable, the 2nd is the variable I need to count, and the 3rd is a logical variable used as a constraint. (The real database has more than 4 million observations.)

dtf <- data.frame(c("a","a","b","c","b"), c("aaa","bbb","aaa","aaa","bbb"), c(TRUE,FALSE,TRUE,TRUE,TRUE))
colnames(dtf) <- c("factor","var","log")
dtf
  factor var   log
1      a aaa  TRUE
2      a bbb FALSE
3
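
The counting logic can be sketched in plain Python as follows. It assumes that rows where the logical column is FALSE are simply excluded and that the relative frequency is taken within each factor level (the question does not say which denominator is wanted); the function name top_frequencies is hypothetical.

```python
from collections import Counter, defaultdict

def top_frequencies(rows, k=10):
    """Top-k absolute and relative frequencies of `var` within each `factor` level.

    rows: iterable of (factor, var, log) tuples; only rows with log == True are counted.
    Relative frequency = count / number of counted rows in the same factor level.
    """
    counts = defaultdict(Counter)
    for factor, var, log in rows:
        if log:  # apply the logical constraint column
            counts[factor][var] += 1
    result = {}
    for factor, counter in counts.items():
        total = sum(counter.values())
        # most_common(k) sorts by count; ties keep first-seen order
        result[factor] = [(var, n, n / total) for var, n in counter.most_common(k)]
    return result

# The example data frame from the question, as a list of row tuples.
dtf = list(zip(["a", "a", "b", "c", "b"],
               ["aaa", "bbb", "aaa", "aaa", "bbb"],
               [True, False, True, True, True]))
for factor, freqs in sorted(top_frequencies(dtf).items()):
    print(factor, freqs)
```

Because the constraint is applied while counting, the 4-million-row table is scanned only once.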

How to calculate with the Poisson distribution in MATLAB?

Question: I've used Excel in the past, but the calculations involving the Poisson distribution took a while, which is why I switched to SQL. I soon realized that SQL might not be a proper tool for statistical problems. Finally, I decided to switch to MATLAB, but I'm not used to it at all. My problem is the following: I've imported a .csv table and have two columns of values, let's say A and B (110 x 1 double). Both are input values for my Poisson calculations. Since I
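
The question is cut off before it says which Poisson calculation is needed. As a sketch, assume A holds the means λ and B the observed counts k, paired row by row. Then P(X = k) = λ^k e^(-λ) / k! and the cumulative probability can be evaluated element by element. In MATLAB, the Statistics and Machine Learning Toolbox functions poisspdf(X, lambda) and poisscdf(X, lambda) work on whole arrays. The Python below only illustrates the arithmetic, and the A and B values are made up.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam), computed in log space to avoid overflow."""
    if k < 0 or k != int(k):
        return 0.0
    if lam == 0:
        return 1.0 if k == 0 else 0.0
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def poisson_cdf(k, lam):
    """P(X <= k), summing the pmf from 0 up to floor(k)."""
    if k < 0:
        return 0.0
    return sum(poisson_pmf(i, lam) for i in range(int(math.floor(k)) + 1))

# Hypothetical columns: A = means (lambda), B = observed counts (k).
A = [2.0, 3.5, 1.0]
B = [1, 4, 0]
pmf = [poisson_pmf(k, lam) for lam, k in zip(A, B)]
cdf = [poisson_cdf(k, lam) for lam, k in zip(A, B)]
print(pmf)
print(cdf)
```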

MySQL store checksum of tables in another table

Question: CONTEXT: We have big databases with lots of tables. Most of them (99%) use InnoDB. We want a daily process that monitors which tables have been modified. Because they use InnoDB, the value of Update_time from SHOW table STATUS from information_schema; is NULL. For that reason, we want to create a daily procedure that stores the checksum (and other relevant details) of each table somewhere, preferably in another table. We will then run various checks on that data. PROBLEM: I'm trying to
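
MySQL's CHECKSUM TABLE tbl_name statement returns a result set with Table and Checksum columns. A daily job could therefore store one row per table and day, then compare consecutive snapshots. The sketch below covers only the comparison step. The two dictionaries stand in for rows read back from a hypothetical checksum-history table, and the function name is illustrative. Keep in mind that CHECKSUM TABLE reads every row, so it can be expensive on large InnoDB tables.

```python
def modified_tables(previous, current):
    """Compare two {table_name: checksum} snapshots taken on consecutive days.

    Returns (changed, added, dropped) as sorted lists of table names.
    """
    changed = sorted(t for t in previous.keys() & current.keys()
                     if previous[t] != current[t])
    added = sorted(current.keys() - previous.keys())
    dropped = sorted(previous.keys() - current.keys())
    return changed, added, dropped

# Hypothetical snapshots, e.g. read from a checksum_history table.
yesterday = {"orders": 1234567, "customers": 89012, "logs": 555}
today = {"orders": 7654321, "customers": 89012, "audit": 42}
print(modified_tables(yesterday, today))
# (['orders'], ['audit'], ['logs'])
```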

Post-Hoc tests for chi-sq in R

Question: I have a table that looks like this:

> dput(theft_loc)
structure(c(13704L, 14059L, 14263L, 14450L, 14057L, 15503L, 14230L, 16758L, 15289L, 15499L, 16066L, 15905L, 18531L, 19217L, 12410L, 13398L, 13308L, 13455L, 13083L, 14111L, 13068L, 19569L, 18771L, 19626L, 20290L, 19816L, 20923L, 20466L, 20517L, 19377L, 20035L, 20504L, 20393L, 22409L, 22289L, 7997L, 8106L, 7971L, 8437L, 8246L, 9090L, 8363L, 7934L, 7874L, 7909L, 8150L, 8191L, 8746L, 8277L, 27194L, 25220L, 26034L, 27080L, 27334L, 30819L,
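
After a significant chi-squared test of independence, one common post-hoc approach is to check the adjusted standardized residuals cell by cell. Each residual is compared with a Bonferroni-corrected two-sided normal critical value. In R, chisq.test() returns these residuals in its stdres component. Because the dimensions of theft_loc are cut off above, the Python sketch below uses a small made-up table, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

def adjusted_residuals(table):
    """Adjusted standardized residuals (O - E) / sqrt(E * (1 - r/N) * (1 - c/N))."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    res = []
    for i, row in enumerate(table):
        res_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            var = expected * (1 - row_totals[i] / n) * (1 - col_totals[j] / n)
            res_row.append((observed - expected) / math.sqrt(var))
        res.append(res_row)
    return res

def significant_cells(table, alpha=0.05):
    """Cells whose |residual| exceeds the Bonferroni-corrected two-sided z critical value."""
    res = adjusted_residuals(table)
    m = len(table) * len(table[0])  # number of cells tested
    z_crit = NormalDist().inv_cdf(1 - alpha / (2 * m))
    return [(i, j, round(r, 3)) for i, row in enumerate(res)
            for j, r in enumerate(row) if abs(r) > z_crit]

# Made-up 2 x 3 table of counts (rows: groups, columns: locations).
counts = [[30, 10, 20],
          [10, 30, 20]]
print(significant_cells(counts))
```

Another common option is pairwise chi-squared tests between subsets of rows, again with a multiple-comparison correction.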

Vectorized indexing/slicing in NumPy/SciPy?

Question: I have an array A and a list of slicing indices (s, t); let's call this list L. I want to find the 85th percentile of each of A[s1:t1], A[s2:t2], ... Is there a way to vectorize these operations in NumPy? The loop

ans = []
for (s, t) in L:
    ans.append(numpy.percentile(A[s:t], 85))

looks cumbersome. Thanks a lot!

PS: It's safe to assume s1 < s2 < ... and t1 < t2 < .... This is really just a sliding-window percentile problem.

Answer 1: Given that you're dealing with a non-uniform interval (i.e. the slices aren't
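
Because both the start and end indices only move forward, an alternative to re-sorting every slice is to keep the current window in a sorted list and update it as the window advances, inserting and removing elements with bisect. The sketch below is plain Python and illustrates this sliding-window technique. It is not a NumPy vectorization. The percentile uses linear interpolation at position (n - 1) * q / 100, which should match numpy.percentile's default method. The function names are hypothetical.

```python
import bisect

def percentile_sorted(sorted_vals, q):
    """Linear-interpolation percentile of an already sorted, non-empty list."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * frac

def window_percentiles(A, L, q=85):
    """Percentile of A[s:t] for each (s, t) in L; s and t must be non-decreasing, s < t."""
    window = []              # sorted contents of A[cur_s:cur_t]
    cur_s = cur_t = 0
    out = []
    for s, t in L:
        while cur_t < t:     # extend the window on the right
            bisect.insort(window, A[cur_t])
            cur_t += 1
        while cur_s < s:     # shrink it on the left
            del window[bisect.bisect_left(window, A[cur_s])]
            cur_s += 1
        out.append(percentile_sorted(window, q))
    return out

A = [5, 1, 4, 2, 8, 7, 3, 6]
L = [(0, 4), (1, 6), (3, 8)]
print(window_percentiles(A, L))
```

Each element is inserted and removed once, which avoids a full sort per slice. List insertion and deletion still shift elements, so the cost per step grows with the window size.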

Calculate pvalue from pandas DataFrame

Question: I have a DataFrame stats with a MultiIndex and 8 samples (only two shown here), with 8 genes for each sample.

In[13]: stats
Out[13]:
                ARG/16S                                          \
                  count          mean           std           min
sample gene
Arnhem IC          11.0  2.319050e-03  7.396130e-04  1.503150e-03
       Int1        11.0  7.243040e+00  6.848327e+00  1.364879e+00
       Sul1        11.0  3.968956e-03  9.186019e-04  2.499074e-03
       TetB         2.0  1.154748e-01  1.627663e-01  3.816936e-04
       TetM         4.0  1.083125e-04  5.185259e-05  5.189226e-05
       blaOXA       4.0  4.210963e-06  3.783235e-07  3.843571e-06
       ermB         4.0  4.111081e-05  7
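
Suppose the goal is to compare one gene between two samples using only the count, mean and std columns. Welch's t-test can be computed from those summary statistics alone. SciPy offers scipy.stats.ttest_ind_from_stats(mean1, std1, nobs1, mean2, std2, nobs2, equal_var=False) for this. The standard-library sketch below spells out the arithmetic and obtains the two-sided p-value by integrating the Student t density numerically. Which groups to compare is an assumption, since the question is truncated. The function names are hypothetical and the numbers in the usage line are invented.

```python
import math

def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value of Student's t with df degrees of freedom.

    Uses Simpson's rule on the t density over [0, |t|]; steps must be even.
    Adequate for moderate |t|; scipy.stats gives exact tail probabilities.
    """
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    b = abs(t)
    h = b / steps
    area = pdf(0.0) + pdf(b)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    area *= h / 3            # area = P(0 <= T <= |t|)
    return max(0.0, 1 - 2 * area)

def welch_from_stats(mean1, std1, n1, mean2, std2, n2):
    """Welch's t statistic, degrees of freedom and two-sided p-value from summary statistics."""
    v1, v2 = std1 ** 2 / n1, std2 ** 2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df, t_two_sided_p(t, df)

# Invented summary statistics for one gene in two hypothetical samples.
t, df, p = welch_from_stats(0.0023, 0.00074, 11, 0.0031, 0.00090, 9)
print(f"t = {t:.3f}, df = {df:.1f}, p = {p:.4f}")
```

With a MultiIndex frame, the six inputs would come from the count, mean and std entries of the two (sample, gene) rows being compared.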

What's the best way of implementing a 'popular content' display?

Question: How do I show a list of the 'most popular' (articles|posts|whatever) for a period such as the past day? (Essentially, I want to replicate the functionality of the Radioactivity Drupal module.)

Answer 1: Here's what I would do: If you haven't already, sign up for Google Analytics and add the Google Analytics JavaScript to each of your pages. This will track view counts for you. Then, using the Google Data API library, fetch the information you want. For example, you could ask for the most popular pages on your site in the
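
The answer delegates view counting to Google Analytics. For comparison, a self-hosted alternative (not what the answer proposes) is to log each view with a timestamp and count views per item inside a sliding time window. The sketch below keeps the log in memory purely for illustration, whereas a real site would persist the events. The class and method names are hypothetical, and it assumes views are recorded in timestamp order.

```python
import time
from collections import Counter, deque

class PopularContent:
    """Counts views per item over a sliding time window (default: the past 24 hours)."""

    def __init__(self, window_seconds=24 * 3600):
        self.window = window_seconds
        self.events = deque()    # (timestamp, item_id) pairs, oldest first
        self.counts = Counter()  # views per item inside the current window

    def _expire(self, now):
        # Drop views that have fallen out of the window.
        while self.events and self.events[0][0] <= now - self.window:
            _, item = self.events.popleft()
            self.counts[item] -= 1
            if self.counts[item] == 0:
                del self.counts[item]

    def record_view(self, item_id, now=None):
        now = time.time() if now is None else now
        self._expire(now)
        self.events.append((now, item_id))
        self.counts[item_id] += 1

    def most_popular(self, n=10, now=None):
        self._expire(time.time() if now is None else now)
        return self.counts.most_common(n)

pc = PopularContent(window_seconds=3600)
for ts, item in [(0, "post-1"), (1000, "post-2"), (2000, "post-2"), (3700, "post-1")]:
    pc.record_view(item, now=ts)
print(pc.most_popular(now=3700))
# [('post-2', 2), ('post-1', 1)]
```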

How to create a discrete normal distribution in R?

Question: I am trying to create a discrete normal distribution using something such as x <- rnorm(1000, mean = 350, sd = 20), but I don't think the rnorm function has a built-in "discrete numbers only" option. I have spent a few hours searching StackOverflow, Google and the R documentation but have yet to find anything.

Answer 1: Obviously, there is no discrete normal distribution, since the normal distribution is continuous by definition. However, as mentioned here (Wikipedia is not the best possible source but this is
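
One common discretization (assumed here, since the answer is cut off) gives each integer k the probability mass that the continuous N(μ, σ²) assigns to the interval [k - 0.5, k + 0.5). Sampling from it is equivalent to rounding continuous draws to the nearest integer, e.g. round(rnorm(1000, mean = 350, sd = 20)) in R. The Python sketch below builds that probability mass function with statistics.NormalDist, renormalizes it over a finite range of ±6 sd (an arbitrary cutoff), and samples with random.choices. The function names are hypothetical.

```python
import random
from statistics import NormalDist

def discrete_normal_pmf(mean, sd, width=6):
    """P(K = k) = Phi(k + 0.5) - Phi(k - 0.5) for integers within mean +/- width*sd."""
    dist = NormalDist(mean, sd)
    lo = int(round(mean - width * sd))
    hi = int(round(mean + width * sd))
    support = list(range(lo, hi + 1))
    weights = [dist.cdf(k + 0.5) - dist.cdf(k - 0.5) for k in support]
    total = sum(weights)  # slightly below 1 because the tails are cut off
    return support, [w / total for w in weights]

def sample_discrete_normal(n, mean, sd, rng=random):
    """Draw n integers from the truncated discretized normal."""
    support, probs = discrete_normal_pmf(mean, sd)
    return rng.choices(support, weights=probs, k=n)

x = sample_discrete_normal(1000, mean=350, sd=20, rng=random.Random(42))
print(min(x), max(x), sum(x) / len(x))
```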