statistics

What is a good statistical math package for .Net? [closed]

偶尔善良 submitted on 2019-12-20 08:37:20
Question: Closed. This question is off-topic. It is not currently accepting answers. Closed 5 years ago. I am looking for a library that does advanced math, statistics, statistical distributions, etc. Currently I am looking for something that handles the binomial and Poisson distributions. Answer 1: MathDotNet should have the functions you are looking for, although it may be a bit of overkill depending on how much functionality
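For reference, the two distributions the question asks about are small enough to sketch directly. The sketch below is in Python rather than .NET and is only an illustration of what a library such as MathDotNet computes; the function names are hypothetical, and summing the pmf for the cdf is fine for moderate k but not how a numerics library would do it at scale.

```python
from math import comb, exp, factorial


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def binomial_cdf(k, n, p):
    """P(X <= k), obtained by summing the pmf (adequate for moderate n)."""
    return sum(binomial_pmf(i, n, p) for i in range(k + 1))


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam ** k / factorial(k)


def poisson_cdf(k, lam):
    """P(X <= k), obtained by summing the pmf."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))


print(binomial_pmf(2, 4, 0.5))  # 0.375
print(poisson_pmf(0, 2.0))      # e**-2
```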

Variables Overview with xtable in R

落爺英雄遲暮 submitted on 2019-12-20 08:30:24
Question: I'm wondering if it's possible to create an xtable from the command str(x) to get an overview of the variables you use. This would be a nice way to introduce someone to the dataset, but it's annoying to create by hand. So what I tried is to make an xtable like this:

    str(cars)
    require(xtable)
    xtable(str(cars))

The cars dataset ships with R. Unfortunately, xtable doesn't produce LaTeX code for str(). Is it possible to outsmart R here? Here are the main commands that xtable will
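Part of the explanation is that str() prints its overview and returns NULL invisibly, so xtable() receives nothing to format; the usual workaround is to build the overview as a table first and then render it. A minimal sketch of that idea in Python (not R), assuming the data is a dict of column name to values; the helper names and the chosen columns (name, type, length, first values) are hypothetical.

```python
def variable_overview(columns):
    """One row per variable: name, element type, length and first values,
    roughly the information str() prints."""
    rows = []
    for name, values in columns.items():
        type_name = type(values[0]).__name__ if values else "empty"
        preview = " ".join(str(v) for v in values[:4])
        rows.append((name, type_name, str(len(values)), preview))
    return rows


def to_latex_tabular(rows, header=("Variable", "Type", "N", "First values")):
    """Render the overview rows as a LaTeX tabular (no escaping of special chars)."""
    lines = [r"\begin{tabular}{llrl}", r"\hline",
             " & ".join(header) + r" \\", r"\hline"]
    for row in rows:
        lines.append(" & ".join(row) + r" \\")
    lines += [r"\hline", r"\end{tabular}"]
    return "\n".join(lines)


# Illustrative sample values for two numeric columns.
columns = {"speed": [4, 4, 7, 7, 8], "dist": [2, 10, 4, 22, 16]}
latex = to_latex_tabular(variable_overview(columns))
print(latex)
```

In R the same design would mean assembling a data.frame with one row per variable and passing that to xtable().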

Python implementation of the Wilson Score Interval?

霸气de小男生 submitted on 2019-12-20 08:11:13
Question: After reading How Not to Sort by Average Rating, I was curious whether anyone has a Python implementation of the lower bound of the Wilson score confidence interval for a Bernoulli parameter. Answer 1: Reddit uses the Wilson score interval for comment ranking; an explanation and Python implementation can be found here:

    #Rewritten code from /r2/r2/lib/db/_sorts.pyx
    from math import sqrt

    def confidence(ups, downs):
        n = ups + downs
        if n == 0:
            return 0
        z = 1.0 #1.44 = 85%, 1.96 = 95%
        phat = float(ups) / n
        return
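The quoted excerpt stops before its return expression. A self-contained sketch of the lower bound using the standard Wilson formula, (p̂ + z²/2n − z·√(p̂(1−p̂)/n + z²/4n²)) / (1 + z²/n), follows; the function name and the default z = 1.96 (two-sided 95%) are choices made here, whereas the excerpt uses z = 1.0.

```python
from math import sqrt


def wilson_lower_bound(ups, downs, z=1.96):
    """Lower bound of the Wilson score interval for the up-vote fraction."""
    n = ups + downs
    if n == 0:
        return 0.0  # no votes: rank at the bottom
    phat = ups / n
    centre = phat + z * z / (2 * n)
    margin = z * sqrt((phat * (1 - phat) + z * z / (4 * n)) / n)
    return (centre - margin) / (1 + z * z / n)


# More votes at the same ratio give a tighter interval, hence a higher bound.
print(wilson_lower_bound(60, 40), wilson_lower_bound(600, 400))
```

Sorting items by this value rather than by the raw ratio keeps an item with a single up-vote from outranking one with 600 up-votes and 400 down-votes.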

Count the number of unique elements in each column with dplyr in sparklyr

点点圈 submitted on 2019-12-20 06:37:20
Question: I'm trying to count the number of unique elements in each column of the Spark dataset s. However, it seems that Spark doesn't recognize tally():

    k <- collect(s %>% group_by(grouping_type) %>% summarise_each(funs(tally(distinct(.)))))
    Error: org.apache.spark.sql.AnalysisException: undefined function TALLY

It seems that Spark doesn't recognize simple R functions either, like "unique" or "length". I can run the code on local data, but when I try to run the exact same code on a Spark table it doesn't work.
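On a Spark table, dplyr verbs are translated to Spark SQL, and functions with no known translation are passed through unchanged, which is why Spark complains about an undefined TALLY. The per-group distinct count that should be expressible with dplyr's n_distinct() (which dbplyr translates to COUNT(DISTINCT ...)) corresponds to the SQL below. This sketch uses Python's built-in sqlite3 purely as a stand-in for Spark SQL; the table contents and the helper name are hypothetical.

```python
import sqlite3


def distinct_counts_by_group(conn, table, group_col, value_cols):
    """COUNT(DISTINCT col) for every value column, per group.
    Identifiers are interpolated directly: do not pass untrusted input."""
    select_list = ", ".join(f"COUNT(DISTINCT {c}) AS n_{c}" for c in value_cols)
    sql = (f"SELECT {group_col}, {select_list} FROM {table} "
           f"GROUP BY {group_col} ORDER BY {group_col}")
    return conn.execute(sql).fetchall()


# Hypothetical stand-in for the Spark table s.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE s (grouping_type TEXT, a INTEGER, b TEXT)")
conn.executemany("INSERT INTO s VALUES (?, ?, ?)", [
    ("x", 1, "p"), ("x", 1, "q"), ("x", 2, "q"),
    ("y", 3, "p"), ("y", 3, "p"),
])
print(distinct_counts_by_group(conn, "s", "grouping_type", ["a", "b"]))
# [('x', 2, 2), ('y', 1, 1)]
```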

Error in chol.default(Cxx) : the leading minor of order is not positive definite

我是研究僧i submitted on 2019-12-20 06:29:23
Question: I have a fairly simple script in R. It loads two data frames and then performs rCCA with mixOmics:

    system('defaults write org.R-project.R force.LANG en_US.UTF-8')
    ## install.packages("mixOmics")
    library(mixOmics)
    TCIA <- read.csv("/Users/kimrants/Desktop/Data_for_R/TCIA", header=TRUE, sep=",", stringsAsFactors=FALSE)
    TCGA <- read.csv("/Users/kimrants/Desktop/Data_for_R/TCGA", header=TRUE, sep=",", stringsAsFactors=FALSE)
    # Remove first column (of ID)
    df_TCGA <- TCGA[,-1]
    df_TCIA<- TCIA[,
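The error is raised when chol() attempts a Cholesky factorisation of a matrix that is not positive definite; judging by its name, Cxx is presumably the covariance matrix of the first data block. With more variables than samples, or with constant or collinear columns, the sample covariance is singular and the factorisation must fail. The Python sketch below (not R; all function names are hypothetical) reproduces the failure on a tiny singular covariance and shows that adding a small ridge term to the diagonal fixes it. mixOmics' rcc() has regularisation parameters (lambda1, lambda2) intended for this situation; check its documentation for how to choose them.

```python
import math


def cholesky(a):
    """Plain Cholesky factorisation a = L L^T. Raises when a pivot is not
    positive (small tolerance used here for floating-point round-off)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        d = a[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if d <= 1e-12:
            raise ValueError(f"the leading minor of order {j + 1} is not positive definite")
        L[j][j] = math.sqrt(d)
        for i in range(j + 1, n):
            L[i][j] = (a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L


def covariance(rows):
    """Sample covariance matrix (divisor n - 1) of a list of observation rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[c] for r in rows) / n for c in range(p)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(p)] for i in range(p)]


def ridge(a, lam):
    """Add lam to the diagonal (ridge-style regularisation)."""
    return [[a[i][j] + (lam if i == j else 0.0) for j in range(len(a))]
            for i in range(len(a))]


# Second column is exactly twice the first, so the covariance is singular.
cov = covariance([(1, 2), (2, 4), (3, 6)])
try:
    cholesky(cov)
except ValueError as err:
    print(err)  # the leading minor of order 2 is not positive definite
print(cholesky(ridge(cov, 0.1)))
```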

Understanding Markov Chain source code in R

断了今生、忘了曾经 submitted on 2019-12-20 06:24:54
Question: The following source code is from a book. I wrote the comments myself to understand the code better.

    #==================================================================
    # markov(init,mat,n,states) = Simulates n steps of a Markov chain
    #------------------------------------------------------------------
    # init = initial distribution
    # mat = transition matrix
    # labels = a character vector of states used as label of data-frame;
    # default is 1, .... k
    #----------------------------------------------
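The book's function body is not shown, so the following is only a Python sketch of what the commented interface describes, not a translation of that code: draw the first state from init, then draw each next state from the row of mat belonging to the current state. Returning the initial state plus n further states (n + 1 in total) is an assumption made here, as is the optional seed argument.

```python
import random


def markov(init, mat, n, labels=None, seed=None):
    """Simulate n steps of a Markov chain.
    init: initial distribution; mat: transition matrix (rows sum to 1);
    labels: state names, defaulting to "1", ..., "k".
    Returns the n + 1 visited states, initial state included."""
    k = len(init)
    if labels is None:
        labels = [str(i + 1) for i in range(k)]
    rng = random.Random(seed)
    state = rng.choices(range(k), weights=init)[0]
    path = [labels[state]]
    for _ in range(n):
        # The next state depends only on the current one (Markov property).
        state = rng.choices(range(k), weights=mat[state])[0]
        path.append(labels[state])
    return path


# A deterministic chain that alternates between two states.
print(markov([1, 0], [[0, 1], [1, 0]], 4))  # ['1', '2', '1', '2', '1']
```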

How to apply Henze-Zirkler's Multivariate Normality Test in Jupyter notebook with rpy2

岁酱吖の submitted on 2019-12-20 05:47:27
Question: I am interested in applying Henze-Zirkler's Multivariate Normality Test in Python 3.x, and I was wondering whether I can do so in a Jupyter notebook. I have fitted a VAR model to my data, and then I would like to test whether the residuals from this fitted VAR model are normally distributed. How can I do so in a Jupyter notebook using Python? Answer 1: This is another answer, since I discovered this method later. If you do not want to import the R library into Python, one may call the output
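As an alternative to going through rpy2, the Henze-Zirkler test statistic itself can be computed directly in Python. The sketch below computes only the statistic, from the standard definition with smoothing parameter β = (1/√2)·(n(2p+1)/4)^(1/(p+4)); the p-value (usually obtained from a lognormal approximation) is not included, and the function names are hypothetical. For the VAR use case, the rows would be the residual vectors, one per time point.

```python
import math


def _inverse(m):
    """Gauss-Jordan inverse with partial pivoting (small matrices only)."""
    p = len(m)
    a = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(p)]
         for i, row in enumerate(m)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        piv = a[col][col]
        a[col] = [v / piv for v in a[col]]
        for r in range(p):
            if r != col:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[p:] for row in a]


def henze_zirkler_statistic(rows):
    """HZ statistic for n observations (rows) of dimension p."""
    n, p = len(rows), len(rows[0])
    mean = [sum(r[c] for r in rows) / n for c in range(p)]
    centred = [[r[c] - mean[c] for c in range(p)] for r in rows]
    # Covariance with divisor n; implementations differ on n versus n - 1.
    cov = [[sum(x[i] * x[j] for x in centred) / n for j in range(p)] for i in range(p)]
    inv = _inverse(cov)

    def mahalanobis_sq(u):
        return sum(u[i] * inv[i][j] * u[j] for i in range(p) for j in range(p))

    b2 = ((1 / math.sqrt(2)) * ((n * (2 * p + 1)) / 4) ** (1 / (p + 4))) ** 2
    pair_term = sum(
        math.exp(-b2 / 2 * mahalanobis_sq([xi - yi for xi, yi in zip(x, y)]))
        for x in centred for y in centred) / n ** 2
    single_term = sum(
        math.exp(-b2 / (2 * (1 + b2)) * mahalanobis_sq(x)) for x in centred) / n
    return n * (pair_term
                - 2 * (1 + b2) ** (-p / 2) * single_term
                + (1 + 2 * b2) ** (-p / 2))


data = [(0.0, 1.0), (1.0, 0.5), (2.0, 2.5), (3.0, 1.5), (1.5, 3.0), (0.5, 2.0)]
print(henze_zirkler_statistic(data))
```

Because the statistic is built from Mahalanobis distances, it is unchanged by any invertible linear rescaling of the data, which is a useful sanity check.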

How to get a normalised slope of a trend

天涯浪子 submitted on 2019-12-20 05:40:30
Question: I am analysing the distances of users to userx over 6 weeks in a social network. Note: 'No path' means the two users are not connected yet (at least not through friends of friends).

            week1    week2    week3    week4    week5    week6
    user1   No path  No path  No path  No path  3        1
    user2   No path  No path  No path  5        3        1
    user3   5        4        4        4        4        3
    userN   ...

I want to see how well the users connect with userx. For that, I initially thought of using the value of the regression slope for the interpretation (i.e. the lower the regression slope, the
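One way to make slopes comparable across users is sketched below in Python: fit an ordinary least-squares slope of distance against week using only the weeks with a path, then divide by the user's mean observed distance, giving a relative change per week. Treating 'No path' as missing (rather than as a large distance) and normalising by the mean are both assumptions made here, not the only reasonable choices.

```python
def normalised_slope(distances):
    """OLS slope of distance vs. week (weeks numbered from 1), skipping
    'No path' weeks (None), divided by the mean observed distance."""
    points = [(week, d) for week, d in enumerate(distances, start=1) if d is not None]
    if len(points) < 2:
        return None  # a slope needs at least two observed weeks
    n = len(points)
    mean_w = sum(w for w, _ in points) / n
    mean_d = sum(d for _, d in points) / n
    sxx = sum((w - mean_w) ** 2 for w, _ in points)
    sxy = sum((w - mean_w) * (d - mean_d) for w, d in points)
    return (sxy / sxx) / mean_d


# Rows from the table above, with 'No path' encoded as None.
users = {
    "user1": [None, None, None, None, 3, 1],
    "user2": [None, None, None, 5, 3, 1],
    "user3": [5, 4, 4, 4, 4, 3],
}
for name, d in users.items():
    print(name, round(normalised_slope(d), 3))
# user1 -1.0
# user2 -0.667
# user3 -0.071
```

A more negative value means the user is closing the distance to userx faster relative to how far away they were.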

Subsetting panel data

孤人 submitted on 2019-12-20 04:12:06
Question: Very new, so let me know if this is asking too much. I am trying to subset panel data in R into two different categories: one that has complete information for variables and one that has incomplete information for variables. My data looks like this:

    Person  Year  Income  Age  Sex
    1       2003  1500    15   1
    1       2004  1700    16   1
    1       2005  2000    17   1
    2       2003  1400    25   0
    2       2004  1900    26   0
    2       2005  2000    27   0

What I need to do is go through each column (not columns 1 and 2) and if the data is full for the variable (
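The question is cut off, so the exact criterion is unknown. The Python sketch below assumes the column-wise reading: every variable other than Person and Year is classed as complete (no missing values) or incomplete, and the panel is split into two tables that each keep the identifier columns. The missing Age value is hypothetical, added only so that the split has something to do; the helper name is also hypothetical.

```python
def split_by_completeness(rows, id_cols=("Person", "Year")):
    """Return (complete, incomplete) tables. A variable column is complete
    when no row has a missing value (None); both tables keep id_cols."""
    variables = [c for c in rows[0] if c not in id_cols]
    complete = [c for c in variables if all(r[c] is not None for r in rows)]
    incomplete = [c for c in variables if c not in complete]

    def select(cols):
        return [{c: r[c] for c in (*id_cols, *cols)} for r in rows]

    return select(complete), select(incomplete)


# Data from the question, as one dict per row.
rows = [
    {"Person": 1, "Year": 2003, "Income": 1500, "Age": 15, "Sex": 1},
    {"Person": 1, "Year": 2004, "Income": 1700, "Age": 16, "Sex": 1},
    {"Person": 1, "Year": 2005, "Income": 2000, "Age": 17, "Sex": 1},
    {"Person": 2, "Year": 2003, "Income": 1400, "Age": 25, "Sex": 0},
    {"Person": 2, "Year": 2004, "Income": 1900, "Age": 26, "Sex": 0},
    {"Person": 2, "Year": 2005, "Income": 2000, "Age": 27, "Sex": 0},
]
rows[1]["Age"] = None  # hypothetical gap, for illustration only

complete, incomplete = split_by_completeness(rows)
print(list(complete[0]), list(incomplete[0]))
# ['Person', 'Year', 'Income', 'Sex'] ['Person', 'Year', 'Age']
```

In R the same test per column is typically written with is.na() (for example, colSums(is.na(df)) == 0) before selecting the columns.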