contingency

p-value from fisher.test() does not match phyper()

感情迁移, submitted on 2021-02-19 04:42:16
Question: Fisher's Exact Test is related to the hypergeometric distribution, so I would expect these two commands to return identical p-values. Can anyone explain what I'm doing wrong such that they do not match?

    # data (variable names chosen to match dhyper() argument names)
    x = 14
    m = 20
    n = 41047
    k = 40
    # Fisher test, alternative = 'greater'
    (fisher.test(matrix(c(x, m-x, k-x, n-(k-x)), 2, 2), alternative = 'greater'))$p.value
    # returns 2.01804e-39
    # hypergeometric distribution, lower.tail = F, i.e. P[X >
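The question is cut off, so the following is only a sketch of one common cause of this kind of mismatch, not the accepted answer. A one-sided 'greater' Fisher test reports the inclusive tail P[X >= x], whereas an upper tail computed with lower.tail = F is the strict tail P[X > x]; in R the inclusive tail corresponds to phyper(x - 1, m, n, k, lower.tail = FALSE). The pure-Python helpers below (hyper_pmf and upper_tail are illustrative names, not library functions) compute both tails exactly for the numbers in the question.

```python
from fractions import Fraction
from math import comb


def hyper_pmf(i, m, n, k):
    """P[X = i]: i white balls among k drawn from m white and n black."""
    return Fraction(comb(m, i) * comb(n, k - i), comb(m + n, k))


def upper_tail(x, m, n, k, inclusive):
    """Return P[X >= x] if inclusive is True, otherwise P[X > x]."""
    start = x if inclusive else x + 1
    terms = (hyper_pmf(i, m, n, k) for i in range(start, min(m, k) + 1))
    return sum(terms, Fraction(0))


# Values taken from the question (dhyper() argument names).
x, m, n, k = 14, 20, 41047, 40

p_ge = upper_tail(x, m, n, k, inclusive=True)   # what a 'greater' Fisher test reports
p_gt = upper_tail(x, m, n, k, inclusive=False)  # strict upper tail

print("P[X >= x] =", float(p_ge))
print("P[X >  x] =", float(p_gt))
```

The two tails differ by exactly P[X = x], which dominates here because the observed count sits far out in the tail.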


Pyspark: reshape data without aggregation

泪湿孤枕, submitted on 2020-12-26 05:00:28
Question: I want to reshape my data from 4x3 to 2x2 in PySpark without aggregating. My current output is the following:

    columns = ['FAULTY', 'value_HIGH', 'count']
    vals = [
        (1, 0, 141),
        (0, 0, 140),
        (1, 1, 21),
        (0, 1, 12)
    ]

What I want is a contingency table in which the second column is split into two new binary columns (value_HIGH_1, value_HIGH_0) holding the values from the count column, meaning:

    columns = ['FAULTY', 'value_HIGH_1', 'value_HIGH_0']
    vals = [
        (1, 21, 141),
        (0, 12, 140)
    ]

Answer 1: You can use pivot with
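The answer is truncated, so what follows is a hedged sketch of the reshaping itself rather than the answerer's code. It uses plain Python on the question's data to show what a pivot does when each (FAULTY, value_HIGH) pair occurs exactly once: every count is moved into its column unchanged, so nothing is really aggregated. The function name pivot_counts is hypothetical.

```python
# Long-format rows from the question: (FAULTY, value_HIGH, count).
columns = ["FAULTY", "value_HIGH", "count"]
vals = [(1, 0, 141), (0, 0, 140), (1, 1, 21), (0, 1, 12)]


def pivot_counts(rows):
    """Spread value_HIGH into (value_HIGH_1, value_HIGH_0), one row per FAULTY."""
    table = {}  # FAULTY -> {value_HIGH: count}
    for faulty, value_high, count in rows:
        cells = table.setdefault(faulty, {})
        if value_high in cells:
            # A repeated pair would force a real aggregation; refuse instead.
            raise ValueError(f"duplicate cell for FAULTY={faulty}, value_HIGH={value_high}")
        cells[value_high] = count
    # dict preserves insertion order, so rows follow first appearance of FAULTY.
    return [(faulty, cells.get(1), cells.get(0)) for faulty, cells in table.items()]


wide_columns = ["FAULTY", "value_HIGH_1", "value_HIGH_0"]
wide = pivot_counts(vals)
print(wide_columns)
print(wide)
```

In PySpark the same shape is usually reached with groupBy("FAULTY").pivot("value_HIGH") followed by an aggregate such as first("count"). Spark requires an aggregate after pivot, but with one row per cell it simply picks that single value. Treat that call chain as an assumption about what the truncated answer goes on to say.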


Is there an (easy) way to convert flat contingency tables (ftable) to flextable

社会主义新天地, submitted on 2020-08-11 05:57:45
Question: I used to create FlexTable objects from ‘flat’ contingency tables (ftable, stats package) using the old packages ReporteRs and rtable. Before these packages became obsolete and were removed from CRAN, there was a function, as.FlexTable.ftable, which did the trick. See: https://rdrr.io/cran/rtable/man/as.FlexTable.ftable.html Is there a way to achieve this conversion with the new flextable package? I couldn't find a similar function yet.
Answer 1: That's a very good question. The migration
