Shapiro-Wilk test is not working in R Markdown

落爺英雄遲暮 submitted on 2021-02-11 12:23:39

Question


I am working on a piece of R Markdown code in which I simply apply a Shapiro-Wilk test (shapiro.test) to some data. When I run the code in RStudio in the usual way, I don't get any issue. But when I run the same code in R Markdown, I get the error below:

Error in shapiro.test(Metric) : sample size must be between 3 and 5000
Calls: <Anonymous> ... summarise -> summarise.tbl_df -> summarise_impl -> shapiro.test
In addition: There were 32 warnings (use warnings() to see them)

Warnings are:

Code is:


normality_test_on_data_PPM <- final_combined_data %>% 
                              group_by(PPM) %>% 
                              summarise(W = shapiro.test(Metric)$statistic, P.value = shapiro.test(Metric)$p.value) %>% 
                              ungroup() %>% 
                              mutate(P_Value = format(round(P.value,3), nsmall = 3)) %>% 
                              select(PPM , P_Value) %>%
                              mutate(Normal_test = ifelse(P_Value >= 0.05, "Normal", "Not Normal"))

Result of Normality Check

DT::datatable(normality_test_on_data_PPM)

Answer 1:


R's shapiro.test only accepts sample sizes between 3 and 5000, so any group with more than 5000 (or fewer than 3) non-missing values triggers this error. The upper limit exists for good reason: in very large samples, even minute deviations from normality qualify as significant at conventional levels. See the discussion here: https://stats.stackexchange.com/questions/446262/can-a-sample-larger-than-5-000-data-points-be-tested-for-normality-using-shapiro. Alternatively, use the Kolmogorov-Smirnov test, ks.test, which has no such restriction, or, perhaps even better, draw quantile-quantile (Q-Q) plots using qqnorm and qqline: if the points in the Q-Q plot deviate from the straight reference line, that is a good diagnostic indication that the data violate normality.
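To see which groups fall outside the accepted range, it can help to check the group sizes before testing. The following is only a minimal sketch in base R: the helper name safe_shapiro_p and the toy data frame are made up for illustration and are not part of the original question. Only the column names PPM and Metric are borrowed from it.

```r
# Hypothetical helper: returns NA instead of failing when the sample size is
# outside the range that shapiro.test() accepts (3 to 5000 non-missing values).
safe_shapiro_p <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  if (n < 3 || n > 5000) return(NA_real_)
  shapiro.test(x)$p.value
}

# Made-up example data: group "a" is fine, "b" is too small, "c" is too large.
set.seed(1)
toy <- data.frame(
  PPM    = rep(c("a", "b", "c"), times = c(100, 2, 6000)),
  Metric = rnorm(6102)
)

# Group sizes, to see which groups would trigger the error.
group_sizes <- table(toy$PPM)

# One p-value per group; NA marks groups that could not be tested.
p_values <- sapply(split(toy$Metric, toy$PPM), safe_shapiro_p)

print(group_sizes)
print(p_values)
```

Under the same assumptions, such a helper could presumably be used inside the dplyr pipeline from the question, for example summarise(P.value = safe_shapiro_p(Metric)), so that problematic groups show up as NA rather than stopping the whole document from knitting.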

EDIT: Consider this illustration:

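v1 <- rnorm(500)        # normally distributed
v2 <- exp(rnorm(500))   # log-normal, i.e. not normally distributed

# Two plots side by side; xpd = FALSE clips drawing to the plot region
par(mfrow = c(1, 2), xpd = FALSE)
qqnorm(v1, main = "Q-Q plot", cex.main = 0.85)
qqline(v1, col = "blue")
qqnorm(v2, main = "Q-Q plot", cex.main = 0.85)
qqline(v2, col = "blue")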

The resulting plots clearly show which variable is normally distributed (v1) and which is not (v2).
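As a numeric complement to the plots, the Kolmogorov-Smirnov test mentioned earlier can be applied to the same two kinds of variable. This is a rough sketch rather than a recommended procedure: the seed is arbitrary, and comparing v2 against a normal distribution with its own sample mean and standard deviation is an assumption made here for illustration, which means that p-value is only approximate.

```r
set.seed(42)                 # arbitrary seed, only for reproducibility
v1 <- rnorm(500)
v2 <- exp(rnorm(500))

# v1 is compared with the standard normal, whose parameters are known here.
ks_v1 <- ks.test(v1, "pnorm")

# v2 is compared with a normal using its sample mean and sd. Estimating the
# parameters from the data makes the p-value only approximate.
ks_v2 <- ks.test(v2, "pnorm", mean = mean(v2), sd = sd(v2))

print(ks_v1$p.value)
print(ks_v2$p.value)
```

A very small p-value for v2 is consistent with what its Q-Q plot suggests, whereas ks.test, unlike shapiro.test, does not stop with an error when the sample has more than 5000 values.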



Source: https://stackoverflow.com/questions/60058926/shapiro-wilks-test-is-not-working-in-r-markdown
