Inconsistent skewness results between basic skewness formula, Python and R

白昼怎懂夜的黑 提交于 2021-02-08 07:54:47

问题


The data I'm using is pasted below. When I apply the basic formula for skewness to my data in R:

3*(mean(data) - median(data))/sd(data) 

The result is -0.07949198. I get a very similar result in Python. The median is therefore greater than the mean suggesting the left tail is longer.

However, when I apply the descdist function from the fitdistrplus package, the skewness is 0.3076471 suggesting the right tail is longer. The Scipy function skew again returns a skewness of 0.303.

Can I trust this simple formula which gives me a negative skewness? What is going on here.

Thanks, Oliver

data = c(0.18941565600882029, 1.9861271676300578, -5.2022598870056491, 1.6826411075612353, 1.6826411075612353, -2.9502890173410403, -2.923253150057274, -2.9778296382730454, 0.71202396234488663, 0.71202396234488663, -3.1281373844121529, 1.8326831382748159, -5.2961554710604135, 2.7793190416141234, 0.46922759190417185, 7.0730158730158728, 1.1745152354570636, 2.8142292490118579, 2.037940379403794, 7.0607489597780866, 10.460258249641321, 11.894978479196554, 4.8334682860998655, 1.3884016973125886, 4.0940458015267174, 0.12592959841348539, -0.37022332506203476, 1.9713554987212274, -0.83774145616641893, -1.896978417266187, 6.4340675477239362, -6.4774193548387089, -0.31790393013100438, -4.4193265007320646, 5.7454545454545451, 2.5913432835820895, 0.86190724335591451, 0.95753781950965045, 6.8923556942277697, 1.7650659630606862, -2.4558421851289833, -2.390546528803545, 2.6355029585798815, 0.26983655274888557, 1.5032159264931086, 3.9839506172839503, -5.1404511278195484, -2.2477777777777779, 6.0604444444444443, -0.9691172451489477, 1.1383462670591382, -1.5281319661168078, 4.7775667118950702, 1.2223175965665234, 2.0563555555555553, -3.6153201970443352, -0.35731206188058978, -3.6265094676670238, 1.3053804930332262, -4.4604960677555958, -0.8933514246947083, 0.7622542595019659, 1.3892170651664322, 2.5725258493353031, -0.028006088280060883, 0.8933947772657449, 2.4907086614173228, 3.0914196567862717, 4.4222575516693157, 0.64568527918781726, 0.97095158597662778, -3.7409780775716697, -3.3472636815920396, -0.66307448494453247, -7.0384291725105186, -0.14540612516644474, -0.38161535029004906, 5.1076923076923082, 4.0237516869095806, 1.510099573257468, 1.5064083457526081, -0.025879043600562587, 4.5001414427156998, 3.2326264274061991, 1.0185639229422065, 2.66690518783542, 0.53032015065913374, 1.2117829457364342, 0.60861244019138749, -2.5248049921996878, 1.8666666666666669, -0.32978612415232139, 0.29055999999999998, 1.9150729335494328, 2.2988352745424296, 3.779225265235628, 0.093884800811976657, 1.0097869890616005, 1.2220632081097198, 0.21164401128494487)

回答1:


I don't have access to the packages you mention right now so I can't check which formula they apply, however, you seem to be using Pearson's second skewness coefficient (see wikipedia). The estimator for the sample skewness is given on the same page and is given by the third moment which can be calculated simply by:

> S <- mean((data-mean(data))^3)/sd(data)^3
> S
[1] 0.2984792
> n <- length(data)
> S_alt <- S*n^2/((n-1)*(n-2))
> S_alt
[1] 0.3076471

See the alternative definition on the wiki page which yields the same results as in your example.




回答2:


The skewness is generally defined as the third central moment (at least when it is being used by statisticians.) The Wikipedia skewness page explains why the definition you found is unreliable. (I had never seen that definition.) The code in descdist is easy to review:

moment <- function(data, k) {
        m1 <- mean(data)        # so this is a "central moment"
        return(sum((data - m1)^k)/length(data))
    }
skewness <- function(data) {
            sd <- sqrt(moment(data, 2))
            return(moment(data, 3)/sd^3)}
skewness(data)
#[1] 0.3030131

The version you use is apparently called 'median skewness' or 'non-parametric skewness'. See: https://stats.stackexchange.com/questions/159098/taming-of-the-skew-why-are-there-so-many-skew-functions



来源:https://stackoverflow.com/questions/38782203/inconsistent-skewness-results-between-basic-skewness-formula-python-and-r

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!