How to calculate the statistics “t-test” with numpy

野的像风 2020-11-30 05:58

I'm looking to generate some statistics about a model I created in Python. I'd like to generate the t-test on it, but was wondering if there was an easy way to do this wi

3 Answers
  • 2020-11-30 06:22

    Once you get your t-value, you may wonder how to interpret it as a probability -- I did. Here is a function I wrote to help with that.

    It's based on info I gleaned from http://www.vassarstats.net/rsig.html and http://en.wikipedia.org/wiki/Student%27s_t_distribution.

    from scipy import stats

    # Given (possibly random) variables, X and Y, and a correlation direction,
    # returns:
    #  (r, p),
    # where r is the Pearson correlation coefficient, and p is the probability
    # of getting the observed values if there is actually no correlation in the given
    # direction.
    #
    # direction:
    #  if positive, p is the probability of getting the observed result when there is no
    #     positive correlation in the normally distributed full populations sampled by X
    #     and Y
    #  if negative, p is the probability of getting the observed result when there is no
    #     negative correlation
    #  if 0, p is the probability of getting your result if your hypothesis is true that
    #     there is no correlation in either direction
    def probabilityOfResult(X, Y, direction=0):
        n = len(X)
        if n != len(Y):
            raise ValueError("variables not same len: " + str(n) + ", and " +
                             str(len(Y)))
        if n < 6:
            raise ValueError("must have at least 6 samples, but have " + str(n))
        # pearsonr returns the coefficient and the two-tailed p-value
        (corr, prb_2_tail) = stats.pearsonr(X, Y)

        if not direction:
            return (corr, prb_2_tail)

        # one-tailed test: halve p if the observed sign matches the direction
        prb_1_tail = prb_2_tail / 2
        if corr * direction > 0:
            return (corr, prb_1_tail)

        # observed correlation points the opposite way
        return (corr, 1 - prb_1_tail)
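
    To make the link between the correlation coefficient, the t-value and the probability explicit, here is a minimal sketch. It assumes the standard relation t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom; the helper name r_to_t_and_p and the random test data are made up for illustration and are not part of the function above.

```python
import numpy as np
from scipy import stats


def r_to_t_and_p(r, n):
    """Convert a Pearson r from n paired samples into (t, two-tailed p)."""
    df = n - 2                                   # degrees of freedom
    t = r * np.sqrt(df / (1.0 - r * r))          # t-value for the correlation
    p_two_tail = 2.0 * stats.t.sf(abs(t), df)    # both tails of the t distribution
    return t, p_two_tail


# Illustrative data only: Y is partly driven by X, plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=30)
Y = 0.5 * X + rng.normal(size=30)

r, p_scipy = stats.pearsonr(X, Y)
t, p_manual = r_to_t_and_p(r, len(X))
print("r = %.4f, t = %.4f" % (r, t))
print("p (pearsonr) = %.6f, p (from t) = %.6f" % (p_scipy, p_manual))
```

    The two p-values should agree to within floating-point error, which is why halving the two-tailed value gives the one-tailed probability when the sign of r matches the direction you asked about.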
    
  • 2020-11-30 06:24

    The scipy.stats package has several ttest_... functions. For example:

    >>> print('t-statistic = %6.3f pvalue = %6.4f' % stats.ttest_1samp(x, m))
    t-statistic =  0.391 pvalue = 0.6955
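
    Since the snippet above leaves x and m undefined, here is a self-contained sketch of three functions from that family. The data is randomly generated purely for illustration, and which test is appropriate depends on your data (one sample against a known mean, two independent samples, or paired samples).

```python
import numpy as np
from scipy import stats

# Illustrative data only: two samples of equal length.
rng = np.random.default_rng(42)
x = rng.normal(loc=5.0, scale=2.0, size=50)
y = rng.normal(loc=5.5, scale=2.0, size=50)

# One-sample test: is the mean of x different from 5.0?
t1, p1 = stats.ttest_1samp(x, 5.0)

# Two independent samples; equal_var=False selects Welch's t-test.
t2, p2 = stats.ttest_ind(x, y, equal_var=False)

# Paired samples: x[i] and y[i] are treated as belonging together.
t3, p3 = stats.ttest_rel(x, y)

for name, t, p in [("1samp", t1, p1), ("ind", t2, p2), ("rel", t3, p3)]:
    print("%-5s t-statistic = %6.3f pvalue = %6.4f" % (name, t, p))
```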
    
  • 2020-11-30 06:31

    van's answer using scipy is exactly right and using the scipy.stats.ttest_* functions is very convenient.

    But I came to this page looking for a solution with pure numpy, as stated in the heading, to avoid the scipy dependence. To this end, let me point out the example given here: https://docs.scipy.org/doc/numpy/reference/generated/numpy.random.standard_t.html

    The main problem is that numpy does not provide cumulative distribution functions, so my conclusion is that you should really use scipy. Still, it is possible with numpy alone:

    From the original question I am guessing that you want to compare two datasets and use a t-test to judge whether they differ significantly, and that the samples are paired. (See https://en.wikipedia.org/wiki/Student%27s_t-test#Unpaired_and_paired_two-sample_t-tests ) In that case, you can calculate the t- and p-value like so:

    import numpy as np
    sample1 = np.array([55.0, 55.0, 47.0, 47.0, 55.0, 55.0, 55.0, 63.0])
    sample2 = np.array([54.0, 56.0, 48.0, 46.0, 56.0, 56.0, 55.0, 62.0])
    # paired samples -> under the null hypothesis the differences have mean 0
    difference = sample1 - sample2
    n = len(difference)
    # the t-value is easily computed with numpy
    t = np.mean(difference) / (difference.std(ddof=1) / np.sqrt(n))
    # unfortunately, numpy does not have a built-in CDF
    # here is a ridiculous work-around: integrating by sampling
    # (a paired t-test on n differences has n - 1 degrees of freedom)
    s = np.random.standard_t(n - 1, size=100000)
    p = np.sum(s < t) / float(len(s))
    # using a two-sided test
    print("Two-sided p-value: {:.3f} % (probability of a difference at least this extreme if the true means are equal).".format(2 * min(p, 1 - p) * 100))
    

    This prints a two-sided p-value of roughly 73 % (the exact figure varies between runs because it is estimated by random sampling). Since this is far above any sensible significance level (say 5 %), you cannot reject the hypothesis that the means are equal in this concrete case.
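
    If the run-to-run noise of the sampling work-around bothers you, a deterministic alternative that still avoids scipy is to integrate the t density numerically. This is only a sketch: t_pdf and two_sided_p are names made up here, the density is the textbook Student's t formula, and the number of grid steps is an arbitrary choice.

```python
import math
import numpy as np


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2.0) - math.lgamma(df / 2.0)
                - 0.5 * math.log(df * math.pi))
    return np.exp(log_norm - (df + 1) / 2.0 * np.log1p(x * x / df))


def two_sided_p(t, df, steps=20001):
    """Two-sided p-value: 1 minus the density integrated over [-|t|, |t|]."""
    grid = np.linspace(-abs(t), abs(t), steps)
    dens = t_pdf(grid, df)
    # Trapezoidal rule written out by hand on the evenly spaced grid.
    h = grid[1] - grid[0]
    area = h * (dens.sum() - 0.5 * (dens[0] + dens[-1]))
    return 1.0 - area


sample1 = np.array([55.0, 55.0, 47.0, 47.0, 55.0, 55.0, 55.0, 63.0])
sample2 = np.array([54.0, 56.0, 48.0, 46.0, 56.0, 56.0, 55.0, 62.0])
difference = sample1 - sample2
n = len(difference)
t = difference.mean() / (difference.std(ddof=1) / np.sqrt(n))
p = two_sided_p(t, n - 1)   # paired test: n - 1 degrees of freedom
print("t = %.4f, two-sided p = %.4f" % (t, p))
```

    If scipy is available after all, scipy.stats.ttest_rel(sample1, sample2) is a handy cross-check for both numbers.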
