Command-line utility to print statistics of numbers in Linux

無奈伤痛 2020-11-30 18:46

I often find myself with a file that has one number per line. I end up importing it into Excel to see things like the median, standard deviation, and so forth.

Is there a command-line utility in Linux that can do the same?

16 answers
  •  时光说笑
    2020-11-30 19:15

    This is a breeze with R. For a file that looks like this:

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    

    Use this:

    R -q -e "x <- read.csv('nums.txt', header = FALSE); summary(x); sd(x[ , 1])"
    

    To get this:

           V1       
     Min.   : 1.00  
     1st Qu.: 3.25  
     Median : 5.50  
     Mean   : 5.50  
     3rd Qu.: 7.75  
     Max.   :10.00  
    [1] 3.02765
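
    For reference, this is where the quartiles and the final [1] line come from, assuming R's default quantile definition (type 7, which summary() uses) and the sample standard deviation with an n - 1 denominator that sd() computes. Here x_(k) is the k-th smallest value and the mean is 5.5.

```latex
h = (n-1)\,p + 1, \qquad
Q(p) = x_{(\lfloor h \rfloor)} + \bigl(h - \lfloor h \rfloor\bigr)\bigl(x_{(\lfloor h \rfloor + 1)} - x_{(\lfloor h \rfloor)}\bigr)

n = 10:\quad Q(0.25):\ h = 3.25,\ Q = 3 + 0.25\,(4 - 3) = 3.25; \qquad
Q(0.75):\ h = 7.75,\ Q = 7 + 0.75\,(8 - 7) = 7.75

s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\bigl(x_i - \bar{x}\bigr)^2}
  = \sqrt{\frac{2\,(4.5^2 + 3.5^2 + 2.5^2 + 1.5^2 + 0.5^2)}{9}}
  = \sqrt{\frac{82.5}{9}} \approx 3.02765
```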
    
    • The -q flag squelches R's startup licensing and help output
    • The -e flag tells R you'll be passing an expression from the terminal
    • x is a data.frame: basically a table. It can hold several vectors (columns) of data, which is a little peculiar when you're only reading in a single vector, and it affects which functions you can call on it directly.
    • Some functions, like summary(), naturally accommodate data.frames. If x had multiple fields, summary() would provide the above descriptive stats for each.
    • But sd() can only take one vector at a time, which is why I index x for that command (x[ , 1] returns the first column of x). You could use apply(x, MARGIN = 2, FUN = sd) to get the SDs for all columns.
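
    If R isn't installed, a rough equivalent can be sketched with the standard sort and awk tools instead. This is a swapped-in technique, not the R approach above: numstats is a hypothetical function name, the quartiles assume R's default type-7 interpolation, and the script assumes clean input with one number per line (blank lines are skipped, but non-numeric lines would be read as 0).

```shell
# numstats: a hypothetical helper name (not a standard command).
# Reads one number per line on stdin and prints a summary similar to R's.
numstats() {
  sort -n | awk '
    # Keep non-blank lines only; values arrive already sorted.
    NF { n++; x[n] = $1 + 0; sum += $1 }

    # Type-7 quantile (the R default): position h = (n - 1) * p + 1,
    # linearly interpolated between the two neighbouring sorted values.
    function q(p,    h, lo) {
      h = (n - 1) * p + 1
      lo = int(h)
      if (lo >= n) return x[n]
      return x[lo] + (h - lo) * (x[lo + 1] - x[lo])
    }

    END {
      if (n < 2) { print "numstats: need at least two numbers" > "/dev/stderr"; exit 1 }
      mean = sum / n
      for (i = 1; i <= n; i++) ss += (x[i] - mean) ^ 2
      printf "Min.   : %.2f\n", x[1]
      printf "1st Qu.: %.2f\n", q(0.25)
      printf "Median : %.2f\n", q(0.5)
      printf "Mean   : %.2f\n", mean
      printf "3rd Qu.: %.2f\n", q(0.75)
      printf "Max.   : %.2f\n", x[n]
      # Sample standard deviation (n - 1 denominator), as R sd() computes.
      printf "sd     : %.5f\n", sqrt(ss / (n - 1))
    }'
}

# Usage: the same 1..10 data as in the answer.
seq 1 10 | numstats
```

    With a real file you would run numstats < nums.txt. For 1 to 10 it reports the same quartiles, mean and SD as the R output, just with fixed two-decimal formatting instead of R's significant-digit rounding.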
