Bash tool to get nth line from a file

刺人心 2020-11-22 08:07

Is there a "canonical" way of doing that? I've been using head -n | tail -1 which does the trick, but I've been wondering if there's a Bash tool that speci

19 answers
  •  广开言路
    2020-11-22 08:56

    According to my tests, in terms of performance and readability my recommendation is:

    tail -n+N | head -1

    N is the line number that you want. For example, tail -n+7 input.txt | head -1 will print the 7th line of the file.

    tail -n+N will print everything starting from line N, and head -1 will make it stop after one line.
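
The pipeline above can be wrapped in a small shell function. This is only a sketch: nth_line is a hypothetical helper name, not a standard tool, and the demo file is a throwaway created just for illustration.

```shell
# nth_line N FILE: print line N of FILE (hypothetical helper, not a standard tool).
nth_line() {
    # tail -n +N starts output at line N; head -n 1 stops after one line.
    tail -n "+$1" "$2" | head -n 1
}

# Demo on a small throwaway file.
demo=$(mktemp)
printf '%s\n' alpha beta gamma delta > "$demo"
nth_line 3 "$demo"    # prints: gamma
rm -f "$demo"
```

Passing the line number and file as arguments keeps the quoting in one place, which helps when file names contain spaces.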


    The alternative head -N | tail -1 is perhaps slightly more readable. For example, this will print the 7th line:

    head -7 input.txt | tail -1

    When it comes to performance, there is not much difference for small files, but head | tail is outperformed by tail | head (from above) once the files become huge.
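
Aside from speed, the two pipelines can behave differently when N is larger than the number of lines in the file: head | tail falls back to the last available line, while tail | head prints nothing. A minimal sketch, assuming standard GNU or BSD head and tail; the file and variable names are illustrative only:

```shell
# A 3-line throwaway file.
short=$(mktemp)
printf '%s\n' one two three > "$short"

# Ask both pipelines for line 7, which does not exist.
via_head_tail=$(head -n 7 "$short" | tail -n 1)   # last line that head passed through
via_tail_head=$(tail -n +7 "$short" | head -n 1)  # nothing to print

echo "head | tail: '$via_head_tail'"
echo "tail | head: '$via_tail_head'"
rm -f "$short"
```

If a script needs to tell "line N is missing" apart from a real line, the empty result of tail | head is the easier one to check.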

    The top-voted sed 'NUMq;d' is interesting to know, but I would argue that it will be understood by fewer people out of the box than the head/tail solution and it is also slower than tail/head.
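
For readers who have not seen the sed form before, here is a short annotated sketch of how it reads. The sample file and the variable n are placeholders introduced for illustration:

```shell
sample=$(mktemp)
printf '%s\n' alpha beta gamma delta > "$sample"

n=3
# "${n}q" : on line n, quit; sed prints the current line before exiting.
# "d"     : on every earlier line, delete it so nothing is printed.
third=$(sed "${n}q;d" "$sample")
echo "$third"
rm -f "$sample"
```

Double quotes are used here so that the shell expands n; with a literal number, single quotes as in sed '3q;d' work the same way.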

    In my tests, both tail/head versions outperformed sed 'NUMq;d' consistently. That is in line with the other benchmarks that were posted. It is hard to find a case where tail/head performed really badly. This is not surprising, as these are operations that you would expect to be heavily optimized in a modern Unix system.

    To get an idea about the performance differences, these are the numbers that I get for a huge file (9.3G):

    • tail -n+N | head -1: 3.7 sec
    • head -N | tail -1: 4.6 sec
    • sed Nq;d: 18.8 sec

    Results may differ, but the performance of head | tail and tail | head is, in general, comparable for smaller inputs, and in my tests sed was always slower by a significant factor (around 5x or so).

    To reproduce my benchmark, you can try the following, but be warned that it will create a 9.3G file in the current working directory:

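    #!/bin/bash
    # Benchmark three ways of printing line $pos of a large generated file.
    readonly file=tmp-input.txt
    readonly size=1000000000
    readonly pos=500000000
    readonly retries=3
    
    # Generate the input file (one number per line).
    seq 1 "$size" > "$file"
    echo "*** head -N | tail -1 ***"
    for i in $(seq 1 "$retries") ; do
        # In Bash, time is a keyword and measures the whole pipeline.
        time head "-$pos" "$file" | tail -1
    done
    echo "-------------------------"
    echo
    echo "*** tail -n+N | head -1 ***"
    echo
    
    # Regenerate the file and show its size.
    seq 1 "$size" > "$file"
    ls -alhg "$file"
    for i in $(seq 1 "$retries") ; do
        time tail -n+"$pos" "$file" | head -1
    done
    echo "-------------------------"
    echo
    echo "*** sed Nq;d ***"
    echo
    
    # Regenerate the file and show its size.
    seq 1 "$size" > "$file"
    ls -alhg "$file"
    for i in $(seq 1 "$retries") ; do
        time sed "${pos}q;d" "$file"
    done
    # Remove the generated file.
    /bin/rm "$file"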
    

    Here is the output of a run on my machine (ThinkPad X1 Carbon with an SSD and 16G of memory). I assume in the final run everything will come from the cache, not from disk:

    *** head -N | tail -1 ***
    500000000
    
    real    0m9,800s
    user    0m7,328s
    sys     0m4,081s
    500000000
    
    real    0m4,231s
    user    0m5,415s
    sys     0m2,789s
    500000000
    
    real    0m4,636s
    user    0m5,935s
    sys     0m2,684s
    -------------------------
    
    *** tail -n+N | head -1 ***
    
    -rw-r--r-- 1 phil 9,3G Jan 19 19:49 tmp-input.txt
    500000000
    
    real    0m6,452s
    user    0m3,367s
    sys     0m1,498s
    500000000
    
    real    0m3,890s
    user    0m2,921s
    sys     0m0,952s
    500000000
    
    real    0m3,763s
    user    0m3,004s
    sys     0m0,760s
    -------------------------
    
    *** sed Nq;d ***
    
    -rw-r--r-- 1 phil 9,3G Jan 19 19:50 tmp-input.txt
    500000000
    
    real    0m23,675s
    user    0m21,557s
    sys     0m1,523s
    500000000
    
    real    0m20,328s
    user    0m18,971s
    sys     0m1,308s
    500000000
    
    real    0m19,835s
    user    0m18,830s
    sys     0m1,004s
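
Before committing to the full 9.3G run, it may be worth checking on a much smaller file that all three commands return the same line. The following is a scaled-down sketch, not the benchmark itself: SIZE and POS are hypothetical environment overrides that are not part of the script above, and no timing is done here.

```shell
# Scaled-down sanity check; SIZE and POS are hypothetical overrides.
size=${SIZE:-100000}
pos=${POS:-$((size / 2))}

# Temporary input file with one number per line, so line N contains N.
file=$(mktemp)
seq 1 "$size" > "$file"

# Fetch the same line with each of the three approaches.
r1=$(head -n "$pos" "$file" | tail -n 1)
r2=$(tail -n "+$pos" "$file" | head -n 1)
r3=$(sed "${pos}q;d" "$file")

echo "head | tail: $r1"
echo "tail | head: $r2"
echo "sed Nq;d:    $r3"
rm -f "$file"
```

Because the file is generated with seq, each result should equal the requested line number, which makes a mismatch easy to spot.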
    
