Count lines in large files

挽巷 2020-12-02 08:53

I commonly work with text files of ~20 GB in size, and I very often find myself counting the number of lines in a given file.

The way I do it now is just cat fn

13 answers
  •  栀梦
    栀梦 (original poster)
    2020-12-02 09:46

    I know the question is a few years old now, but expanding on Ivella's last idea, this bash script estimates the line count of a big file in seconds or less by measuring the size of one line and extrapolating from it:

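    #!/bin/bash
    # Estimate the line count of file "$1" as file size / size of its second line.
    # Byte counts come from wc -c instead of du -b, which is a GNU extension,
    # and the pipeline avoids writing a temporary file next to the input.
    filesize=$(wc -c < "$1")
    # wc -c includes the newline, so linesize is the full on-disk size of line 2.
    linesize=$(head -n 2 -- "$1" | tail -n 1 | wc -c)
    echo $(( filesize / linesize ))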
    

    If you name this script lines.sh, you can call lines.sh bigfile.txt to get the estimated number of lines. In my case (a database export of about 6 GB), the estimate deviated from the true line count by only 3%, and the script ran about 1000 times faster than an exact count. By the way, I used the second line rather than the first as the basis, because the first line held the column names and the actual data started on the second line.
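
The same extrapolation can be made less sensitive to one unusually short or long line by averaging over several lines instead of one. The following is only a sketch of that variation, not part of the original script: the function name estimate_lines and its optional sample-size argument are hypothetical, and it assumes the first line is a header and that the sampled lines are representative of the rest of the file.

```shell
#!/bin/bash
# Sketch: estimate the line count of a file from the average size of a
# sample of lines. estimate_lines and its SAMPLE argument are hypothetical
# names introduced for illustration.
# Usage: estimate_lines FILE [SAMPLE]   (SAMPLE defaults to 1000 lines)
estimate_lines() {
    local file=$1 sample=${2:-1000}
    local filesize sample_bytes sample_lines
    filesize=$(wc -c < "$file")
    # Read only the first SAMPLE+1 lines, then drop line 1 (assumed header).
    sample_bytes=$(head -n "$((sample + 1))" -- "$file" | tail -n +2 | wc -c)
    sample_lines=$(head -n "$((sample + 1))" -- "$file" | tail -n +2 | wc -l)
    if (( sample_lines == 0 )); then
        echo "estimate_lines: not enough lines to sample" >&2
        return 1
    fi
    # lines ~= filesize / (sample_bytes / sample_lines), in integer arithmetic
    echo $(( filesize * sample_lines / sample_bytes ))
}

# Run directly only when a file name is given on the command line.
if [ "$#" -gt 0 ]; then
    estimate_lines "$@"
fi
```

Saved as a script, this would be called the same way as lines.sh above, with an optional second argument for the sample size. Only the first SAMPLE+1 lines are read, so the cost stays independent of the file size, but the result is still an estimate and depends on how uniform the line lengths are.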
