Randomly Pick Lines From a File Without Slurping It With Unix

忘了有多久 2020-12-07 11:40

I have a file with 10^7 lines, from which I want to pick 1/100 of the lines at random. This is the AWK code I have, but it slurps the whole file content beforehand. My PC

10 Answers
  •  -上瘾入骨i
    2020-12-07 12:01

    If the aim is just to avoid memory exhaustion, and the file is a regular file, there is no need to implement reservoir sampling. A regular file can be read twice, so you can make two passes over it: one to count the lines (for instance with wc -l) and one to select the sample:

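    file=/some/file
    # Pass 1: wc -l counts the lines (n). Pass 2: awk selects exactly p = int(n * percent) of them.
    awk -v percent=0.01 -v n="$(wc -l < "$file")" '
      BEGIN {srand(); p = int(n * percent)}
      # Keep the current line with probability p/n, where n is the number of
      # lines left to read and p is the number of lines still to select.
      rand() * n-- < p {p--; print}' < "$file"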
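
    For input that cannot be read twice, such as a pipe, the two-pass approach does not apply, and the reservoir sampling mentioned above is the usual alternative. The following is only a minimal sketch under stated assumptions: the function name reservoir_sample is hypothetical, and it takes a fixed sample size k instead of a percentage, because the total line count is not known in advance. It keeps k lines in memory, not the whole input.

```shell
# reservoir_sample K: read lines on stdin and print K of them chosen uniformly
# at random (hypothetical helper; classic one-pass reservoir sampling).
reservoir_sample() {
  awk -v k="$1" '
    BEGIN { srand() }
    # Fill the reservoir with the first k lines.
    NR <= k { r[NR] = $0; next }
    {
      # Pick a uniform integer in 1..NR; the current line replaces a
      # reservoir slot with probability k/NR.
      j = int(rand() * NR) + 1
      if (j <= k) r[j] = $0
    }
    END {
      # If the input had fewer than k lines, print all of them.
      m = (NR < k ? NR : k)
      for (i = 1; i <= m; i++) print r[i]
    }
  '
}
```

    It could be used as `some_command | reservoir_sample 100000`, where some_command stands for any producer of lines. Unlike the two-pass version, the lines come out in reservoir order, not in their original input order.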
    
