Randomly Pick Lines From a File Without Slurping It With Unix

忘了有多久 2020-12-07 11:40

I have a file with 10^7 lines, and I want to pick 1/100 of its lines at random. This is the AWK code I have, but it slurps the entire file content beforehand. My PC

10 answers
  •  自闭症患者
    2020-12-07 11:47

    If you have that many lines, are you sure you want exactly 1%, or would a statistical estimate be enough?

    In the second case, just select each line independently with a 1% probability:

    awk 'BEGIN {srand()} !/^$/ { if (rand() <= .01) print $0}'
    

    If you'd like the header line plus a random sample of lines after, use:

    awk 'BEGIN {srand()} !/^$/ { if (rand() <= .01 || FNR==1) print $0}'
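
If exactly 1% is required instead, one possible approach (not part of the original answer) is selection sampling in two streaming passes: count the lines first, then keep each line with probability (lines still needed) / (lines still unread). The sketch below assumes the file can be read twice (so it is a regular file, not a pipe); the function name `sample_exact` and its optional second argument are hypothetical, introduced only for illustration.

```shell
# sample_exact FILE [DENOM]  (hypothetical helper, for illustration only)
# Prints exactly floor(N / DENOM) randomly chosen lines of FILE, where N is
# its line count, in their original order. Only one line is held in memory.
sample_exact() {
  file=$1
  denom=${2:-100}               # default: keep 1 line in 100
  n=$(wc -l < "$file")          # pass 1: count the lines
  k=$(( n / denom ))            # number of lines to keep
  # pass 2: keep the current line with probability k/n, where k is the
  # number of lines still needed and n the number of lines still unread
  awk -v n="$n" -v k="$k" '
    BEGIN { srand() }
    k > 0 && rand() * n < k { print; k-- }
    { n-- }
  ' "$file"
}
```

Called as `sample_exact big.txt`, this should print exactly one hundredth of the lines (rounded down). Because `srand()` without an argument seeds from the time of day, two runs within the same second are likely to produce the same sample.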
    
