Randomly Pick Lines From a File Without Slurping It With Unix

前端 未结 10 972
忘了有多久
忘了有多久 2020-12-07 11:40

I have a 10^7 lines file, in which I want to choose 1/100 of lines randomly from the file. This is the AWK code I have, but it slurps all the file content before hand. My PC

10条回答
  •  无人及你
    2020-12-07 11:51

    I don't know awk, but there is a great technique for solving a more general version of the problem you've described, and in the general case it is quite a lot faster than the for line in file return line if rand < 0.01 approach, so it might be useful if you intend to do tasks like the above many (thousands, millions) of times. It is known as reservoir sampling and this page has a pretty good explanation of a version of it that is applicable to your situation.

提交回复
热议问题