How to use grep efficiently?

误落风尘 2020-12-07 09:53

I have a large number of small files to search. I have been looking for a good, de facto standard multi-threaded version of grep but could not find one. How can I improve my use of grep?

2 answers
  • 2020-12-07 10:12

    If you have xargs installed and a multi-core processor, you can benefit from the following, in case anyone is interested.

    Environment:

    Processor: Dual quad-core 2.4 GHz
    Memory: 32 GB
    Number of files: 584450
    Total size: ~35 GB
    

    Tests:

    1. Find the necessary files, pipe them to xargs and tell it to run 8 grep instances in parallel.

    time find ./ -name "*.ext" -print0 | xargs -0 -n1 -P8 grep -H "string" >> Strings_find8
    
    real    3m24.358s
    user    1m27.654s
    sys     9m40.316s
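    The -print0 / -0 pair in this pipeline delimits file names with NUL bytes, so names containing spaces or newlines reach grep intact. A small throwaway demo, assuming a scratch directory from mktemp and a made-up file name "my notes.ext", compares it with plain whitespace-delimited xargs:

```sh
#!/bin/sh
# Demo: why test 1 pairs find -print0 with xargs -0.
demo=$(mktemp -d)
printf 'has string\n' > "$demo/my notes.ext"   # hypothetical file name containing a space

# NUL-delimited names: xargs passes "my notes.ext" to grep as one argument.
good=$(find "$demo" -name "*.ext" -print0 | xargs -0 -n1 -P8 grep -H "string")

# Blank-delimited names: xargs splits the name at the space, so grep is asked
# for two files that do not exist (grep's error messages are discarded here).
bad=$(find "$demo" -name "*.ext" -print | xargs -n1 -P8 grep -H "string" 2>/dev/null)

echo "with -print0/-0: ${good:-<no output>}"
echo "without:         ${bad:-<no output>}"
rm -rf "$demo"
```

    Only the NUL-delimited pipeline finds the match; the other silently reports nothing on stdout.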
    

    2. Find the necessary files, pipe them to xargs and tell it to run 4 grep instances in parallel.

    time find ./ -name "*.ext" -print0 | xargs -0 -n1 -P4 grep -H "string" >> Strings
    
    real    16m3.051s
    user    0m56.012s
    sys     8m42.540s
    

    3. Suggested by @Stephen: find the necessary files and use find's -exec ... {} + instead of piping to xargs.

    time find ./ -name "*.ext" -exec grep -H "string" {} \+ >> Strings
    
    real    53m45.438s
    user    0m5.829s
    sys     0m40.778s
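    With {} +, find appends as many paths as fit onto each grep command line (much like xargs without -n1), but it runs those commands one after another, so only one grep is active at a time. The sketch below counts invocations, using a throwaway sh -c 'echo run' in place of grep and five empty files in a mktemp scratch directory (both assumptions for illustration):

```sh
#!/bin/sh
# Demo: count how many times find launches a command with "{} +" vs "{} \;".
demo=$(mktemp -d)
for i in 1 2 3 4 5; do
    : > "$demo/f$i.ext"          # five empty *.ext files
done

# '{} +' appends as many paths as fit onto a single command line.
plus=$(find "$demo" -name "*.ext" -exec sh -c 'echo run' sh {} + | wc -l | tr -d ' ')

# '{} \;' launches the command once per file.
semi=$(find "$demo" -name "*.ext" -exec sh -c 'echo run' sh {} \; | wc -l | tr -d ' ')

echo "{} +  -> $plus invocation(s)"
echo "{} \\; -> $semi invocation(s)"
rm -rf "$demo"
```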
    

    4. Regular recursive grep.

    grep -R "string" >> Strings
    
    real    235m12.823s
    user    38m57.763s
    sys     38m8.301s
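    Unlike tests 1-3, this command searches every file under the current directory rather than only *.ext files, so the comparison is not strictly like-for-like. GNU grep's --include option restricts a recursive search by file name; a minimal sketch on a scratch directory (the file names are made up):

```sh
#!/bin/sh
# Demo: restrict a recursive GNU grep to *.ext files, like the find-based tests.
demo=$(mktemp -d)
mkdir "$demo/sub"
printf 'string in ext\n' > "$demo/sub/a.ext"
printf 'string in txt\n' > "$demo/sub/b.txt"

# -R recurses (following symlinks), -H prints file names,
# --include limits the search to names matching the glob.
hits=$(grep -RH --include='*.ext' "string" "$demo")

echo "$hits"
rm -rf "$demo"
```

    Applied to test 4, that would be grep -RH --include='*.ext' "string" . >> Strings.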
    

    For my purposes, the first command worked just fine.

  • 2020-12-07 10:23

    I'm wondering why -n1 is used below. Wouldn't it be faster to use a higher value (say, -n8), or to leave it out so that xargs does the right thing?

    xargs -0 -n1 -P8 grep -H "string"
    

    It seems it would be more efficient to give each forked grep more than one file to process (I assume -n1 puts only one file name in grep's argv). As I see it, we should use the largest n the system allows (given its limit on total argv length), so that the setup cost of starting a new grep process is incurred less often.
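    A sketch of that suggestion that keeps test 1's -P parallelism but hands each grep a batch of file names. Without -n, xargs fills each command line up to its size limit, which for a short file list can mean a single grep and idle -P slots; an explicit batch size avoids that. The helper name parallel_grep, the default batch of 100, and the CPU-count fallback are assumptions, not tuned values:

```sh
#!/bin/sh
# parallel_grep PATTERN DIR GLOB [BATCH]
# Hypothetical helper (not a standard command): search files under DIR whose
# names match GLOB, giving each grep BATCH file names (default 100, an assumed
# value) and running roughly one grep per CPU.
parallel_grep() {
    pattern=$1
    dir=$2
    glob=$3
    batch=${4:-100}
    # nproc is GNU coreutils; fall back to getconf, then to an assumed 4.
    jobs=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 4)
    # -H keeps file names in the output even when a batch holds a single file;
    # "--" protects patterns that start with a dash.
    find "$dir" -type f -name "$glob" -print0 |
        xargs -0 -n "$batch" -P "$jobs" grep -H -- "$pattern"
}

# Example: the same search as test 1, 500 files per grep.
# parallel_grep "string" ./ "*.ext" 500 >> Strings
```

    Output from concurrent greps arrives in no particular order, so sort it if order matters. If lines from different files ever appear spliced together, GNU grep's --line-buffered option is worth trying, at some cost in speed.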
