PostgreSQL: how to split a query across multiple CPUs

南旧 2021-01-22 04:15

I have a stored procedure:

DO_STUFF(obj rowFromMyTable) 

This takes obj, processes some data, and saves the result in an independent table. So

2 answers
  •  谎友^ (original poster)
    2021-01-22 04:50

    A technique I like to use to get quick multi-threading for queries is to use a combination of psql and GNU Parallel (http://www.gnu.org/software/parallel/parallel_tutorial.html) to allow for multiple psql commands to be run at once.

    If you create a wrapper stored procedure containing the loop and add arguments to it to take an offset and a limit, you can then create a quick bash script (or Python, Perl, et al) to generate the series of psql commands that are needed.

    The file containing the commands can be piped into parallel and either take all the CPUs available, or a number you determine (I often like to use 4 CPUs, so as to also keep a lid on I/O on the box, but it would depend on the hardware you have).
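    As a rough illustration of capping the job count, the sketch below picks the smaller of the machine's online core count and a cap (4 here, as in the paragraph above). The function name pick_jobs is hypothetical, and the sketch assumes getconf _NPROCESSORS_ONLN is available, as it is on Linux and macOS.

```shell
#!/bin/sh
# Sketch: choose a value for `parallel -j`, never exceeding a given cap.
# pick_jobs is a hypothetical helper name, not part of psql or GNU Parallel.
pick_jobs() {
    cap=$1
    # Number of online cores; fall back to 1 if getconf cannot report it.
    cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
    if [ "$cores" -lt "$cap" ]; then
        echo "$cores"
    else
        echo "$cap"
    fi
}

# Print the job count to use with a cap of 4.
pick_jobs 4
```

    The printed number can then be passed to parallel as the argument of -j.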

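    Let's say the wrapper is called do_stuff_wrapper(_offset, _limit). The offset and limit would apply to the select, which needs an ORDER BY on a unique column (assumed here to be called id) so that the batches are stable and never overlap:

    select obj from tblSOURCE order by id offset _offset limit _limit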
    

    Your generated psql command file (let's call it parallel.dat) might look something like this:

    psql -X -h HOST -U user database -c "select do_stuff_wrapper(0, 5000);"
    psql -X -h HOST -U user database -c "select do_stuff_wrapper(5000, 5000);"
    psql -X -h HOST -U user database -c "select do_stuff_wrapper(10000, 5000);"
    

    and so on.
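
    One way to generate such a command file is sketched below. It is only an illustration: the function name generate_commands, the row count of 15000, and the batch size of 5000 are assumptions, while HOST, user, database, and do_stuff_wrapper are the placeholders used above. Each offset advances by exactly the batch size, so no row is skipped between batches.

```shell
#!/bin/sh
# Sketch: print one psql command per batch of rows.
# generate_commands is a hypothetical helper; the total row count is
# assumed to be known in advance (for example from a count(*) query).
generate_commands() {
    total_rows=$1   # number of rows in the source table
    batch_size=$2   # rows processed by each psql call
    offset=0
    while [ "$offset" -lt "$total_rows" ]; do
        printf 'psql -X -h HOST -U user database -c "select do_stuff_wrapper(%d, %d);"\n' \
            "$offset" "$batch_size"
        offset=$((offset + batch_size))
    done
}

# Example: 15000 rows in batches of 5000 prints three commands.
generate_commands 15000 5000
```

    Redirecting the output to a file (for example parallel.dat) gives the command list in the format shown above.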

    Then you can run the commands like this:

    cat parallel.dat | parallel -j 4 {}

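    This runs up to four psql commands at the same time. Parallel also groups the output of each command (if any, such as NOTICEs) so that output from different jobs is not interleaved; add -k (--keep-order) if you want the output printed in command order rather than in order of completion.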

    Edit: If you're running on Windows, you could install Cygwin and use parallel from there. Another, pure-Windows option would be to use PowerShell to accomplish something akin to parallel (see "Can Powershell Run Commands in Parallel?").
