How to implement a parallel jobs and queues system

心在旅途 2020-12-12 03:41

I spent days trying to implement a parallel jobs and queues system, but I can't make it work. Here is the code without any of that implemented, and a CSV example from

1 answer
  •  -上瘾入骨i
    2020-12-12 04:12

    Parallel data processing in PowerShell is not trivial, especially with queueing. Try an existing tool that already does this. Take a look at the module SplitPipeline. Its cmdlet Split-Pipeline is designed for parallel processing of input data and supports queueing of input (see the parameter Load). For example, for 4 parallel pipelines that each take 10 input items at a time, the code looks like this:

    $csv | Split-Pipeline -Count 4 -Load 10, 10 {process{
        # your code that operates on the input item $_
    }} | Out-File $outputReport
    

    All you have to do is implement the code inside the process block, which operates on each input item $_. Parallel processing and queueing are handled by the command itself.


    UPDATE for the updated question code. Here is the prototype code with some remarks. The remarks are important: doing work in parallel is not the same as doing it directly, and there are some rules to follow.

    $csv | Split-Pipeline -Count 4 -Load 10, 10 -Variable findSize {process{
        # Tips
        # - Operate on the input object $_, i.e. $_.PCname and $_.User
        # - Use the imported variable $findSize
        # - Do not use Write-Host; use (for now) Write-Warning
        # - Do not count issues (for now). This is possible, but get the
        # script working without it first.
        # - Do not write data to a file; doing this from several parallel
        # pipelines is not trivial. Just output the data, it will be piped
        # further to the log file
        ...
    }} | Set-Content $report
    # output from all jobs is joined and written to the report file
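
As an illustration of the tips above, here is a minimal end-to-end sketch. It assumes the SplitPipeline module is already installed. The sample rows, the value of $findSize, the report path and the body of the process block are all hypothetical stand-ins, because the original question code is not shown here.

```powershell
Import-Module SplitPipeline

# Hypothetical input; a real script would typically use Import-Csv instead.
$csv = @(
    [pscustomobject]@{ PCname = 'PC01'; User = 'alice' }
    [pscustomobject]@{ PCname = 'PC02'; User = '' }
)

# Hypothetical value, imported into the jobs via -Variable.
$findSize = 100MB

# Hypothetical report location.
$report = Join-Path ([System.IO.Path]::GetTempPath()) 'splitpipeline-report.txt'

$csv | Split-Pipeline -Count 4 -Load 10, 10 -Variable findSize {process{
    # Report problems with Write-Warning rather than Write-Host.
    if (-not $_.User) {
        Write-Warning "No user specified for $($_.PCname)"
    }

    # Placeholder for the real check: just output one report line per item.
    # The output is piped further; the job itself does not touch the file.
    '{0};{1};{2}' -f $_.PCname, $_.User, $findSize
}} | Set-Content $report
```

Only the final Set-Content writes to the file, so the parallel pipelines never compete for it. The order of the report lines is not guaranteed to match the input order.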
    

    UPDATE: How to write progress information

    SplitPipeline handled an 800-target CSV pretty well, amazing. Is there any way to let the user know that the script is alive? Scanning a big CSV can take about 20 minutes. Something like "in progress 25%", "50%", "75%"...

    There are several options. The simplest is to invoke Split-Pipeline with the switch -Verbose. You will then get verbose messages about the progress and see that the script is alive.

    Another simple option is to write and watch verbose messages from the jobs themselves, e.g. Write-Verbose ... -Verbose, which writes the messages even if Split-Pipeline is invoked without -Verbose.
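
The two verbose options can be combined, as in the rough sketch below. It assumes the SplitPipeline module is installed. The input 1..20, the -Load values and the doubling "work" are hypothetical placeholders.

```powershell
Import-Module SplitPipeline

# -Verbose on Split-Pipeline asks the command itself for progress messages.
# Write-Verbose ... -Verbose inside the job writes one message per item,
# even when Split-Pipeline is invoked without -Verbose.
$result = 1..20 | Split-Pipeline -Count 4 -Load 2, 5 -Verbose {process{
    Write-Verbose "Processing item $_" -Verbose
    $_ * 2   # stand-in for the real work
}}
```

The verbose messages go to the verbose stream, so they are shown to the user but do not end up in $result or in a report file further down the pipeline.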

    Another option is to use proper progress messages with Write-Progress. See the scripts:

    • Test-ProgressJobs.ps1
    • Test-ProgressTotal.ps1

    Test-ProgressTotal.ps1 also shows how to use a collector that is updated concurrently from the jobs. You can use a similar technique for counting issues (the original question code does this). When all is done, show the total number of issues to the user.
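
A minimal sketch of such a collector for counting issues follows. It is not the code from Test-ProgressTotal.ps1. It assumes that a hashtable imported with -Variable is shared by reference between the jobs, which is what the collector technique relies on. The input 1..20 and the rule "every multiple of 5 is an issue" are hypothetical.

```powershell
Import-Module SplitPipeline

# Shared collector; the jobs update it under a lock.
$data = @{ Issues = 0 }

1..20 | Split-Pipeline -Count 4 -Load 2, 5 -Variable data {process{
    # Hypothetical rule: treat every multiple of 5 as an issue.
    if ($_ % 5 -eq 0) {
        # Several jobs may get here at the same time, so lock the collector
        # while the counter is incremented.
        [System.Threading.Monitor]::Enter($data)
        try {
            ++$data.Issues
        }
        finally {
            [System.Threading.Monitor]::Exit($data)
        }
    }
    $_
}} | Out-Null

# When all is done, show the total to the user.
Write-Warning "Total issues: $($data.Issues)"
```

The lock matters because an increment is a read followed by a write, and without it two jobs could read the same old value and lose one of the updates.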
