How can I split a range of values among a “pool” of threads?

Submitted by 随声附和 on 2019-12-13 02:25:44

Question


I have some code that processes around 30,000 records. The basic outline is like this:

startRecordID = 2345;
endRecordID = 32345;
for(recordID=startRecordID; recordID <= endRecordID; recordID++){
    // process record...
}

Now, this processing takes a long time, and I'd like to have a thread pool of 15 threads and give each thread a list of recordIDs to process, and then join them all at the end.

In the past I accomplished this with code that looked something like this, where recordLists was an array of sub-arrays each containing 1/15 of the records to be processed:

<cfset numThreads = 15 />
<!--- keep a running list of threads so we can join them all at the end --->
<cfset threadlist = "" />
<cfloop from="1" to="#numThreads#" index="threadNum">
    <cfset threadName = "recordProcessing_#threadNum#" />
    <cfset threadlist = listAppend(threadlist, threadName) />
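    <cfthread action="run" name="#threadName#" recordList="#recordLists[threadNum]#">
        <!--- values passed to cfthread are available in the thread's attributes scope --->
        <cfloop from="1" to="#ArrayLen(attributes.recordList)#" index="recordIndex">
            <cfset recordID = attributes.recordList[recordIndex] />
            <!--- ... process recordID ... --->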
        </cfloop>
    </cfthread> 
</cfloop>
<!--- Join all threads before continuing --->
<cfthread action="join" name="#threadlist#" timeout="4000"/>

This worked well (although I would also convert this old code to cfscript :) ), but creating the recordLists array of sub-arrays is not so simple. The only way I can think of is to loop through the numbers from startRecordID to endRecordID, add each one to an array, and then run an ArrayDivide function (which we have already defined in our codebase) on it to split it into numThreads (in this case 15) equal sub-arrays. Given that I have the start of the range, the end of the range, and the number of threads I want to divide it among, isn't there a simpler way to break it up and assign it to the threads?


Answer 1:


(From the comments...)

If you already have an array, why loop through it again? There are no built-in functions for this, but since a CF array is a Java List, a simple yourArray.subList(startIndex, endIndex) would do the trick. Be sure to add some error handling in case the number of records is less than the number of processing threads.

NB: Since it is a Java method, indexes start at zero (0) and endIndex is exclusive. The result behaves like a CF array in most respects; however, it is immutable, i.e. it cannot be modified.

<cfscript>
    // calculate how many records to process in each batch
    numOfIterations = 15;
    totalRecords = arrayLen(recordsArray);
    batchSize = ceiling(totalRecords/numOfIterations);


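    for (t=0; t < numOfIterations; t++) {
        // calculate sub array positions (zero-based; endAt is exclusive)
        startAt = t * batchSize;
        // stop once all records are assigned, otherwise subList() would throw
        if (startAt >= totalRecords) break;
        endAt   = Min(startAt+batchSize, totalRecords);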

        // get next batch of records
        subArray = recordsArray.subList(startAt, endAt);

        // kick off a thread and do whatever you want with the array ...
        WriteOutput("<br>Batch ["& t &"] startAt="& startAt &" endAt="& endAt); 
    }
</cfscript>
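
The question also asks whether the sub-ranges can be derived from the start of the range, the end of the range, and the thread count alone, without building an array first. The following is a minimal sketch of that idea and is not part of the original answer. splitRange, firstID and lastID are hypothetical names, and giving the leftover records to the first few batches is just one possible choice. It assumes a CFML engine that supports script-style function declarations and var statements anywhere in a function body.

```cfml
<cfscript>
    // Hypothetical helper: split the inclusive range [startID, endID] into at most
    // numParts contiguous sub-ranges whose sizes differ by no more than one.
    function splitRange(required numeric startID, required numeric endID, required numeric numParts) {
        var ranges = [];
        var total  = arguments.endID - arguments.startID + 1;

        // nothing to split: return an empty array instead of dividing by zero
        if (total <= 0 || arguments.numParts <= 0) {
            return ranges;
        }

        var parts     = min(arguments.numParts, total); // never create empty batches
        var baseSize  = int(total / parts);             // minimum records per batch
        var remainder = total mod parts;                // this many batches get one extra
        var nextID    = arguments.startID;

        for (var p = 1; p <= parts; p++) {
            var partSize = baseSize + (p <= remainder ? 1 : 0);
            arrayAppend(ranges, { firstID = nextID, lastID = nextID + partSize - 1 });
            nextID += partSize;
        }
        return ranges;
    }

    // same numbers as in the question
    ranges = splitRange(2345, 32345, 15);

    for (i = 1; i <= arrayLen(ranges); i++) {
        // kick off a thread here and pass it ranges[i].firstID and ranges[i].lastID ...
        WriteOutput("<br>Batch [" & i & "] firstID=" & ranges[i].firstID & " lastID=" & ranges[i].lastID);
    }
</cfscript>
```

Each thread then only needs its two boundary values and can run the same for loop as the original single-threaded code over its own sub-range, so no intermediate array of IDs is required.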


Source: https://stackoverflow.com/questions/28740376/how-can-i-split-a-range-of-values-among-a-pool-of-threads
