To fork or not to fork?

孤街浪徒 submitted on 2019-11-29 13:51:37
user1126070

You could use Thread::Queue, or any of the other modules suggested in the question "Is there a multiprocessing module for Perl?"

If the old system was written in Perl, this way you could reuse most of it.

Non-working example (an outline only; the database access is left as placeholders):

use strict;
use warnings;

use threads;
use Thread::Queue;

my $q = Thread::Queue->new();    # A new empty queue

# Start 10 detached worker threads; each blocks in dequeue() until work arrives.
# defined() keeps an item such as 0 or "" from ending the loop early.
threads->create(sub {
    while (defined(my $item = $q->dequeue())) {
        # Do work on $item
    }
})->detach() for 1 .. 10;

my $dbh = ...    # connect to your database here
while (1) {
    # Get items from the DB (get_items_from_db is a placeholder)
    my @items = get_items_from_db($dbh);
    # Send work to the threads
    $q->enqueue(@items);
    print "Pending items: " . $q->pending() . "\n";
    sleep 15;    # check the DB every 15 seconds
}
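For comparison, here is a self-contained version of the same pattern that actually runs and shuts down cleanly. It is a sketch, assuming a Perl built with ithreads and Thread::Queue 3.01 or later (for `end()`); `process_item`, `start_workers` and `run_pool` are made-up names, and the squaring stands in for real work. Instead of detaching, it keeps the thread objects so it can `join` them and count how many items each one handled.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue 3.01;    # 3.01+ provides end()

# Hypothetical stand-in for the real work done on each item.
sub process_item {
    my ($item) = @_;
    return $item * $item;
}

# Start $n worker threads that pull items from $q until the queue is ended.
sub start_workers {
    my ($q, $n) = @_;
    my @workers;
    for (1 .. $n) {
        # Scalar assignment: the thread runs in scalar context,
        # so join() later returns the single count below.
        my $thr = threads->create(sub {
            my $done = 0;
            # dequeue() blocks while the queue is empty and returns
            # undef once end() has been called and the queue is drained.
            while (defined(my $item = $q->dequeue())) {
                process_item($item);
                $done++;
            }
            return $done;
        });
        push @workers, $thr;
    }
    return @workers;
}

# Feed all items to $n_workers threads and return how many were processed.
sub run_pool {
    my ($items, $n_workers) = @_;
    my $q       = Thread::Queue->new();
    my @workers = start_workers($q, $n_workers);
    $q->enqueue(@$items);
    $q->end();    # no more work will arrive
    my $total = 0;
    $total += $_->join() for @workers;
    return $total;
}

my $handled = run_pool([1 .. 100], 4);
print "Handled $handled items\n";
```

Because `dequeue()` blocks on an empty queue, idle workers sleep instead of spinning; calling `end()` is what lets them leave the loop once the queue is drained.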

I would suggest using a message queue server like RabbitMQ.

One process feeds work into the queue, and you can have multiple worker processes consume the queue.

Advantages of this approach:

  • workers block when waiting for work (no busy waiting)
  • more worker processes can be started up manually if needed
  • worker processes don't have to be a child of a special parent process
  • RabbitMQ will distribute the work among all workers which are ready to accept work
  • RabbitMQ will put work back into the queue if a worker dies or disconnects without returning an ACK
  • you don't have to assign work in the database
  • every "agent" (worker, producer, etc.) is an independent process which means you can kill it or restart it without affecting other processes
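To make the ACK point concrete, here is a toy, single-process model of it. This is not the RabbitMQ API or any client library: `ToyBroker`, `deliver`, `ack` and `worker_died` are hypothetical names. A delivered message stays "unacked" until it is acknowledged, and anything a dead worker was holding goes back into the ready list (this toy simply appends it to the tail).

```perl
use strict;
use warnings;

# Toy model of ACK/requeue semantics. NOT the RabbitMQ API.
package ToyBroker;

sub new {
    my ($class) = @_;
    return bless { ready => [], unacked => {}, next_tag => 1 }, $class;
}

sub publish {
    my ($self, @msgs) = @_;
    push @{ $self->{ready} }, @msgs;
}

# Hand the next ready message to a worker; it stays unacked until ack().
sub deliver {
    my ($self, $worker) = @_;
    return unless @{ $self->{ready} };
    my $msg = shift @{ $self->{ready} };
    my $tag = $self->{next_tag}++;
    $self->{unacked}{$tag} = { msg => $msg, worker => $worker };
    return ($tag, $msg);
}

# The worker finished: forget the message for good.
sub ack {
    my ($self, $tag) = @_;
    delete $self->{unacked}{$tag};
}

# A worker vanished: every message it held but never acked is requeued.
sub worker_died {
    my ($self, $worker) = @_;
    for my $tag (sort { $a <=> $b } keys %{ $self->{unacked} }) {
        my $d = $self->{unacked}{$tag};
        next unless $d->{worker} eq $worker;
        delete $self->{unacked}{$tag};
        push @{ $self->{ready} }, $d->{msg};
    }
}

sub pending {
    my ($self) = @_;
    return scalar @{ $self->{ready} };
}

package main;

my $broker = ToyBroker->new;
$broker->publish(qw(job1 job2 job3));
my ($t1, $m1) = $broker->deliver('worker-A');    # job1 goes to A
my ($t2, $m2) = $broker->deliver('worker-B');    # job2 goes to B
$broker->ack($t1);                               # A finished job1
$broker->worker_died('worker-B');                # B crashed before acking job2
print "pending: ", $broker->pending, "\n";       # prints "pending: 2"
```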

To dynamically scale the number of workers up or down, you can implement something like:

  1. have workers automatically die if they don't get work for a specified amount of time
  2. have another process monitor the length of the queue and spawn more workers if the queue is getting too big
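Both steps boil down to small policy functions. A minimal sketch, assuming you can read the queue length and the number of running workers from somewhere (the broker's management interface, for example); the names `workers_to_spawn` and `should_exit` and the "one worker per N queued jobs" rule are hypothetical.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Monitor side (step 2). Hypothetical policy: one worker per $per_worker
# queued jobs, capped at $max_workers. Returns how many workers to start.
sub workers_to_spawn {
    my (%opt) = @_;
    my $per    = $opt{per_worker};
    my $wanted = int(($opt{queue_length} + $per - 1) / $per);    # ceiling division
    $wanted    = min($wanted, $opt{max_workers});
    my $extra  = $wanted - $opt{running};
    return $extra > 0 ? $extra : 0;
}

# Worker side (step 1): exit once no job has arrived for $idle_limit seconds.
sub should_exit {
    my ($last_job_at, $now, $idle_limit) = @_;
    return $now - $last_job_at >= $idle_limit;
}

my $n = workers_to_spawn(queue_length => 250, per_worker => 50,
                         running => 2, max_workers => 10);
print "spawn $n more workers\n";
```

With 250 queued jobs, 50 jobs per worker, 2 workers running and a cap of 10, the monitor would spawn 3 more; at 1000 jobs it would stop at the cap and spawn 8.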

I would recommend using beanstalkd as a dedicated job server, and Beanstalk::Client in your Perl scripts for adding jobs to the queue and removing them.

You should find beanstalkd easier to install and set up than RabbitMQ. It will also take care of distributing jobs among available workers, burying any failed jobs so they can be retried later, scheduling jobs to be done at a later date, and many other basic features. For your workers, you don't have to worry about forking or threading; just start up as many workers as you need, on as many servers as you have available.
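The job states mentioned above (delayed, buried, and kicked back for a retry) can be modelled in a few lines. This is only an in-memory illustration, not Beanstalk::Client and not the beanstalkd protocol; the method names merely echo beanstalkd's `put`, `reserve`, `bury` and `kick` commands, and the explicit `now` timestamps are a hypothetical stand-in for a real clock.

```perl
use strict;
use warnings;

# Toy in-memory model of the job states beanstalkd keeps track of.
# NOT Beanstalk::Client: method names only echo beanstalkd commands.
package ToyJobServer;

sub new {
    my ($class) = @_;
    return bless { ready => [], delayed => [], buried => [] }, $class;
}

# put: a job with a delay only becomes ready once the clock reaches run_at.
sub put {
    my ($self, $data, %opt) = @_;
    my $delay = $opt{delay} // 0;
    my $job   = { data => $data, run_at => ($opt{now} // 0) + $delay };
    push @{ $delay > 0 ? $self->{delayed} : $self->{ready} }, $job;
    return $job;
}

# reserve: promote delayed jobs that are due, then hand out the oldest
# ready job (or undef if none is ready yet).
sub reserve {
    my ($self, $now) = @_;
    my @waiting;
    for my $job (@{ $self->{delayed} }) {
        if ($job->{run_at} <= $now) { push @{ $self->{ready} }, $job }
        else                        { push @waiting, $job }
    }
    $self->{delayed} = \@waiting;
    return shift @{ $self->{ready} };
}

# bury: park a failed job; it will not be handed out again until kicked.
sub bury {
    my ($self, $job) = @_;
    push @{ $self->{buried} }, $job;
}

# kick: move up to $n buried jobs back onto the ready queue.
sub kick {
    my ($self, $n) = @_;
    my @kicked = splice @{ $self->{buried} }, 0, $n;
    push @{ $self->{ready} }, @kicked;
    return scalar @kicked;
}

package main;

my $srv = ToyJobServer->new;
$srv->put('send-email', now => 0);
$srv->put('nightly-report', now => 0, delay => 60);

my $job = $srv->reserve(0);        # send-email; the report is still delayed
$srv->bury($job);                  # pretend the worker failed on it
my $none = $srv->reserve(10);      # undef: nothing is ready at t=10
$srv->kick(1);                     # put the buried job back for a retry
my $retry  = $srv->reserve(10);    # send-email again
my $report = $srv->reserve(60);    # nightly-report, now due
print "$retry->{data} then $report->{data}\n";
```

With the real server these transitions happen on the beanstalkd side, so a worker only has to reserve a job and then delete it on success or bury it on failure.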

Either RabbitMQ or beanstalkd would be better than rolling your own DB-backed solution. These projects have already worked out many of the details needed for queueing, and have implemented features you may not yet realize you want. They should also handle polling for new jobs more efficiently than sleeping and selecting from your database to see if there's more work to do.
