How do you make good use of multicore CPUs in your PHP/MySQL applications?

春和景丽 2020-12-02 07:51

I maintain a custom built CMS-like application.

Whenever a document is submitted, several tasks are performed that can be roughly grouped into the following categories: MySQL queries, HTML content parsing, and search index updating.

5 Answers
  •  鱼传尺愫
    2020-12-02 08:06

    Introduction

    PHP has full multi-threading support (via the pthreads extension) which you can take advantage of in so many ways. I have been able to demonstrate this multi-threading ability in different examples:

    • How can one use multi threading in PHP applications
    • pcntl runs the same code several times, assistance required
    • Improving HTML scraper efficiency with pcntl_fork()

    A quick search will turn up additional resources.

    Categories

    1: MySQL queries

    MySQL is fully multi-threaded and will make use of multiple CPUs, provided that the operating system supports them. It will also make full use of system resources if properly configured for performance.

    A typical setting in my.ini that affects thread performance is:

    thread_cache_size = 8
    

    thread_cache_size can be increased to improve performance if you have a lot of new connections. Normally, this does not provide a notable performance improvement if you have a good thread implementation. However, if your server sees hundreds of connections per second you should normally set thread_cache_size high enough so that most new connections use cached threads.
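
    You can check how effective the thread cache is from the Threads_created and Connections status counters; the miss rate Threads_created / Connections should stay low once the cache is warm. A minimal check from PHP (credentials are placeholders):

    // Thread cache miss rate = Threads_created / Connections.
    // A high ratio means new connections rarely find a cached thread.
    $db = new mysqli("localhost", "user", "pass");   // placeholder credentials
    
    $created     = $db->query("SHOW GLOBAL STATUS LIKE 'Threads_created'")->fetch_row();
    $connections = $db->query("SHOW GLOBAL STATUS LIKE 'Connections'")->fetch_row();
    
    printf("Thread cache miss rate: %0.2f%%\n", 100 * $created[1] / $connections[1]);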

    If you are using Solaris then you can use

    thread_concurrency = 8 
    

    thread_concurrency enables applications to give the threads system a hint about the desired number of threads that should be run at the same time.

    This variable is deprecated as of MySQL 5.6.1 and was removed in MySQL 5.7. You should remove it from MySQL configuration files whenever you see it, unless they are for Solaris 8 or earlier.

    InnoDB:

    You don't have such limitations if you are using InnoDB as the storage engine, because it fully supports thread concurrency:

    innodb_thread_concurrency   # recommended: 2 * CPUs + number of disks
    

    You can also look at innodb_read_io_threads and innodb_write_io_threads, which default to 4 and can be increased to as high as 64, depending on the hardware.

    Others:

    Other configurations to look at include key_buffer_size, table_open_cache, sort_buffer_size, etc., which can all result in better performance.
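
    For illustration, all of these live in the [mysqld] section of my.ini / my.cnf; the values below are only a starting point and need tuning against your RAM, workload and storage engine:

    [mysqld]
    thread_cache_size = 16        # raise if Threads_created keeps growing
    key_buffer_size   = 256M      # MyISAM index cache
    table_open_cache  = 2000
    sort_buffer_size  = 2M        # allocated per connection, keep modest
    innodb_read_io_threads  = 8
    innodb_write_io_threads = 8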

    PHP:

    In pure PHP, you can create a MySQL worker where each query is executed in a separate PHP thread:

    // Start a dedicated worker thread that owns its own DB connection
    $sql = new SQLWorker($host, $user, $pass, $db);
    $sql->start();
    
    // Stack queries; they run on the worker while this thread carries on
    $sql->stack($q1 = new SQLQuery("One long Query"));
    $sql->stack($q2 = new SQLQuery("Another long Query"));
    
    // Block until each query has finished
    $q1->wait();
    $q2->wait();
    
    // Do Something Useful
    

    Here is a Full Working Example of SQLWorker
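
    In case the link goes away, here is a minimal sketch of what such a worker can look like, assuming the pthreads v2 extension and mysqli; the class internals below are illustrative, not the linked example's exact code:

    class SQLQuery extends Stackable {
        public $rows;
    
        public function __construct($sql) {
            $this->sql = $sql;
        }
    
        public function run() {
            // $this->worker is the SQLWorker currently executing this task
            $result = $this->worker->db()->query($this->sql);
            $rows = array();
            while ($row = $result->fetch_assoc()) {
                $rows[] = $row;
            }
            $this->rows = $rows;
        }
    }
    
    class SQLWorker extends Worker {
        protected static $db;   // statics are per-thread, so each worker owns one connection
    
        public function __construct($host, $user, $pass, $dbname) {
            $this->host = $host;
            $this->user = $user;
            $this->pass = $pass;
            $this->dbname = $dbname;
        }
    
        public function db() {
            if (!self::$db) {
                self::$db = new mysqli($this->host, $this->user, $this->pass, $this->dbname);
            }
            return self::$db;
        }
    
        public function run() {}   // the connection is created lazily inside the thread
    }

    With this sketch you would call $sql->shutdown() to block until all stacked work has completed before reading $q1->rows; the full linked example additionally implements the wait()/notify() handshake used in the usage snippet above.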

    2: HTML content parsing

    I suspect that a great deal of computation time is spent in this task.

    If you already know the problem, then it becomes easier to solve via event loops, job queues, or threads.

    Working on documents one at a time can be a very, very slow, painful process. @ka once hacked his way out using Ajax to fire multiple requests. Some creative minds would just fork the process using pcntl_fork, but if you are on Windows you cannot take advantage of pcntl.

    With pthreads supporting both Windows and Unix systems, you don't have that limitation. It is as easy as: need to parse 100 documents? Spawn 100 threads ... Simple.

    HTML Scanning

    // Scan the filesystem for documents ($dir path is a placeholder)
    $dir = "/path/to/documents";
    $dir = new RecursiveDirectoryIterator($dir, RecursiveDirectoryIterator::SKIP_DOTS);
    $dir = new RecursiveIteratorIterator($dir);
    
    // Allowed Extension
    $ext = array(
            "html",
            "htm"
    );
    
    // Threads Array
    $ts = array();
    
    // Simple Storage
    $s = new Sink();
    
    // Start Timer
    $time = microtime(true);
    
    $count = 0;
    // Parse All HTML
    foreach($dir as $html) {
        if ($html->isFile() && in_array($html->getExtension(), $ext)) {
            $count ++;
            $ts[] = new LinkParser("$html", $s);
        }
    }
    
    // Wait for all Threads to finish
    foreach($ts as $t) {
        $t->join();
    }
    
    // Put The Output
    printf("Total Files:\t\t%s \n", number_format($count, 0));
    printf("Total Links:\t\t%s \n", number_format($t = count($s), 0));
    printf("Finished:\t\t%0.4f sec \n", $tm = microtime(true) - $time);
    printf("AvgSpeed:\t\t%0.4f sec per file\n", $tm / $t);
    printf("File P/S:\t\t%d file per sec\n", $count / $tm);
    printf("Link P/S:\t\t%d links per sec\n", $t / $tm);
    

    Output

    Total Files:            8,714
    Total Links:            105,109
    Finished:               108.3460 sec
    AvgSpeed:               0.0010 sec per link
    File P/S:               80 file per sec
    Link P/S:               907 links per sec
    

    Class Used

    Sink

    // A thread-safe, array-like container shared by every parser thread
    class Sink extends Stackable {
        public function run() {
        }
    }
    

    LinkParser

    class LinkParser extends Thread {
    
        public function __construct($file, $sink) {
            $this->file = $file;
            $this->sink = $sink;
            $this->start();   // each parser starts immediately on construction
        }
    
        public function run() {
            $dom = new DOMDocument();
            // Suppress warnings from malformed HTML
            @$dom->loadHTML(file_get_contents($this->file));
            foreach ($dom->getElementsByTagName('a') as $link) {
                $this->sink[] = $link->getAttribute('href');
            }
        }
    }
    

    Experiment

    Try parsing 8,714 files containing 105,109 links without threads and see how long it takes.

    Better Architecture

    Spawning too many threads is not a clever thing to do in production. A better approach would be to use pooling: have a pool of defined workers, then stack tasks onto it, as sketched below.
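
    A minimal pooling sketch, assuming pthreads v2 (which ships a Pool class) and reusing Sink from above; LinkParserTask is just LinkParser reworked as a stackable task instead of a self-starting thread:

    // A task holds no thread of its own; a pooled Worker executes it
    class LinkParserTask extends Stackable {
        public function __construct($file, Sink $sink) {
            $this->file = $file;
            $this->sink = $sink;
        }
    
        public function run() {
            $dom = new DOMDocument();
            @$dom->loadHTML(file_get_contents($this->file));
            foreach ($dom->getElementsByTagName('a') as $link) {
                $this->sink[] = $link->getAttribute('href');
            }
        }
    }
    
    $pool = new Pool(8, 'Worker');   // 8 workers, e.g. 2 per core on a quad-core box
    $sink = new Sink();
    
    foreach ($dir as $html) {
        if ($html->isFile() && in_array($html->getExtension(), $ext)) {
            $pool->submit(new LinkParserTask("$html", $sink));
        }
    }
    
    $pool->shutdown();   // blocks until every submitted task has finished

    A fixed-size pool keeps every core busy without the scheduling and memory overhead of thousands of short-lived threads.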

    Performance Improvement

    Fine, the example above can still be improved. Instead of waiting for a single thread to scan all the files, you could use multiple threads to scan the filesystem, then stack the results onto workers for processing.

    3: Search index updating

    This has been pretty much answered by the first answer, but there are still many ways to improve performance. Have you ever considered an event-based approach?

    Introducing Event

    @rdlowrey Quote 1:

    Well think of it like this. Imagine you need to serve 10,000 simultaneously connected clients in your web application. Traditional thread-per-request or process-per-request servers aren't an option because no matter how lightweight your threads are you still can't hold 10,000 of them open at a time.

    @rdlowrey Quote 2:

    On the other hand, if you keep all the sockets in a single process and listen for those sockets to become readable or writable you can put your entire server inside a single event loop and operate on each socket only when there's something to read/write.

    Why not experiment with an event-driven, non-blocking I/O approach to your problem? PHP has libevent to supercharge your application.
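
    To make the idea concrete, here is a toy single-process event loop built on core PHP's stream_select; libevent (or a reactor library like the one below) gives you the same pattern with a faster, more scalable backend. The port number is arbitrary:

    // One process watches many sockets and reacts as each becomes readable —
    // no thread per connection.
    $server = stream_socket_server("tcp://127.0.0.1:8000", $errno, $errstr);
    stream_set_blocking($server, false);
    
    $clients = array();
    
    while (true) {
        $read   = array_merge(array($server), $clients);
        $write  = null;
        $except = null;
    
        // Sleep until at least one socket has something for us
        if (stream_select($read, $write, $except, null) < 1) {
            continue;
        }
    
        foreach ($read as $socket) {
            if ($socket === $server) {
                // New client: accept and start watching it
                $clients[] = stream_socket_accept($server);
            } elseif (($data = fread($socket, 8192)) === '' || $data === false) {
                // Client went away: stop watching it
                unset($clients[array_search($socket, $clients, true)]);
                fclose($socket);
            } else {
                fwrite($socket, $data);   // echo the data straight back
            }
        }
    }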

    I know this question is all about multi-threading, but if you have some time you can look at this Nuclear Reactor written in PHP by @igorw.

    Finally

    Consideration

    I think you should consider using a cache and a job queue for some of your tasks. You can easily have a message saying:

    Document uploaded for processing ..... 5% - Done   
    

    Then do all the time-wasting tasks in the background. Please look at Making a large processing job smaller for a similar case study.
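
    As an illustration, a queue can be as small as one table plus a CLI worker; the jobs table, its columns, and the stage percentages below are hypothetical, and $db is assumed to be a mysqli connection:

    // On upload: enqueue the document and return to the user immediately
    $db->query("INSERT INTO jobs (document_id, status, progress)
                VALUES ($documentId, 'queued', 0)");
    
    // worker.php — run from cron or as a long-lived CLI process
    while ($job = $db->query("SELECT id, document_id FROM jobs
                              WHERE status = 'queued' LIMIT 1")->fetch_assoc()) {
        $db->query("UPDATE jobs SET status = 'running' WHERE id = {$job['id']}");
    
        // Hypothetical stages; each one updates the progress the UI polls for
        foreach (array('parse' => 40, 'index' => 80, 'cache' => 100) as $stage => $pct) {
            // ... do the real work for $stage here ...
            $db->query("UPDATE jobs SET progress = $pct WHERE id = {$job['id']}");
        }
    
        $db->query("UPDATE jobs SET status = 'done' WHERE id = {$job['id']}");
    }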

    Profiling

    Profiling tool? There is no single profiling tool for a web application; everything from Xdebug to YSlow can be useful. E.g. Xdebug is not useful when it comes to threads, because threaded code is not supported.

    I don't have a favorite
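
    If you want a starting point anyway, Xdebug 2's profiler needs only two php.ini lines for the single-threaded parts of your code (the output directory is illustrative), and the resulting cachegrind files open in KCachegrind or Webgrind:

    ; php.ini — enable Xdebug 2's profiler
    xdebug.profiler_enable = 1
    xdebug.profiler_output_dir = /tmp/profiles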
