cURL Multi Simultaneous Requests (domain check)

Anonymous (unverified), submitted 2019-12-03 02:33:02

Question:

I'm trying to take a list of 20,000+ domain names and check whether they are "alive". All I really need is a simple HTTP status code check, but I can't figure out how to get that working with curl_multi. In a separate script I have the following function, which checks a batch of 1,000 domains simultaneously and returns the JSON response. Maybe it can be modified to return just the HTTP response code instead of the page content?

    // $dotNetRequests = array of domains...

    // loop through arrays
    foreach (array_chunk($dotNetRequests, 1000) as $Netrequests) {
        $results = checkDomains($Netrequests);
        $NetcurlRequest = array_merge($NetcurlRequest, $results);
    }

    function checkDomains($data) {
        // array of curl handles
        $curly = array();
        // data to be returned
        $result = array();

        // multi handle
        $mh = curl_multi_init();

        // loop through $data and create curl handles
        // then add them to the multi-handle
        foreach ($data as $id => $d) {
            $curly[$id] = curl_init();

            $url = (is_array($d) && !empty($d['url'])) ? $d['url'] : $d;
            curl_setopt($curly[$id], CURLOPT_URL,            $url);
            curl_setopt($curly[$id], CURLOPT_HEADER,         0);
            curl_setopt($curly[$id], CURLOPT_RETURNTRANSFER, 1);

            // post?
            if (is_array($d)) {
                if (!empty($d['post'])) {
                    curl_setopt($curly[$id], CURLOPT_POST,       1);
                    curl_setopt($curly[$id], CURLOPT_POSTFIELDS, $d['post']);
                }
            }

            curl_multi_add_handle($mh, $curly[$id]);
        }

        // execute the handles
        $running = null;
        do {
            curl_multi_exec($mh, $running);
        } while ($running > 0);

        // get content and remove handles
        foreach ($curly as $id => $c) {
            // $result[$id] = curl_multi_getcontent($c);
            // if ($result[$id]) {
            if (curl_multi_getcontent($c)) {
                //echo "yes";
                $netName = $data[$id];
                $dName = str_replace(".net", ".com", $netName);
                $query = "Update table1 SET dotnet = '1' WHERE Domain = '$dName'";
                mysql_query($query);
            }
            curl_multi_remove_handle($mh, $c);
        }

        // all done
        curl_multi_close($mh);

        return $result;
    }

Answer 1:

In any other language you would thread this kind of operation ...

https://github.com/krakjoe/pthreads

And you can in PHP too :)

I would suggest a few workers rather than 20,000 individual threads. Not that 20,000 threads is out of the realms of possibility (it isn't), but it wouldn't be a good use of resources. I would do as you are doing now and have 20 workers, each getting the results for 1,000 domains.

I assume you don't need me to give an example of getting a response code; I'm sure curl would give it to you. But curl is probably overkill here, since you would no longer need its parallel-transfer capabilities: I would fsockopen port 80, fprintf a minimal request such as GET / HTTP/1.0\r\n\r\n, fgets the first line, and close the connection. If you're going to be doing this all the time, I would also send Connection: close so that the receiving machines don't hold connections open unnecessarily.
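For illustration, the fsockopen approach described in this answer might look like the minimal sketch below. The helper names `parseStatusLine` and `fetchStatusCode`, the 5-second timeout, and the added `Host` header are assumptions and not part of the original answer. The type declarations assume PHP 7.1 or later.

```php
<?php
// Hypothetical helpers for illustration; names and defaults are assumptions.

/**
 * Extract the numeric status code from an HTTP status line,
 * e.g. "HTTP/1.1 200 OK" gives 200. Returns null for anything else.
 */
function parseStatusLine(string $line): ?int
{
    if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})\b#', trim($line), $m)) {
        return (int) $m[1];
    }
    return null;
}

/**
 * Connect to port 80, send a minimal GET request, read only the first
 * response line, and close the connection.
 * Returns the status code, or null if no usable response was received.
 */
function fetchStatusCode(string $host, float $timeout = 5.0): ?int
{
    $errno = 0;
    $errstr = '';
    // @ suppresses the warning PHP emits on DNS or connection failure.
    $fp = @fsockopen($host, 80, $errno, $errstr, $timeout);
    if ($fp === false) {
        return null; // DNS failure, refused connection, or connect timeout
    }
    // Apply the same limit to reads so a silent server cannot stall us.
    stream_set_timeout($fp, (int) ceil($timeout));

    // The Host header is an addition: name-based virtual hosts usually need it.
    fwrite($fp, "GET / HTTP/1.0\r\nHost: {$host}\r\nConnection: close\r\n\r\n");
    $line = fgets($fp, 1024);
    fclose($fp);

    return $line === false ? null : parseStatusLine($line);
}
```

Calling `fetchStatusCode('example.com')` would return the status code of the first response line, or `null` when the host cannot be reached. Each call blocks until it completes, so the parallelism would still have to come from the workers.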



Answer 2:

This script works great for handling bulk simultaneous cURL requests using PHP. I'm able to parse through 50k domains in just a few minutes using it!

https://github.com/petewarden/ParallelCurl/
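For readers who prefer to stay with plain curl_multi, here is a rough sketch of collecting only the HTTP status codes for one batch. It does not use or reproduce the ParallelCurl API. The function name `fetchStatusCodes` and the timeout values are illustrative assumptions. It uses `CURLOPT_NOBODY` to skip the page content and `curl_getinfo()` with `CURLINFO_HTTP_CODE` to read the status once the transfers finish.

```php
<?php
/**
 * Check a batch of URLs concurrently and return [key => HTTP status code],
 * preserving the keys of the input array.
 * A code of 0 means no HTTP response was received (DNS failure, timeout, ...).
 * The function name and default timeout are illustrative assumptions.
 */
function fetchStatusCodes(array $urls, int $timeoutSeconds = 10): array
{
    $mh = curl_multi_init();
    $handles = [];

    foreach ($urls as $key => $url) {
        $ch = curl_init();
        curl_setopt_array($ch, [
            CURLOPT_URL            => $url,
            CURLOPT_NOBODY         => true, // HEAD request: do not download the body
            CURLOPT_RETURNTRANSFER => true, // never echo anything to output
            CURLOPT_CONNECTTIMEOUT => $timeoutSeconds,
            CURLOPT_TIMEOUT        => $timeoutSeconds,
        ]);
        curl_multi_add_handle($mh, $ch);
        $handles[$key] = $ch;
    }

    // Drive all transfers. curl_multi_select() waits for socket activity
    // so the loop does not spin the CPU while transfers are in flight.
    $running = 0;
    do {
        $status = curl_multi_exec($mh, $running);
        if ($running > 0) {
            curl_multi_select($mh, 1.0);
        }
    } while ($running > 0 && $status === CURLM_OK);

    // Read the status code of each finished transfer and detach the handle.
    $codes = [];
    foreach ($handles as $key => $ch) {
        $codes[$key] = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
        curl_multi_remove_handle($mh, $ch);
    }
    curl_multi_close($mh);

    return $codes;
}
```

A list of 20,000 domains could be fed through this in chunks, for example with `array_chunk($domains, 1000)`, treating any non-zero code as "alive". Some servers reject HEAD requests, so a stricter check might retry those hosts with a normal GET.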


