PhantomJS using too many threads

Submitted by 大憨熊 on 2020-01-01 00:44:11

Question


I wrote a PhantomJS app that crawls a site I built and checks whether a particular JavaScript file is included. The JavaScript works similarly to Google's: some inline code loads in another JS file. The app looks for that other JS file, which is why I used Phantom.

What's the expected result?

The app should work through a large number of URLs and report on the console, for each one, whether the script is loaded.

What's really happening?

The console output looks as expected for about 50 requests, and then it just starts spitting out this error:

2013-02-21T10:01:23 [FATAL] QEventDispatcherUNIXPrivate(): Can not continue without a thread pipe
QEventDispatcherUNIXPrivate(): Unable to create thread pipe: Too many open files

This is the block of code that opens a page and searches for the script include:

page.open(url, function (status) {
  console.log(YELLOW, url, status, CLEAR);
  var found = page.evaluate(function () {
    return document.querySelectorAll("script[src='***']").length > 0;
  });

  if (found) {
    console.log(GREEN, 'JavaScript found on', url, CLEAR);
  } else {
    console.log(RED, 'JavaScript not found on', url, CLEAR);
  }
  self.crawledURLs[url] = true;
  self.crawlURLs(self.getAllLinks(page), depth-1);
});

The crawledURLs object is just a map of the URLs that I've already crawled. The crawlURLs function goes through the links returned by the getAllLinks function and calls the open function on every link whose base domain matches the domain the crawler started on.
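
The filtering step described above might look roughly like the following sketch. The names hostOf and linksToCrawl are hypothetical (the original crawlURLs is not shown), and the host comparison uses a deliberately simple regular expression rather than a full URL parser, so treat it as an illustration only.

```javascript
// Hypothetical helper: extract the host part of an absolute URL.
// Returns null for relative links or anything without a scheme.
function hostOf(url) {
  var match = /^[a-z][a-z0-9+.-]*:\/\/([^\/?#]+)/i.exec(url);
  return match ? match[1].toLowerCase() : null;
}

// Hypothetical filter: keep only links on the starting host that have
// not been crawled yet, and drop duplicates within the same batch.
function linksToCrawl(links, baseHost, crawledURLs) {
  var seen = {};
  return links.filter(function (link) {
    if (hostOf(link) !== baseHost) { return false; }            // other domain
    if (crawledURLs[link] || seen[link]) { return false; }      // already known
    seen[link] = true;
    return true;
  });
}
```

Filtering out duplicates before opening anything keeps the number of page.open calls per batch as small as possible, which matters when each open page holds file descriptors.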

Edit

I changed the last block of the code as follows, adding a call to page.close(), but I still have the same issue.

if (!found) {
  console.log(RED, 'JavaScript not found on', url, CLEAR);
}
self.crawledURLs[url] = true;
var links = self.getAllLinks(page);
page.close();
self.crawlURLs(links, depth-1);

Answer 1:


From the documentation:

Due to some technical limitations, the web page object might not be completely garbage collected. This is often encountered when the same object is used over and over again.

The solution is to explicitly call close() of the web page object (i.e. page in many cases) at the right time.

Some included examples, such as follow.js, demonstrate multiple page objects with explicit close.
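
As a minimal sketch of what "at the right time" could mean for a crawler like the one in the question: visit the URLs one after another, read what you need from the page, close it, and only then open the next one. This is an illustration, not code from the PhantomJS documentation. The function crawlSequentially and its parameters createPage and checkPage are hypothetical names; under PhantomJS, createPage would typically be require('webpage').create.

```javascript
// Sketch: visit URLs strictly one at a time so that at most one page
// object is open at any moment.
// createPage: returns an object with open(url, callback) and close().
// checkPage:  hypothetical callback that inspects the loaded page.
// onDone:     called with a map of url -> result when the queue is empty.
function crawlSequentially(urls, createPage, checkPage, onDone) {
  var queue = urls.slice();   // copy so the caller's array is untouched
  var results = {};

  function next() {
    if (queue.length === 0) {
      onDone(results);
      return;
    }
    var url = queue.shift();
    var page = createPage();
    page.open(url, function (status) {
      // Read everything needed from the page before releasing it.
      results[url] = status === 'success' ? checkPage(page) : false;
      page.close();           // release the page's resources
      next();                 // only now move on to the following URL
    });
  }

  next();
}
```

Under PhantomJS, checkPage could wrap the page.evaluate call from the question, and onDone could call phantom.exit(). The point of the design is that opening the next page happens inside the callback of the previous one, after close(), instead of opening every discovered link at once.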




Answer 2:


Open Files Limit.

Even with closing files properly, you might still run into this error.

After scouring the internets I discovered that you need to increase your limit of the number of files a single process is allowed to have open. In my case, I was generating PDFs with hundreds to thousands of pages.

How you adjust this setting depends on the system you are running, but here is what worked for me on an Ubuntu server:

Add the following to the end of /etc/security/limits.conf:

# Sets the open file maximum here.
# Generating large PDFs hits the default ceiling (1024) quickly. 
*    hard nofile 65535
*    soft nofile 65535
root hard nofile 65535 # Need these two lines because the wildcards (above)
root soft nofile 65535 # are not applied to the root user as well.

A good reference for the ulimit command can be found here.

I hope that puts some people on the right track.




Answer 3:


I ran into this error while running multiple threads in my Ruby program. I was running PhantomJS through Capybara with Poltergeist, and each thread was visiting a page, opening the same CSV file, and writing to it.

I was able to fix it by using the Mutex class.

require 'csv'

# Create the mutex once, before the threads start, so they all share it.
lock = Mutex.new

# Inside each thread: only one thread at a time may write to the CSV file.
lock.synchronize do
  CSV.open("reservations.csv", "w") do |file|
    file << ["Status","Name","Res-Code","LS-Num","Check-in","Check-out","Talk-URL"]
    $status.length.times do |i|
      file << [$status[i],$guest_name[i],$reservation_code[i],$listing_number[i],$check_in[i],$check_out[i],$talk_url[i]]
    end
  end
  puts "#{user.email} PAGE NUMBER ##{p+1} WRITTEN TO CSV"
end


Source: https://stackoverflow.com/questions/15005830/phantomjs-using-too-many-threads
