How do I properly use Threads to connect ping a url?

混江龙づ霸主 submitted on 2020-03-23 07:49:38

Question


I am trying to ping a large number of URLs and retrieve information about each URL's certificate. From this thoughtbot article (Thoughtbot Threads) and others, I've read that the best way to do this is with threads. When I implement threads, however, I keep running into Timeout errors and other problems for URLs that I can retrieve successfully on their own. I was told in a related question I asked earlier that I should not use Timeout with threads. However, the examples I see wrap API/Net::HTTP/TCPSocket calls in a Timeout block, and based on what I've read, that entire API/Net::HTTP/TCPSocket call will be nested within the thread. Here is my code:

require 'socket'
require 'openssl'
require 'timeout'

class SslClient
  attr_reader :url, :port, :timeout

  def initialize(url, port = '443', timeout = 30)
    @url = url
    @port = port
    @timeout = timeout
  end

  # Opens a TLS connection and returns the leaf certificate and verify result.
  # On any error it prints the url and the error, and returns nil.
  def ping_for_certificate_info
    context = OpenSSL::SSL::SSLContext.new
    certificates = nil
    verify_result = nil
    Timeout.timeout(timeout) do
      tcp_client = TCPSocket.new(url, port)
      ssl_client = OpenSSL::SSL::SSLSocket.new tcp_client, context
      ssl_client.hostname = url # SNI
      ssl_client.sync_close = true
      ssl_client.connect
      certificates = ssl_client.peer_cert_chain
      verify_result = ssl_client.verify_result
      tcp_client.close
    end
    { certificate: certificates.first, verify_result: verify_result }
  rescue => error
    puts url
    puts error.inspect
  end
end

[VERY LARGE LIST OF URLS].map do |url|
  Thread.new do
    ssl_client = SslClient.new(url)
    cert_info = ssl_client.ping_for_certificate_info
    puts cert_info
  end
end.map(&:value)

If you run this code in your terminal, you will see many Timeout errors and Errno::ETIMEDOUT errors for sites like fandango.com, fandom.com, mcaffee.com, google.de, etc. that should return information. When I run these individually, however, I get the information I need. When I run them in threads, they tend to fail, especially for domains that have a foreign domain name. What I'm asking is whether I am using threads correctly. This snippet is part of a larger piece of code that interacts with ActiveRecord objects in Rails depending on the results. Am I using Timeout and threads correctly? What do I need to do to make this work? Why would a ping work individually but not when wrapped in a thread? Help would be greatly appreciated.


Answer 1:


There are several issues:

  • You should not spawn thousands of threads. Use a connection pool (e.g. https://github.com/mperham/connection_pool) so that you have at most 20-30 concurrent requests in flight. Determine the actual maximum by testing: find the point at which network performance drops and you start getting these timeouts.
  • It is difficult to guarantee that threaded code is not broken, which is why I suggest using a library where others have already figured it out for you, such as https://github.com/httprb/http (with examples for thread safety and concurrent requests at https://github.com/httprb/http/wiki/Thread-Safety). There are other libraries out there (Typhoeus, patron), but this one is pure Ruby, so basic thread safety is easier to achieve.
  • You should not use Timeout (see https://jvns.ca/blog/2015/11/27/why-rubys-timeout-is-dangerous-and-thread-dot-raise-is-terrifying and https://medium.com/@adamhooper/in-ruby-dont-use-timeout-77d9d4e5a001). Use IO.select or something else.
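As a rough illustration of the last point, here is one way the certificate lookup might be written without the Timeout module, using a connect timeout on the socket and IO.select for the TLS handshake. This is only a sketch under assumptions: the method name fetch_certificate and its timeout: keyword are made up for this example, it is not taken from the question or from any library, and it does not bound DNS resolution time.

```ruby
require 'socket'
require 'openssl'

# Hypothetical helper (not from the question): fetch the peer certificate
# with explicit socket-level timeouts instead of Timeout.timeout.
def fetch_certificate(host, port = 443, timeout: 10)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout

  # connect_timeout bounds the TCP connect; DNS lookup is not covered here.
  tcp = Socket.tcp(host, port, connect_timeout: timeout)
  ssl = OpenSSL::SSL::SSLSocket.new(tcp, OpenSSL::SSL::SSLContext.new)
  ssl.hostname = host # SNI

  # Drive the handshake in non-blocking mode, waiting with IO.select
  # until the socket is ready or the deadline passes.
  loop do
    state = ssl.connect_nonblock(exception: false)
    break unless state == :wait_readable || state == :wait_writable

    remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
    readers, writers = state == :wait_readable ? [[tcp], nil] : [nil, [tcp]]
    ready = remaining.positive? && IO.select(readers, writers, nil, remaining)
    raise Errno::ETIMEDOUT, "TLS handshake with #{host}" unless ready
  end

  { certificate: ssl.peer_cert, verify_result: ssl.verify_result }
ensure
  tcp&.close # always release the socket, even on errors
end
```

Unlike Timeout.timeout, nothing here interrupts a thread from the outside; a slow host simply causes Errno::ETIMEDOUT to be raised at a known point, which the caller can rescue.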

Also, I suggest you learn about threading issues such as deadlocks, starvation, and the other gotchas. In your case you are starving network resources, because all the threads are competing for bandwidth at once.
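To make the idea of capping concurrency concrete, here is a minimal sketch that uses a fixed number of worker threads pulling from a Queue from Ruby's standard library. It is a stand-in for the pooling libraries mentioned above, not their API; the name map_with_worker_pool and the default of 20 workers are assumptions for illustration, and the right limit should be found by testing.

```ruby
# Hypothetical helper: run the block over items with at most `concurrency`
# threads. Results come back in the same order as `items`.
def map_with_worker_pool(items, concurrency: 20, &block)
  queue = Queue.new
  items.each_with_index { |item, index| queue << [item, index] }
  queue.close # once empty, pop returns nil and workers stop

  results = Array.new(items.size)
  workers = Array.new([concurrency, items.size].min) do
    Thread.new do
      while (job = queue.pop)
        item, index = job
        results[index] = begin
          block.call(item)
        rescue StandardError => error
          { error: error } # record the failure instead of killing the worker
        end
      end
    end
  end

  workers.each(&:join)
  results
end

# Example with a stand-in block; in the question's code the block would be
# something like { |url| SslClient.new(url).ping_for_certificate_info }.
squares = map_with_worker_pool([1, 2, 3, 4], concurrency: 2) { |n| n * n }
puts squares.inspect
```

With this shape, a very large list of URLs only ever has `concurrency` connections open at a time, so the threads no longer compete for the network all at once.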



Source: https://stackoverflow.com/questions/60160027/how-do-i-properly-use-threads-to-connect-ping-a-url
