问题
I am trying to ping a large amount of urls and retrieve information regarding the certificate of the url. As I read in this thoughtbot article here Thoughtbot Threads and others, I've read that the best way to do this is by using Threads. When I implement threads however, I keep running into Timeout errors and other problems for urls that I can retrieve successfully on their own. I've been told in another related question that I asked earlier that I should not use Timeout with Threads. However, the examples I see wrap API/NET::HTTP/TCPSocket calls in the Timeout block and based opn what I've read, that entire API/NET::HTTP/TCP Socket call will be nested within the Thread. Here is my code:
class SslClient
attr_reader :url, :port, :timeout
def initialize(url, port = '443', timeout = 30)
@url = url
@port = port
@timeout = timeout
end
def ping_for_certificate_info
context = OpenSSL::SSL::SSLContext.new
certificates = nil
verify_result = nil
Timeout.timeout(timeout) do
tcp_client = TCPSocket.new(url, port)
ssl_client = OpenSSL::SSL::SSLSocket.new tcp_client, context
ssl_client.hostname = url
ssl_client.sync_close = true
ssl_client.connect
certificates = ssl_client.peer_cert_chain
verify_result = ssl_client.verify_result
tcp_client.close
end
{certificate: certificates.first, verify_result: verify_result }
rescue => error
puts url
puts error.inspect
end
end
[VERY LARGE LIST OF URLS].map do |url|
Thread.new do
ssl_client = SslClient.new(url)
cert_info = ssl_client.ping_for_certificate_info
puts cert_info
end
end.map(&:value)
If you run this code in your terminal, you will see many Timeout errors and ERNNO:TIMEDOUT errors for sites like fandango.com, fandom.com, mcaffee.com, google.de etc that should return information. When I run these individually however I get the information I need. When I run them in the thread they tend to fail especially for domains that have a foreign domain name. What I'm asking is whether I am using Threads correctly. This snippet of code that I've pasted is part of a larger piece of code that interacts with ActiveRecord objects in rails depending on the results given. Am I using Timeout and Threads correctly? What do I need to do to make this work? Why would a ping work individually but not wrapped in a thread? Help would be greatly appreciated.
回答1:
There are several issues:
- You'd not spawn thousands of threads, use a connection pool (e.g https://github.com/mperham/connection_pool) so you have maximum 20-30 concurrent requests going (this maximum number should be determined by testing at which point network performance drops and you get these timeouts).
- It's difficult to guarantee that your code is not broken when you use threads, that's why I suggest you use something where others figured it out for you, like https://github.com/httprb/http (with examples for thread safety and concurrent requests like https://github.com/httprb/http/wiki/Thread-Safety). There are other libs out there (Typhoeus, patron) but this one is pure Ruby so basic thread safety is easier to achieve.
- You should not use
Timeout
(see https://jvns.ca/blog/2015/11/27/why-rubys-timeout-is-dangerous-and-thread-dot-raise-is-terrifying and https://medium.com/@adamhooper/in-ruby-dont-use-timeout-77d9d4e5a001). UseIO.select
or something else.
Also, I suggest you learn about threading issues like deadlocks, starvations and all the gotchas. In your case you are doing a starvation of network resources because all the threads are fighting for bandwidth/network.
来源:https://stackoverflow.com/questions/60160027/how-do-i-properly-use-threads-to-connect-ping-a-url