How to download only a piece of a big file via HTTP with Ruby

误落风尘 2020-12-06 07:45

I only need to download the first few kilobytes of a file via HTTP.

I tried:

require 'open-uri'
url = 'http://example.com/big-file.dat'
file = ope         

3 answers
  • 2020-12-06 08:22

    Check out the question "OpenURI returns two different objects". You might be able to abuse the methods described there to interrupt the download, or to throw away the rest of the response, once a preset limit is reached.
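    Below is a minimal sketch of the "stop after a preset limit" idea. open-uri's :progress_proc option only reports how many bytes have arrived, not the data itself, so this sketch swaps in Net::HTTP's streaming read_body instead. It assumes a plain HTTP or HTTPS URL; fetch_prefix and its 4096-byte default are hypothetical names chosen for illustration.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: return at most `limit` bytes from the start of the body.
def fetch_prefix(url, limit = 4096)
  uri = URI(url)
  data = String.new(encoding: Encoding::BINARY)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request_get(uri.request_uri) do |response|
      # With a block, read_body yields the body chunk by chunk as it arrives
      response.read_body do |chunk|
        data << chunk
        # Returning here is a non-local exit: Net::HTTP never reads the rest
        # of the body, and Net::HTTP#start closes the connection in its ensure
        return data.byteslice(0, limit) if data.bytesize >= limit
      end
    end
  end
  data # the whole body was shorter than the limit
end
```

    Usage would be something like fetch_prefix('http://example.com/big-file.dat', 4096). Whatever the OS or Net::HTTP's internal read buffer has already received may still cross the wire, so slightly more than the limit can be transferred, but the rest of the file is not.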

  • 2020-12-06 08:29

    This is an old thread, but it's still a question that seems mostly unanswered according to my research. Here's a solution I came up with by monkey-patching Net::HTTP a bit:

    require 'net/http'
    
    # provide access to the actual socket
    class Net::HTTPResponse
      attr_reader :socket
    end
    
    uri = URI("http://www.example.com/path/to/file")
    begin
      Net::HTTP.start(uri.host, uri.port) do |http|
        request = Net::HTTP::Get.new(uri.request_uri)
        # calling request with a block prevents body from being read
        http.request(request) do |response|
          # do whatever limited reading you want to do with the socket
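          x = response.socket.read(100)
          # close the connection before leaving the block so the rest of the
          # body is not downloaded; this is what raises the IOError below
          http.finish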
        end
      end
    rescue IOError
      # ignore
    end
    

    The rescue catches the IOError that's thrown when you call HTTP.finish prematurely.

    FYI, the socket within the HTTPResponse object isn't a true IO object (it's an instance of an internal class, Net::BufferedIO), but it's easy to monkey-patch that too, so that it mimics the IO methods you need. For example, another library I was using (exifr) needed a readchar method, which was easy to add:

    class Net::BufferedIO
      def readchar
        read(1)[0].ord
      end
    end
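    The readchar patch can be tried without a network connection, because Net::BufferedIO will also wrap a StringIO. Note that the patched method returns an Integer byte value, whereas Ruby's own IO#readchar returns a one-character String; the sample string is arbitrary.

```ruby
require 'net/http'
require 'stringio'

# The patch from this answer: a byte-oriented readchar on Net::BufferedIO
class Net::BufferedIO
  def readchar
    read(1)[0].ord
  end
end

io = Net::BufferedIO.new(StringIO.new("GIF89a"))
p io.readchar # => 71 (the byte value of "G")
p io.read(2)  # => "IF"
```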
    
  • 2020-12-06 08:44

    This seems to work using a raw TCP socket:

    require 'socket'

    host = "download.thinkbroadband.com"
    path = "/1GB.zip" # get 1gb sample file
    # HTTP/1.0 request; the Host header is included because name-based
    # virtual hosts need it to route the request
    request = "GET #{path} HTTP/1.0\r\nHost: #{host}\r\n\r\n"
    socket = TCPSocket.open(host, 80)
    socket.print(request)

    # skip the status line and headers, which end with an empty line
    while (line = socket.gets) && line != "\r\n"
    end

    response = socket.read(100) # read the first 100 bytes of the body
    socket.close
    puts response
    

    I'm curious whether there is a "Ruby way" to do this.
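    One more conventional option, if the server supports it, is to ask for only the bytes you need with an HTTP Range header. A server that honours it answers 206 Partial Content with just that range; one that doesn't answers 200 with the whole file. The minimal sketch below uses a hypothetical fetch_range helper that returns nil in that second case and leaves the block before the body is read.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: request only the first `limit` bytes via a Range header.
# Returns the bytes, or nil if the server did not honour the range.
def fetch_range(url, limit = 4096)
  uri = URI(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    request = Net::HTTP::Get.new(uri.request_uri)
    request['Range'] = "bytes=0-#{limit - 1}" # byte positions are inclusive
    http.request(request) do |response|
      # 206 Partial Content: the body contains only the requested range
      return response.read_body if response.is_a?(Net::HTTPPartialContent)
      # Anything else (e.g. 200 with the whole file): return before the body
      # is read; Net::HTTP#start then closes the connection
      return nil
    end
  end
end
```

    For example, fetch_range('http://example.com/big-file.dat', 4096). When it returns nil, a streaming approach that stops reading after a fixed number of bytes is the fallback.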
