Ruby/Mechanize “failed to allocate memory”. Erasing instantiation of 'agent.get' method?

走远了吗. Submitted on 2019-12-09 20:06:30

Question


I have a memory leak problem in a Ruby script that uses Mechanize.

The script fetches several web pages in an endless while loop, and memory usage grows considerably on every iteration. After a few minutes this produces a "failed to allocate memory" error and the script exits.

It seems that agent.get instantiates the result and keeps holding it, even if I assign each result to the same local variable or even to a global variable. I tried assigning nil to the variable after its last use and before reusing the name, but the previous agent.get results apparently stay in memory. I don't know how to free that RAM so that the script uses a roughly stable amount of memory over several hours.

Here are two pieces of code (hold down the Enter key and watch the RAM allocated by Ruby grow):

#!/usr/bin/env ruby

require 'mechanize'

agent = Mechanize.new
agent.user_agent_alias = 'Windows Mozilla'
GC.enable
#puts GC.malloc_allocations
while gets.chomp != "stop"
    page = agent.get 'http://www.nypost.com/'
    puts "agent.object_id  : "+agent.object_id.to_s
    puts "page.object_id  : "+page.object_id.to_s
    page=nil   # drop the local reference to the fetched page
    puts "page.object_id  : "+page.object_id.to_s   # this is now the object_id of nil
    page = agent.get 'http://www.nypost.com/'
    puts "page.object_id  : "+page.object_id.to_s
    page=nil
    puts "page.object_id  : "+page.object_id.to_s
    puts local_variables
    GC.start
    puts local_variables
    #puts GC.malloc_allocations
end

And with a global variable instead:

#!/usr/bin/env ruby

require 'mechanize'

agent = Mechanize.new
agent.user_agent_alias = 'Windows Mozilla'
while gets.chomp != "stop"
    $page = agent.get 'http://www.nypost.com/'
    puts "agent.object_id  : "+agent.object_id.to_s
    puts "$page.object_id  : "+$page.object_id.to_s
    $page = agent.get 'http://www.nypost.com/'
    puts "$page.object_id  : "+$page.object_id.to_s
    #puts local_variables
    #puts global_variables
end

In other languages, reassigning the variable keeps the allocated memory stable. Why doesn't Ruby behave the same way? How can I force these instances to be garbage collected?

Edit: here is another example that uses a class, since Ruby is an object-oriented language, but the result is exactly the same: memory keeps growing.

#!/usr/bin/env ruby

require 'mechanize'

$agent = Mechanize.new
$agent.user_agent_alias = 'Windows Mozilla'
class GetContent
    def initialize url
        while true
            @page = $agent.get url
            remove_instance_variable(:@page)
        end
    end
end
myPage = GetContent.new('http://www.nypost.com/')

My answer (not enough reputation to post it properly)

OK, so:

It seems that calling Mechanize::History#clear (via $agent.history.clear) largely solves this memory leak.

Here is the last Ruby code, modified, if you want to test before and after:

#!/usr/bin/env ruby

require 'mechanize'

$agent = Mechanize.new
$agent.user_agent_alias = 'Windows Mozilla'
class GetContent
    def initialize url
        while true
            @page = $agent.get url
            $agent.history.clear
        end
    end
end
myPage = GetContent.new('http://www.nypost.com/')
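
Why clearing the history helps while assigning nil does not can be illustrated with a small stand-in model. This is only a sketch: FakeAgent and FakePage are hypothetical names invented for the illustration, not Mechanize classes, and the model assumes nothing more than that the agent stores every fetched page in its history.

```ruby
# Hypothetical stand-ins for illustration; these are NOT Mechanize classes.
FakePage = Struct.new(:url, :body)

class FakeAgent
  attr_reader :history

  def initialize
    @history = []
  end

  # Pretend to fetch a page and, like a browsing agent, remember it.
  def get(url)
    page = FakePage.new(url, "x" * 1024)
    @history << page
    page
  end
end

agent = FakeAgent.new

3.times do
  page = agent.get("http://example.invalid/")
  page = nil # drops only the local reference; the history still holds the page
end

retained_before = agent.history.size # every fetched page is still reachable
agent.history.clear                  # the history no longer references them
retained_after = agent.history.size

puts "retained before clear: #{retained_before}, after clear: #{retained_after}"
```

As long as the history references a page, the garbage collector cannot reclaim it, no matter what happens to the variable that received the return value.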

Answer 1:


My suggestion is to set agent.max_history = 0, as mentioned in the list of linked issues.

This prevents history entries from being kept at all, instead of removing them afterwards with #clear.

Here is a modified version of the code from the other answer:

#!/usr/bin/env ruby

require 'mechanize'

$agent = Mechanize.new
$agent.user_agent_alias = 'Windows Mozilla'
$agent.max_history = 0
class GetContent
    def initialize url
        while true
            @page = $agent.get url
        end
    end
end
myPage = GetContent.new('http://www.nypost.com/')
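
The effect of a history cap can be sketched with a simplified model. BoundedHistory below is a hypothetical class written for this illustration and is not Mechanize's actual implementation; it only assumes that a capped history discards its oldest entries once the limit is exceeded.

```ruby
# Hypothetical simplified model of a size-capped history; NOT Mechanize's code.
class BoundedHistory
  attr_reader :entries

  def initialize(max_size = nil)
    @max_size = max_size # nil means "no limit" in this model
    @entries = []
  end

  def push(page)
    @entries.push(page)
    # Drop the oldest entries until the cap is respected.
    @entries.shift while @max_size && @entries.size > @max_size
    self
  end
end

unlimited = BoundedHistory.new
capped    = BoundedHistory.new(2)
disabled  = BoundedHistory.new(0)

5.times do |i|
  [unlimited, capped, disabled].each { |h| h.push("page #{i}") }
end

puts "unlimited: #{unlimited.entries.size}"
puts "capped:    #{capped.entries.inspect}"
puts "disabled:  #{disabled.entries.size}"
```

With a cap of 0 the model never retains a page, so nothing accumulates across iterations and no explicit clearing step is needed in the loop.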



Answer 2:


OK, so (I now have enough reputation to answer my own question properly):

It seems that calling Mechanize::History#clear (via $agent.history.clear) largely solves this memory leak.

Here is the last Ruby code, modified, if you want to test before and after:

#!/usr/bin/env ruby

require 'mechanize'

$agent = Mechanize.new
$agent.user_agent_alias = 'Windows Mozilla'
class GetContent
    def initialize url
        while true
            @page = $agent.get url
            $agent.history.clear
        end
    end
end
myPage = GetContent.new('http://www.nypost.com/')
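
One possible way to run that before/after comparison is to sample the number of live heap slots at regular intervals. The sketch below is an assumption-laden helper, not part of Mechanize: live_slots and sample_growth are hypothetical names, the workload is a stand-in that deliberately retains strings, and live heap slots are only a proxy for the process memory that triggers "failed to allocate memory".

```ruby
# Returns the number of live heap slots after forcing a full GC run.
def live_slots
  GC.start
  GC.stat(:heap_live_slots)
end

# Runs the block `iterations` times and records the live-slot count
# every `sample_every` iterations. Both parameter names are hypothetical.
def sample_growth(iterations:, sample_every:)
  samples = []
  iterations.times do |i|
    yield i
    samples << live_slots if ((i + 1) % sample_every).zero?
  end
  samples
end

# Stand-in workload: replace the block body with `$agent.get url`
# (with or without `$agent.history.clear`) to compare the two variants.
leaky = []
samples = sample_growth(iterations: 20, sample_every: 5) do |i|
  leaky << ("x" * 1024 + i.to_s)
end

puts samples.inspect
```

If the samples keep climbing over a long run, something still references the fetched pages; if they level off, the fix is working.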


Source: https://stackoverflow.com/questions/7191752/ruby-mechanize-failed-to-allocate-memory-erasing-instantiation-of-agent-get
