Failed to allocate memory (No MemoryError) in Ruby?

Submitted by 喜夏-厌秋 on 2019-12-10 13:14:27

Question


I wrote a simple script that is supposed to read every file in a directory, convert the HTML data to plain text by stripping the HTML tags, and write the result into a single file.

I have 8 GB of RAM and plenty of virtual memory available. While the script runs, more than 5 GB of RAM is free. The largest file in the directory is 3.8 GB.

The script is

file_count = 1
File.open("allscraped.txt", 'w') do |out1|
    for file_name in Dir["allParts/*.dat"] do
        puts "#{file_name}#:#{file_count}"
        file_count +=1
        File.open(file_name, "r") do |file|
            source = ""
            tmp_src = ""
            counter = 0
            file.each_line do |line|
                scraped_content = line.gsub(/<.*?\/?>/, '')
                tmp_src << scraped_content
                if (counter % 10000) == 0
                    tmp_src = tmp_src.gsub( /\s{2,}/, "\n" )
                    source << tmp_src
                    tmp_src = ""
                    counter = 0
                end
                counter += 1
            end
            source << tmp_src.gsub( /\s{2,}/, "\n" )
            out1.write(source)
            break
        end
    end
end

The full error message is:

realscraper.rb:33:in `block (4 levels) in <main>': failed to allocate memory (NoMemoryError)
        from realscraper.rb:27:in `each_line'
        from realscraper.rb:27:in `block (3 levels) in <main>'
        from realscraper.rb:23:in `open'
        from realscraper.rb:23:in `block (2 levels) in <main>'
        from realscraper.rb:13:in `each'
        from realscraper.rb:13:in `block in <main>'
        from realscraper.rb:12:in `open'
        from realscraper.rb:12:in `<main>'

Here line 27 is file.each_line do |line| and line 33 is source << tmp_src. The failing file is the largest one (3.8 GB). What is the problem here? Why do I get this error even though I have enough memory, and how can I fix it?


Answer 1:


The problem is on these two lines:

source << tmp_src
source << tmp_src.gsub( /\s{2,}/, "\n" )

As you read a large file, these appends keep growing source, so by the end it holds the whole processed file as one very large string in memory.

The simplest solution is not to use this temporary source string at all, but to write the results directly to the file. Just replace those two lines with this instead:

# source << tmp_src
out1.write(tmp_src)

# source << tmp_src.gsub( /\s{2,}/, "\n" )
out1.write(tmp_src.gsub( /\s{2,}/, "\n" ))

This way you never build a big temporary string in memory, so the script should run more reliably (and faster).
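
Putting the pieces together, a minimal sketch of the streaming approach might look like the following. The method name scrape_stream and the batch_size parameter are hypothetical, introduced only for illustration; the two regexes and the 10,000-line batch come from the question's script. StringIO stands in for the real files so the snippet can be run on its own.

```ruby
require "stringio"

# Hypothetical helper (not from the original script): strips tags line by
# line and writes each batch straight to `out`, so the buffer never holds
# more than `batch_size` lines at a time.
def scrape_stream(input, out, batch_size: 10_000)
  buffer = +""
  input.each_line.with_index(1) do |line, line_no|
    # Same tag-stripping regex as in the question.
    buffer << line.gsub(/<.*?\/?>/, "")
    next unless (line_no % batch_size).zero?

    # Collapse whitespace runs, flush the batch, and reuse the buffer.
    out.write(buffer.gsub(/\s{2,}/, "\n"))
    buffer.clear
  end
  # Flush whatever is left after the last full batch.
  out.write(buffer.gsub(/\s{2,}/, "\n"))
end

# Small in-memory demo; a tiny batch size forces a flush after every line.
demo_in  = StringIO.new("<p>Hello</p>\n<b>big</b>   world\n")
demo_out = StringIO.new
scrape_stream(demo_in, demo_out, batch_size: 1)
puts demo_out.string
```

In the question's script, the same helper could be called as File.open(file_name, "r") { |file| scrape_stream(file, out1) } inside the loop over Dir["allParts/*.dat"]. Memory use would then depend on the batch size rather than on the size of the input file, assuming no single line is itself enormous.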



Source: https://stackoverflow.com/questions/23520758/failed-to-allocate-memory-no-memoryerror-in-ruby
