BeautifulSoup MemoryError When Opening Several Files in Directory

白昼怎懂夜的黑 · submitted on 2019-12-01 22:11:41

I'm a very beginner programmer and I faced the same problem. I did three things that seemed to solve it:

  1. Call garbage collection (gc.collect()) at the beginning of each iteration.
  2. Move the parsing into a function, so all the global variables become local variables and are freed when the function returns.
  3. Call soup.decompose() once you are done with the parse tree (all three steps are sketched together right after this list).
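Put together, the pattern I mean is roughly this (a minimal sketch; the name parse_one is just a placeholder):

    import gc
    from bs4 import BeautifulSoup

    def parse_one(path):
        gc.collect()                    # (1) collect leftovers from the previous file
        with open(path) as f:           # (2) everything below is local to the function
            soup = BeautifulSoup(f.read(), 'html.parser')
        # ... extract whatever you need from soup here ...
        soup.decompose()                # (3) break up the parse tree so it can be freed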

I think the second change is probably what solved it, but I didn't have time to check, and I don't want to change working code.

For this code, the solution would be something like this:

from bs4 import BeautifulSoup
import glob
import gc

def parser(file):
    gc.collect()

    # Read via a context manager so the file handle is actually closed
    # (the original called .close() on the string returned by .read()).
    with open(file, 'r') as in_file:
        get_data = in_file.read()

    # Name the parser explicitly to avoid bs4's "no parser specified" warning.
    soup = BeautifulSoup(get_data, 'html.parser')
    VerifyTable = "Clinical Results"

    tables = soup.findAll('table')

    for table in tables:
        rows = table.findAll('tr')
        First_Row_First_Column = rows[0].findAll('td')[0].text
        if VerifyTable == First_Row_First_Column.strip():
            second_row_cells = rows[1].findAll('td')
            v1 = second_row_cells[0].text
            v2 = second_row_cells[1].text

            complete_row = v1.strip() + ";" + v2.strip()

            print(complete_row)

            # 'with' closes the file on exit; no explicit close() is needed.
            with open("Results_File.txt", "a") as out_file:
                out_file.write(complete_row + "\n")

    soup.decompose()
    gc.collect()
    return None


for filename in glob.glob("\\Research Results\\*"):
    parser(filename)

print("done")
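One more option I haven't tried myself, but which bs4 documents, is SoupStrainer: it tells BeautifulSoup to parse only the elements you care about, so the tree held in memory is much smaller to begin with. A sketch of how it would fit this code:

    from bs4 import BeautifulSoup, SoupStrainer

    # Only <table> elements are parsed into the tree; everything else
    # in the file is skipped, which cuts memory use per file.
    only_tables = SoupStrainer('table')
    soup = BeautifulSoup(get_data, 'html.parser', parse_only=only_tables)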