This is a simple scraper written in JavaScript with Node.js, for scraping Wikipedia for periodic table element data. The dependencies are jsdom for DOM manipulation and chain-ga
jsdom does have a memory leak, which stems from the copy-in and copy-out logic behind Node's vm.runInContext(). There has been an effort to fix this problem in C++, and we are hoping to prove out the solution before attempting to push it into Node.
A workaround for now is to spawn a child process for each DOM and shut it down when you are done, so that any leaked memory is released when the process exits.
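The child-process workaround could look roughly like the sketch below. It is not the scraper's actual code: the names CHILD_SOURCE and extractInChild are hypothetical, and the regex in the child is only a stand-in for the real jsdom work so that the example runs with nothing but Node's standard library. The point it illustrates is that each document is handled in its own short-lived process.

```javascript
const { execFileSync } = require('child_process');

// Source code executed inside each short-lived child process.
// ASSUMPTION: the regex is a placeholder for the real jsdom parsing,
// used here only to keep the sketch dependency-free.
const CHILD_SOURCE = `
  let html = '';
  process.stdin.setEncoding('utf8');
  process.stdin.on('data', (chunk) => { html += chunk; });
  process.stdin.on('end', () => {
    const match = /<title>([^<]*)<[/]title>/i.exec(html);
    process.stdout.write(JSON.stringify({ title: match ? match[1] : null }));
  });
`;

// Hypothetical helper: parse one page in a fresh Node process.
// The child reads the HTML from stdin, writes JSON to stdout, and exits,
// so whatever memory it allocated is returned to the operating system.
function extractInChild(html) {
  const output = execFileSync(process.execPath, ['-e', CHILD_SOURCE], {
    input: html,
    encoding: 'utf8',
  });
  return JSON.parse(output);
}

const result = extractInChild('<html><head><title>Hydrogen</title></head></html>');
console.log(result.title);
```

Starting a process per page is slow compared with parsing in-process, so this trade only makes sense while the leak is the bigger problem.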
EDIT:
As of jsdom 0.2.3, this issue is fixed as long as you close the window (window.close()) when you are done with it.
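One way to make sure window.close() is never skipped is to wrap the work in try/finally. The sketch below is an illustration only: withWindow and createWindow are hypothetical names, and how the window is actually created depends on the jsdom version in use, so that step is left to the caller. A fake window object stands in for jsdom here so the example runs on its own.

```javascript
// Hypothetical helper: run `work(window)` and always close the window
// afterwards, even if `work` throws.
// ASSUMPTION: `createWindow` is whatever call your jsdom version uses to
// build a window; it is passed in rather than hard-coded here.
function withWindow(createWindow, work) {
  const window = createWindow();
  try {
    return work(window);
  } finally {
    window.close();
  }
}

// Stand-in for a jsdom window, used only to demonstrate the helper.
let closeCalls = 0;
const fakeWindow = {
  document: { title: 'Helium' },
  close() { closeCalls += 1; },
};

const title = withWindow(() => fakeWindow, (window) => window.document.title);
console.log(title, closeCalls);
```

This version only suits synchronous work. If the callback starts asynchronous work, the window would be closed before that work finishes, so the close call would need to move to the point where the asynchronous work completes.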