Serious Memory Leak When Iteratively Parsing XML Files

Posted by 三世轮回 on 2019-11-27 22:13:19

From the XML package's webpage, it seems that the author, Duncan Temple Lang, has quite extensively described certain memory management issues. See this page: "Memory Management in the XML Package".

Honestly, I'm not proficient in the details of what's going on here with your code and the package, but I think you'll either find the answer on that page, specifically in the section called "Problems", or in direct communication with Duncan Temple Lang.


Update 1. An idea that might work is to use the multicore and foreach packages (i.e. listResults = foreach(ix = 1:N) %dopar% {your processing; return(listElement)}). I think that for Windows you'll need doSMP, or maybe doRedis; under Linux, I use doMC. In any case, by parallelizing the loading you'll get faster throughput. The reason I think you may also get some benefit in memory usage is that forking R could lead to different memory cleanup, since each spawned process is killed when it completes. This isn't guaranteed to work, but it could address both the memory and the speed issues.
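
As a rough sketch of that idea, here's what the foreach pattern could look like with the doMC backend (fork-based, so Linux/macOS). The directory name, core count, and the per-file "processing" step (just counting the root node's children with xmlSize) are placeholders for illustration, not part of the original question.

library(XML)
library(foreach)
library(doMC)

registerDoMC(cores = 4)                 # register a fork-based parallel backend

files <- list.files("xml_dir", pattern = "\\.xml$", full.names = TRUE)

listResults <- foreach(f = files) %dopar% {
  doc <- xmlParse(f)                    # parse inside the forked worker
  res <- xmlSize(xmlRoot(doc))          # placeholder work: count the root's children
  rm(doc); invisible(gc())              # drop the reference before the worker exits
  res                                   # worker memory is reclaimed when the fork ends
}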

Note, though: doSMP has its own idiosyncrasies (i.e. you may still have some memory issues with it). There have been other Q&As on SO that mentioned some issues, but I'd still give it a shot.

@Rappster My R doesn't crash when I first check and make sure the parsed XML doc exists and then call the C function for releasing the memory.

library(XML)  # provides xmlParse() and the package's C routines

for (i in 1:1000) {
  pXML <- xmlParse(file)
  if (exists("pXML")) {
    # force the XML package to free the underlying C-level document
    .Call("RS_XML_forceFreeDoc", pXML)
  }
}