Create temporary dataframe with rpy2: memory issue

Submitted by 烂漫一生 on 2019-12-24 11:29:28

Question


This question is similar to, but simpler than, my previous one. Here is the code I use to create R data frames from Python using rpy2:

import numpy as np
from rpy2 import robjects

Z = np.zeros((10000, 500))
df = robjects.r["data.frame"]([robjects.FloatVector(column) for column in Z.T])

My problem is that calling this repeatedly results in huge memory consumption. I tried to adapt the idea from here, but without success. How can I convert many numpy arrays to data frames for processing by R methods without gradually using up all my memory?


Answer 1:


You should make sure that you're using the latest version of rpy2. With rpy2 version 2.4.2, the following works nicely:

import gc

import numpy as np
from rpy2 import robjects
from rpy2.robjects.numpy2ri import numpy2ri


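for i in range(100):
    print(i)
    Z = np.random.random(size=(10000, 500))
    # Convert the whole array to an R matrix in one call, not column by column.
    matrix = numpy2ri(Z)
    df = robjects.r("data.frame")(matrix)

    # Collect unreachable Python objects so the R objects they wrap can be freed.
    gc.collect()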

Memory usage never exceeds 600 MB on my computer.
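To check a figure like this on your own machine, you can record the peak resident memory of the process after each iteration. The following is only a minimal sketch, assuming a Unix-like system (the standard `resource` module is not available on Windows). The names `peak_rss_per_iteration` and `make_temporary` are hypothetical, and the stand-in workload is a plain Python list so that the sketch runs without R installed. In practice you would put the data frame construction in its place.

```python
import gc
import resource


def peak_rss_per_iteration(work, iterations):
    """Call work(i) repeatedly and record the process peak RSS after each call."""
    peaks = []
    for i in range(iterations):
        result = work(i)  # e.g. build the temporary R data frame here
        del result        # drop the only Python reference to the temporary
        gc.collect()      # explicit collection, as in the loop above
        # ru_maxrss is the peak resident set size of the whole process,
        # reported in kilobytes on Linux and in bytes on macOS.
        peaks.append(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
    return peaks


def make_temporary(i):
    # Stand-in workload (an assumption for illustration): a large temporary list.
    return [0.0] * 10**6


peaks = peak_rss_per_iteration(make_temporary, 5)
print(peaks)
```

Because `ru_maxrss` is a high-water mark, the values never decrease. If temporaries are being released, the numbers should level off after the first few iterations, while a steady climb suggests that objects are accumulating.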



Source: https://stackoverflow.com/questions/25569941/create-temporary-dataframe-with-rpy2-memory-issue
