Rpy2: pandas dataframe can't fit in R

Submitted by 杀马特。学长 韩版系。学妹 on 2021-02-18 19:06:14

Question


I need to read a CSV file with Python (into a pandas DataFrame), work on it in R, and return the result to Python. To pass the pandas DataFrame to R as an R data frame I use rpy2, and that part works OK (code below).

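from pandas import read_csv
import pandas.rpy.common as com  # pandas.rpy was deprecated in pandas 0.16 and removed in 0.20
import rpy2.robjects as robjects

r = robjects.r
r.library("fitdistrplus")  # load the R package that provides fitdist()

df = read_csv('./datos.csv')
r_df = com.convert_to_r_dataframe(df)  # convert with the old pandas helper
print(type(r_df))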

The output is:

<class 'rpy2.robjects.vectors.FloatVector'>

But when I then try to fit a distribution in R:

fit2 = r.fitdist(r_df, "weibull")

I get this error:

RRuntimeError: Error in (function (data, distr, method = c("mle", "mme", "qme", "mge"),  : 
data must be a numeric vector of length greater than 1

I have two questions:
1. What am I doing wrong?
2. Is this the most efficient way to pass a pandas DataFrame to R? I ask because I have also seen this import: from rpy2.robjects.packages import importr

This is the data I am reading: https://mega.co.nz/#!P8MEDSzQ!iQyxt73a5pRvJNOxWeSEaFlsVS7_A1sZCAXkUFBLJa0

I am using IPython 2.1. Thanks!


Answer 1:


You have two issues:

First, you are trying to use a data frame where you really need a vector. (If you tried using an R data.frame for fitdist(), you'd also get an error.)
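To make the data frame vs. vector distinction concrete on the Python side: a one-column pandas DataFrame is two-dimensional, while selecting its column gives a one-dimensional array, which is the shape that corresponds to an R vector. The sketch below is illustrative only; it assumes a hypothetical single-column CSV with a header named `valor` (not the asker's actual file) and a hypothetical helper `is_fitdist_ready` that mirrors the precondition quoted in the error message (numeric, vector, length greater than 1). No R is needed to run it.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for datos.csv: one header line, one numeric column.
CSV_TEXT = "valor\n1.5\n2.25\n3.0\n4.75\n"


def is_fitdist_ready(values) -> bool:
    """Hypothetical helper mirroring the error message's requirement:
    a numeric, one-dimensional vector with more than one element."""
    arr = np.asarray(values)
    return arr.ndim == 1 and np.issubdtype(arr.dtype, np.number) and arr.size > 1


df = pd.read_csv(io.StringIO(CSV_TEXT))
print(df.shape)              # (4, 1): a table, like an R data.frame
print(is_fitdist_ready(df))  # False: the data is two-dimensional

# Selecting the column and converting it gives a plain float vector.
vec = df["valor"].to_numpy(dtype=float)
print(vec.shape)              # (4,)
print(is_fitdist_ready(vec))  # True
```

A 1-D float array like `vec` is the kind of input that converts cleanly into an R numeric vector.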

Second, the pandas-to-rpy2 conversion support that shipped with pandas is buggy: it converts your (presumably) numeric pandas data frame into an R data frame of strings (character columns):

In [27]: r.sapply(r_df, r["class"])
Out[27]: 
<StrVector - Python:0x1097757a0 / R:0x7fa41c6b0b68>
[str, str, str, str]

This is not good! The following code fixes these errors:

from pandas import read_csv
import rpy2.robjects as robjects

r = robjects.r
r.library("fitdistrplus")

# this will read in your csv file as a Series, rather than a DataFrame
# (older pandas spelled this read_csv(..., squeeze=True); that argument was removed in pandas 2.0)
series = read_csv('datos.csv', index_col=0).squeeze("columns")

# do the conversion directly, so that we get an R vector rather than a
# data frame, and we know that it's a numeric type
r_vec = robjects.FloatVector(series)

fit2 = r.fitdist(r_vec, "weibull")
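The second problem, numbers arriving in R as character data, can also be caught before any conversion by checking the dtype in pandas. A minimal sketch, assuming a hypothetical CSV with an index column and one value column in which a stray non-numeric entry (`?`) has made pandas read the column as text; `pd.to_numeric(..., errors="coerce")` turns such entries into NaN so they can be dropped. The sample data and the column names `id`/`valor` are illustrative only, and `squeeze("columns")` is the current spelling of the old `squeeze=True` argument.

```python
import io

import pandas as pd

# Hypothetical data: an index column plus one value column with a bad entry.
CSV_TEXT = "id,valor\n1,1.5\n2,2.25\n3,?\n4,4.75\n"

# Read the single value column as a Series.
series = pd.read_csv(io.StringIO(CSV_TEXT), index_col=0).squeeze("columns")

# The "?" forces pandas to keep the whole column as text, not numbers.
print(pd.api.types.is_numeric_dtype(series))  # False

# Coerce to floats; unparseable entries become NaN and are dropped.
numeric = pd.to_numeric(series, errors="coerce").dropna()
print(numeric.tolist())  # [1.5, 2.25, 4.75]

# A plain float array like this is what robjects.FloatVector(...) would receive.
values = numeric.to_numpy(dtype=float)
```

Whether silently dropping bad rows is acceptable depends on the data; raising an error instead (the default `errors="raise"`) may be the safer choice for an analysis pipeline.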



Answer 2:


I haven't tried your data, but something like this should work:

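# IPython only: the R magics formerly loaded as "rmagic" now ship with rpy2
%load_ext rpy2.ipython

from pandas import read_csv
from rpy2.robjects.packages import importr

# Importing numpy2ri and calling activate() switches on automatic
# conversion of numpy objects into rpy2 objects.
import rpy2.robjects.numpy2ri
rpy2.robjects.numpy2ri.activate()

f = importr('fitdistrplus')
dfp = read_csv('./test.csv')
# fitdist() needs a 1-D numeric vector, not a matrix; this assumes the CSV holds
# only the numeric sample column (as_matrix() was removed in pandas 1.0)
f1 = f.fitdist(dfp.to_numpy().ravel(), "weibull")
print(f1)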
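Why the shape matters here: a DataFrame converted with `as_matrix()` (removed in pandas 1.0) or its replacement `to_numpy()` stays two-dimensional even when it has a single column, and with numpy2ri active a 2-D array is converted to an R matrix. R's `is.vector()` is FALSE for a matrix, so `fitdist()` presumably rejects it with the same "numeric vector" error seen in the question. The sketch below only inspects shapes on the Python side (no R needed) and assumes a hypothetical one-column frame with a column named `valor`.

```python
import numpy as np
import pandas as pd

# Hypothetical single-column frame standing in for test.csv.
dfp = pd.DataFrame({"valor": [1.5, 2.25, 3.0, 4.75]})

# DataFrame.to_numpy() keeps two dimensions, even for one column.
as_table = dfp.to_numpy()
print(as_table.shape)  # (4, 1): would become an R matrix

# Flattening, or selecting the column first, gives a one-dimensional array.
flat = as_table.ravel()
column = dfp["valor"].to_numpy()
print(flat.shape, column.shape)  # (4,) (4,)
print(np.array_equal(flat, column))  # True
```

Note that `ravel()` on a frame with several columns would interleave their values row by row, so selecting the intended column explicitly is the safer choice.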


Source: https://stackoverflow.com/questions/25800556/rpy2-pandas-dataframe-cant-fit-in-r
