Question
I've encountered a problem when using the rpy2 package to convert a dataframe
created in R into Python.
import os
os.environ['R_HOME'] = '/Library/Frameworks/R.framework/Resources'
import rpy2.robjects as ro
from rpy2.robjects import pandas2ri
# define a trivial dataframe in R
ro.r('n = c(1,2)')
ro.r("b = c(NA,'def')")
ro.r("temp_df = data.frame(n,b)")
# the dataframe in R shows missing value in one cell as NA
temp_rdf = ro.r('temp_df')
print(temp_rdf)
  n    b
1 1 <NA>
2 2  def
# yet the converted Python dataframe replaces the missing value with a string
temp_pydf = pandas2ri.ri2py(temp_rdf)
print(temp_pydf)
     n    b
1  1.0  def
2  2.0  def
I did some searching and found the post "Rpy2 pandas2ri.ri2py() is converting NA values to integers". It explains why this happens but doesn't provide a solution. I want null values in Python wherever the R dataframe has NA. How can I do this?
Answer 1:
Update: http://rpy.sourceforge.net/rpy2/doc-2.2/html/rinterface.html
The link above may have useful information on some settings. If you search the page for "NA " (including the trailing space) and go to the second hit, there is an entry that looks like it relates to your NA problem.
Original post: assuming the "def" shown in your output is coming in as a string, you could replace it with a string that you are confident is not a value in your data, and then use that string in place of the NA value that is not coming through.
This sample code illustrates the concept:
x = "def"
type(x)  # str
x = x.replace("def", "NA")
x  # 'NA'
The complication is that your converted data has two rows that both say 'def': one where it came from the data and another where NA was converted to 'def'. To work around that:
- Convert 'def' to something else in R
- Bring in your data
- Now 'def' means NA
- Use it as such or convert it to something you can live with
Is this a problem you encounter often?
If so, create a test function that checks your data for 'def'.
If it is found, replace it with something crazy you know the data will not contain, such as my_crazy_replacementValue.
Then, in the converted data, replace "def" with your desired stand-in for NA.
Finally, replace my_crazy_replacementValue with "def".
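The swap described above can be sketched in plain Python. This is only an illustration on lists, not rpy2 or pandas code: `protect_genuine` and `restore` are hypothetical helper names, and in the question's setup the protecting step would have to run on the R side before conversion, since afterwards the two kinds of 'def' can no longer be told apart.

```python
MARKER = "def"                             # the value that NA turned into
PLACEHOLDER = "my_crazy_replacementValue"  # assumed never to occur in the data


def protect_genuine(values, marker=MARKER, placeholder=PLACEHOLDER):
    """Before conversion: hide genuine marker values behind the placeholder."""
    if placeholder in values:
        raise ValueError("placeholder already occurs in the data")
    return [placeholder if v == marker else v for v in values]


def restore(values, marker=MARKER, placeholder=PLACEHOLDER):
    """After conversion: marker becomes None, placeholder becomes the marker."""
    restored = []
    for v in values:
        if v == marker:
            restored.append(None)    # this 'def' was really a missing value
        elif v == placeholder:
            restored.append(marker)  # this was a genuine 'def'
        else:
            restored.append(v)
    return restored


# Column as it might look after conversion, with the genuine 'def' protected
# beforehand and the missing value showing up as 'def'.
converted = ["def", PLACEHOLDER]
print(restore(converted))  # [None, 'def']
```

The order of the two replacements matters: the marker has to be turned into the missing value before the placeholder is turned back into the marker.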
In Python, the most common value for NA is, I think, None. Unfortunately, you cannot replace a value with None using:
string.replace()
It seems reasonable that there should be a better answer: a "Pythonic" way of converting a specified value in a data frame to None. I need to review pandas data frames when I get a chance, and then I may log back in and edit this paragraph (or maybe someone else will beat me to it). I hope the above helps you in the interim.
Source: https://stackoverflow.com/questions/42231400/issue-with-rpy2-handling-na-missing-value-in-dataframe-from-r-to-python
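One possible pandas approach, offered as a sketch rather than a definitive fix: `DataFrame.replace` can map a chosen string to `numpy.nan`, which pandas treats as missing. The `"__NA__"` sentinel and the `sentinel_to_missing` helper are assumptions made for illustration; the sentinel would need to be written in place of NA on the R side before conversion.

```python
import numpy as np
import pandas as pd


def sentinel_to_missing(df, sentinel):
    """Return a copy of df where every cell equal to `sentinel` is NaN."""
    return df.replace(sentinel, np.nan)


# Frame as it might look after conversion, assuming NA was swapped for the
# hypothetical "__NA__" sentinel in R beforehand.
temp_pydf = pd.DataFrame({"n": [1.0, 2.0], "b": ["__NA__", "def"]})

cleaned = sentinel_to_missing(temp_pydf, "__NA__")
print(cleaned["b"].isna().tolist())  # [True, False]
```

`replace` returns a new frame here, so the converted data stays available for comparison.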