Issue with rpy2 handling NA/missing value in dataframe from R to Python

Submitted by 浪尽此生 on 2019-12-06 04:41:08

Updates: http://rpy.sourceforge.net/rpy2/doc-2.2/html/rinterface.html

The page linked above may help with some settings. Search it for "NA " (including the trailing space) and go to the second hit; that one looks like it relates to your NA problem.

Original post: assuming "def", as shown in your output, comes in as a string, you could replace it with a string that you are confident does not occur in your data, and then use that string in place of the NA value that is not coming through.

This sample code illustrates the concept.

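x = "def"
print(type(x))  # confirms the value arrived as an ordinary str
x = x.replace("def", "NA")  # swap in a stand-in you know is not in the data
print(x)  # NA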

If your source has two rows that both read 'def', one where 'def' is a genuine value from the data and another where an NA was converted to 'def':

  1. Convert the genuine 'def' values to something else in R.
  2. Bring in your data.
  3. Now 'def' means NA.
  4. Use it as such, or convert it to something you can live with.
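On the Python side, steps 3 and 4 amount to treating every remaining 'def' as missing. A minimal sketch, assuming the imported data has already been turned into a plain dict of column lists; `frame` and `sentinel_to_none` are hypothetical names for illustration, not part of rpy2:

```python
from typing import Any, Dict, List


def sentinel_to_none(frame: Dict[str, List[Any]], sentinel: str = "def") -> Dict[str, List[Any]]:
    """Return a copy of frame in which every cell equal to sentinel becomes None."""
    # Compare whole cell values rather than substrings, so "define" is left alone.
    return {
        column: [None if value == sentinel else value for value in values]
        for column, values in frame.items()
    }


# Hypothetical sample data standing in for the imported data frame.
frame = {"name": ["a", "def", "c"], "code": ["x", "y", "def"]}
print(sentinel_to_none(frame))  # {'name': ['a', None, 'c'], 'code': ['x', 'y', None]}
```

Building a new dict instead of mutating `frame` keeps the original import available if you later decide a different stand-in is easier to live with.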

Is this a problem you encounter often?

  1. If so, create a test function that checks your data for 'def'.

  2. If found, replace it with something outlandish that you know the data will not contain, such as: my_crazy_replacementValue

  3. Replace "def" with your desired stand-in for NA.

  4. Replace my_crazy_replacementValue with "def".
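These four steps can be sketched as a round trip through a placeholder. This is a minimal sketch, assuming a single column held as a Python list of strings, with None playing the role of an R NA before the transfer. `PLACEHOLDER`, `contains_value`, `protect`, `simulate_conversion`, and `restore` are hypothetical names; `simulate_conversion` only imitates the NA-to-'def' behaviour so the example runs on its own. In practice steps 1 and 2 must run before that conversion happens (for example in R), and steps 3 and 4 afterwards.

```python
from typing import List, Optional

PLACEHOLDER = "my_crazy_replacementValue"  # hypothetical; must never appear in real data
NA_MARKER = "def"  # the string that NA values come through as


def contains_value(column: List[Optional[str]], value: str) -> bool:
    # Step 1: test whether the data already holds the marker value.
    return value in column


def protect(column: List[Optional[str]]) -> List[Optional[str]]:
    # Step 2: hide genuine "def" values behind the placeholder (before conversion).
    if PLACEHOLDER in column:
        raise ValueError("placeholder collides with real data")
    return [PLACEHOLDER if v == NA_MARKER else v for v in column]


def simulate_conversion(column: List[Optional[str]]) -> List[Optional[str]]:
    # Stand-in for the transfer that turns missing values into "def".
    return [NA_MARKER if v is None else v for v in column]


def restore(column: List[Optional[str]], na_value: Optional[str] = None) -> List[Optional[str]]:
    # Steps 3 and 4 in a single pass: "def" now only means NA,
    # and the placeholder goes back to being a genuine "def".
    out: List[Optional[str]] = []
    for v in column:
        if v == NA_MARKER:
            out.append(na_value)
        elif v == PLACEHOLDER:
            out.append(NA_MARKER)
        else:
            out.append(v)
    return out


raw = ["abc", "def", None]
protected = protect(raw) if contains_value(raw, NA_MARKER) else raw
converted = simulate_conversion(protected)
print(restore(converted))  # ['abc', 'def', None]
```

Doing steps 3 and 4 in one pass avoids an ordering trap: if the placeholder were turned back into "def" first, a following "def"-to-NA replacement would wipe out the genuine values again.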

In Python, the most common value for NA is, I think, None. Unfortunately, because its arguments must be strings, you cannot replace a value with None using:

str.replace()
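A quick check of that limitation, assuming Python 3 (the exact error message wording varies between versions, so it is not reproduced here). The workaround is to compare the whole value and rebind it, since None cannot live inside a string:

```python
# str.replace() only accepts strings for its arguments, so None is rejected.
x = "def"
try:
    x.replace("def", None)
except TypeError as exc:
    print("TypeError:", exc)

# Rebinding the whole value works: compare it, then swap in None.
x = None if x == "def" else x
print(x)  # None
```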

It seems reasonable that there should be a better answer: a "Pythonic" way of converting a specified value in a data frame to None. I need to review pandas DataFrames when I get a chance, and then I may log back in and edit this paragraph (or maybe someone else will beat me to it). Hopefully the above helps you in the interim.
