How to treat NULL as a normal string with pandas?

情书的邮戳 2020-12-09 15:34

I have a csv-file with a column with strings and I want to read it with pandas. In this file the string null occurs as an actual value and should not be regarde

4 Answers
  • 2020-12-09 16:00

    This happens because pandas treats the string 'null' as NaN when parsing. You can turn this off by passing keep_default_na=False, in addition to @coldspeed's answer:

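    In[49]:
    import io
    import pandas as pd

    data = 'strings,numbers\nfoo,1\nbar,2\nnull,3'
    # keep_default_na=False disables the built-in NaN markers such as 'null'
    df = pd.read_csv(io.StringIO(data), keep_default_na=False)
    df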
    
    Out[49]: 
      strings  numbers
    0     foo        1
    1     bar        2
    2    null        3
    

    The full list of strings treated as NaN by default is given in the documentation of the na_values parameter:

    na_values : scalar, str, list-like, or dict, default None

    Additional strings to recognize as NA/NaN. If dict passed, specific per-column NA values. By default the following values are interpreted as NaN: ‘’, ‘#N/A’, ‘#N/A N/A’, ‘#NA’, ‘-1.#IND’, ‘-1.#QNAN’, ‘-NaN’, ‘-nan’, ‘1.#IND’, ‘1.#QNAN’, ‘N/A’, ‘NA’, ‘NULL’, ‘NaN’, ‘n/a’, ‘nan’, ‘null’.
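
    The per-column dict form mentioned in that excerpt can be combined with keep_default_na=False. The following is a minimal sketch, assuming a hypothetical file where 'null' is a real string and an empty field in the numbers column is a genuine gap; the sample data and the choice of '' as the only NaN marker are illustrative assumptions, not part of the original answer.

```python
import io

import pandas as pd

# Hypothetical sample: 'null' is a real string, the empty field is a real gap.
data = "strings,numbers\nfoo,1\nnull,\nbar,3"

df = pd.read_csv(
    io.StringIO(data),
    keep_default_na=False,        # drop all built-in NaN markers
    na_values={"numbers": [""]},  # re-enable only '' and only for 'numbers'
)

print(df["strings"].tolist())
print(df["numbers"].isna().tolist())
```

    With this setup the strings column should keep 'null' as text, while the empty field in numbers is still reported as missing.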

  • 2020-12-09 16:03

    UPDATE: 2020-03-23 for Pandas 1+:

    Many thanks to @aiguofer for the adapted solution:

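    # STR_NA_VALUES is not public API; newer pandas releases may expose it
    # elsewhere (for example pandas._libs.parsers.STR_NA_VALUES).
    na_vals = pd.io.parsers.STR_NA_VALUES.difference({'NULL','null'})
    df = pd.read_csv(io.StringIO(data), na_values=na_vals, keep_default_na=False)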
    

    Old answer:

    We can dynamically exclude 'NULL' and 'null' from the set of default _NA_VALUES:

    In [4]: na_vals = pd.io.common._NA_VALUES.difference({'NULL','null'})
    
    In [5]: na_vals
    Out[5]:
    {'',
     '#N/A',
     '#N/A N/A',
     '#NA',
     '-1.#IND',
     '-1.#QNAN',
     '-NaN',
     '-nan',
     '1.#IND',
     '1.#QNAN',
     'N/A',
     'NA',
     'NaN',
     'n/a',
     'nan'}
    

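    and use it in read_csv(), together with keep_default_na=False. Otherwise na_values is appended to the defaults and 'null' is still parsed as NaN:

    df = pd.read_csv(io.StringIO(data), na_values=na_vals, keep_default_na=False)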
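
    If you would rather not depend on a private constant whose location differs between pandas versions, one alternative is to spell the markers out. This is a sketch only: the list is copied from the documentation excerpt quoted in the first answer, minus 'NULL' and 'null', and newer pandas releases may recognize additional markers that you would have to add yourself. The sample data is hypothetical.

```python
import io

import pandas as pd

# Default markers as listed in the documentation excerpt quoted earlier,
# with 'NULL' and 'null' left out on purpose.
na_vals = [
    "", "#N/A", "#N/A N/A", "#NA", "-1.#IND", "-1.#QNAN", "-NaN", "-nan",
    "1.#IND", "1.#QNAN", "N/A", "NA", "NaN", "n/a", "nan",
]

# Hypothetical sample: 'NA' should be missing, 'null' should stay a string.
data = "strings,numbers\nfoo,1\nNA,2\nnull,3"

# keep_default_na=False makes na_vals the complete list of NaN markers.
df = pd.read_csv(io.StringIO(data), na_values=na_vals, keep_default_na=False)

print(df["strings"].isna().tolist())
print(df["strings"].iloc[2])
```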
    
  • 2020-12-09 16:06

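    The other answers are better for reading a csv without "null" being interpreted as NaN, but if you already have a DataFrame that you want "fixed", this code will do so: df = df.fillna('null'). Note that this replaces every NaN in the DataFrame, including values that were genuinely missing.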
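
    To limit the repair to the affected column, fillna can be applied to that column alone. A minimal sketch, assuming a hypothetical frame where only the strings column lost its 'null' values:

```python
import numpy as np
import pandas as pd

# Hypothetical frame in which 'null' was already parsed as NaN,
# next to a numeric column with a genuinely missing value.
df = pd.DataFrame(
    {"strings": ["foo", np.nan, "bar"], "numbers": [1.0, np.nan, 3.0]}
)

# Restore the string only in the affected column.
df["strings"] = df["strings"].fillna("null")

print(df["strings"].tolist())
print(int(df["numbers"].isna().sum()))
```

    The gap in numbers is left untouched, so real missing data is not turned into the text 'null'.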

  • 2020-12-09 16:13

    You can specify a converters argument for the string column.

    pd.read_csv(io.StringIO(data), converters={'strings': str})
    
      strings  numbers
    0     foo        1
    1     bar        2
    2    null        3
    

    This bypasses pandas' automatic parsing for that column.
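
    To see that the converter only affects the column it is attached to, here is a small sketch with a hypothetical second text column named other. It assumes the default parser engine; columns without a converter should still go through the usual NaN detection.

```python
import io

import pandas as pd

# Hypothetical sample with 'null' in two text columns.
data = "strings,other\nfoo,bar\nnull,null"

# Only 'strings' gets a converter, so only that column skips NaN detection.
df = pd.read_csv(io.StringIO(data), converters={"strings": str})

print(df["strings"].tolist())
print(df["other"].isna().tolist())
```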


    Another option is setting na_filter=False:

    pd.read_csv(io.StringIO(data), na_filter=False)
    
      strings  numbers
    0     foo        1
    1     bar        2
    2    null        3
    

    This applies to the entire DataFrame, so use it with caution. I recommend the first option if you want to apply this surgically to selected columns instead.
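
    The reason for the caution can be sketched with a hypothetical file that also has an empty numeric field: with na_filter=False nothing is detected as missing, so the empty field stays an empty string rather than becoming NaN.

```python
import io

import pandas as pd

# Hypothetical sample: 'null' is a real string, the empty field is a real gap.
data = "strings,numbers\nfoo,1\nnull,\nbar,3"

# na_filter=False switches off missing-value detection for every column.
df = pd.read_csv(io.StringIO(data), na_filter=False)

print(df["strings"].tolist())
print(repr(df["numbers"].iloc[1]))
print(bool(df.isna().any().any()))
```

    Here 'null' survives as intended, but the numbers column can no longer be parsed as numeric and needs cleaning afterwards.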
