Remove NaN 'Cells' without dropping the entire ROW (Pandas,Python3)

闹比i 2020-12-06 18:40

Right now I have a DataFrame like this:

 Word       Word2          Word3
 Hello      NaN            NaN
 My         My Name        NaN
 Yellow     Yellow Bee     Yel         


        
3 Answers
  • 2020-12-06 19:06
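    import functools

    import numpy as np
    import pandas as pd

    def drop_and_roll(col, na_position='last', fillvalue=np.nan):
        # Start from an array filled with the placeholder, then copy the
        # non-null values to the top ('last') or the bottom ('first').
        result = np.full(len(col), fillvalue, dtype=col.dtype)
        mask = col.notnull()
        N = mask.sum()
        if na_position == 'last':
            result[:N] = col.loc[mask]
        elif na_position == 'first':
            # len(col) - N instead of -N, so an all-NaN column (N == 0) works
            result[len(col) - N:] = col.loc[mask]
        else:
            raise ValueError('na_position {!r} unrecognized'.format(na_position))
        return result

    # A regex separator requires the Python parser engine; the raw string
    # keeps '\s' from being treated as a string escape.
    df = pd.read_table('data', sep=r'\s{2,}', engine='python')

    print(df.apply(functools.partial(drop_and_roll, fillvalue='')))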
    

    yields

         Word         Word2            Word3
    0   Hello       My Name  Yellow Bee Hive
    1      My    Yellow Bee                 
    2  Yellow  Golden Gates                 
    3  Golden                               
    4  Yellow     
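
The answer reads its input from a file named 'data' that is not shown. As a rough, reproducible sketch, the frame can be built from an in-memory string instead. The `RAW` text below is a hypothetical stand-in for that file, and the function is a compact variant that allocates an object array (an assumption, so that a string fill value works whatever the column dtype is). It also exercises `na_position='first'`, which pushes the values to the bottom:

```python
import functools
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for the 'data' file: columns separated by 2+ spaces.
RAW = """\
Word    Word2         Word3
Hello   NaN           NaN
My      My Name       NaN
Yellow  Yellow Bee    Yellow Bee Hive
Golden  Golden Gates  NaN
Yellow  NaN           NaN
"""


def drop_and_roll(col, na_position='last', fillvalue=np.nan):
    # Assumption: an object array is used so any fill value can be stored.
    result = np.full(len(col), fillvalue, dtype=object)
    mask = col.notnull()
    n = int(mask.sum())
    values = col[mask].to_numpy()
    if na_position == 'last':
        result[:n] = values
    elif na_position == 'first':
        result[len(col) - n:] = values
    else:
        raise ValueError('na_position {!r} unrecognized'.format(na_position))
    return result


df = pd.read_table(io.StringIO(RAW), sep=r'\s{2,}', engine='python')

# Non-null values sink to the bottom; the gaps at the top become ''.
bottom = df.apply(functools.partial(drop_and_roll, na_position='first', fillvalue=''))
print(bottom)
```

With `na_position='last'` (the default) the same call should reproduce the output shown above.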
    
  • 2020-12-06 19:06

    I think you can use this:

    df = df.apply(lambda x: pd.Series(x.dropna().values))
    

    For example:

    import pandas as pd
    import numpy as np
    
    df = pd.DataFrame({
        'Word':['Hello', 'My', 'Yellow', 'Golden', 'Yellow'],
        'Word2':[np.nan, 'My Name', 'Yellow Bee', 'Golden Gates', np.nan],
        'Word3':[np.nan, np.nan, 'Yellow Bee Hive', np.nan, np.nan]
    })
    
    print(df)
    

    Initial dataframe:

         Word         Word2            Word3
    0   Hello           NaN              NaN
    1      My       My Name              NaN
    2  Yellow    Yellow Bee  Yellow Bee Hive
    3  Golden  Golden Gates              NaN
    4  Yellow           NaN              NaN
    

    and applying this lambda function:

    df = df.apply(lambda x: pd.Series(x.dropna().values))
    
    print(df)
    

    gives:

         Word         Word2            Word3
    0   Hello       My Name  Yellow Bee Hive
    1      My    Yellow Bee              NaN
    2  Yellow  Golden Gates              NaN
    3  Golden           NaN              NaN
    4  Yellow           NaN              NaN
    

    Then you can fill NaN values with empty strings:

    df = df.fillna('')
    
    print(df)
    
         Word         Word2            Word3
    0   Hello       My Name  Yellow Bee Hive
    1      My    Yellow Bee                 
    2  Yellow  Golden Gates                 
    3  Golden                               
    4  Yellow    
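
It may help to see why the `pd.Series(x.dropna().values)` wrapper matters. The sketch below, which reuses the example frame from this answer, suggests that without discarding the old row labels pandas realigns each shortened column to its original rows, so nothing moves. `reset_index(drop=True)` is used here as an equivalent way to drop the labels; that substitution is mine, not part of the answer:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Word': ['Hello', 'My', 'Yellow', 'Golden', 'Yellow'],
    'Word2': [np.nan, 'My Name', 'Yellow Bee', 'Golden Gates', np.nan],
    'Word3': [np.nan, np.nan, 'Yellow Bee Hive', np.nan, np.nan],
})

# Each shortened column keeps its old row labels, so the values are
# aligned back to their original rows and the NaN cells stay in place.
unshifted = df.apply(lambda x: x.dropna())

# Discarding the labels lets every column pack its values from row 0.
shifted = df.apply(lambda x: x.dropna().reset_index(drop=True))

print(unshifted)
print(shifted)
```

Either way the result keeps all five rows, because the longest column (`Word`) has no missing values.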
    
  • 2020-12-06 19:11

    Since you want the values to move up, you'll have to create a new DataFrame.

    I started with:

         Word         Word2
    0   Hello           NaN
    1      My       My Name
    2  Yellow    Yellow Bee
    3  Golden  Golden Gates
    4  Yellow           NaN
    

    I used the following method:

    import numpy as np
    import pandas as pd

    def get_column_array(df, column):
        expected_length = len(df)
        # Non-null values only, in their original order
        current_array = df[column].dropna().values
        if len(current_array) < expected_length:
            # Pad with empty strings back to the original length
            current_array = np.append(current_array, [''] * (expected_length - len(current_array)))
        return current_array

    pd.DataFrame({column: get_column_array(df, column) for column in df.columns})
    

    This gives:

         Word         Word2
    0   Hello       My Name
    1      My    Yellow Bee
    2  Yellow  Golden Gates
    3  Golden              
    4  Yellow              
    

    You can also update the existing df in place with the same function:

    for column in df.columns:
        df[column] = get_column_array(df, column)
    