Replace NaN values of pandas.DataFrame with values from list

Posted by 限于喜欢 on 2021-02-07 13:21:21

Question


In a Python script using the pandas library, I have a dataset of, say, 100 rows with a feature "X" that contains 36 NaN values, and a list of 36 values.

I want to replace the 36 missing values in column "X" with the 36 values from my list, in order.

This is probably a basic question, but I went through the documentation and couldn't find a way to do it.

Here's an example:

INPUT

Data:   X      Y
        1      8
        2      3
        NaN    2
        NaN    7
        1      2
        NaN    2

Filler

List: [8, 6, 3]

OUTPUT

Data:   X      Y
        1      8
        2      3
        8      2
        6      7
        1      2
        3      2
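
For anyone who wants to try the answers below, here is a minimal sketch that rebuilds the example input. The variable names df, filler and n_missing are illustrative choices, not part of the original question.

```python
import numpy as np
import pandas as pd

# The example data from the question; np.nan marks the missing entries in "X".
df = pd.DataFrame({
    "X": [1, 2, np.nan, np.nan, 1, np.nan],
    "Y": [8, 3, 2, 7, 2, 2],
})
filler = [8, 6, 3]

# Count the missing values so the filler length can be compared against it.
n_missing = int(df["X"].isna().sum())
print(n_missing, len(filler))
```

Because of the NaN entries, pandas stores "X" as floats, which is why the answers print 1.0 and 2.0 rather than 1 and 2.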

Answer 1:


Start with your DataFrame df:

print(df)

     X  Y
0  1.0  8
1  2.0  3
2  NaN  2
3  NaN  7
4  1.0  2
5  NaN  2

Define the values you want to fill with. Note: the filler list must contain exactly as many elements as there are NaN values in the column:

filler = [8, 6, 3]

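Select the rows where the column is NaN and overwrite them with your filler. Use .loc for the assignment. The shorter form df.X[df.X.isnull()] = filler relies on chained assignment, which pandas warns against and which does not update df when Copy-on-Write is enabled:

df.loc[df.X.isnull(), 'X'] = filler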

which gives:

print(df)

     X  Y
0  1.0  8
1  2.0  3
2  8.0  2
3  6.0  7
4  1.0  2
5  3.0  2
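
Since this approach depends on the filler having the same length as the number of NaN values, it may help to check that explicitly. The following is only a sketch; fill_nans_in_order is a hypothetical helper name, not a pandas function.

```python
import numpy as np
import pandas as pd

def fill_nans_in_order(df, column, filler):
    """Return a copy of df with the NaNs in `column` replaced by `filler`, in row order."""
    mask = df[column].isna()
    n_missing = int(mask.sum())
    if n_missing != len(filler):
        raise ValueError(
            f"{column!r} has {n_missing} NaN values but filler has {len(filler)}"
        )
    out = df.copy()  # leave the caller's DataFrame untouched
    out.loc[mask, column] = filler
    return out

df = pd.DataFrame({"X": [1, 2, np.nan, np.nan, 1, np.nan], "Y": [8, 3, 2, 7, 2, 2]})
filled = fill_nans_in_order(df, "X", [8, 6, 3])
print(filled)
```

Returning a copy is a design choice made here for safety; assigning in place with df.loc, as in the answer, works just as well if you do not need the original.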



Answer 2:


One option is to loop over the column and keep a counter into your custom list, replacing each NaN with the next value from the list:

import numpy as np
import pandas as pd

your_df = pd.DataFrame({'your_column': [0,1,2,np.nan,4,6,np.nan,np.nan,7,8,np.nan,9]})  # a df with 4 NaN's
print(your_df)

your_custom_list = [1,3,6,8]  # custom list with 4 fillers

your_column_vals = your_df['your_column'].to_numpy(copy=True)  # writable copy of the column

i_custom = 0  # starting index on your iterator for your custom list
for i in range(len(your_column_vals)):
    if np.isnan(your_column_vals[i]):
        your_column_vals[i] = your_custom_list[i_custom]
        i_custom += 1  # increase the index

your_df['your_column'] = your_column_vals

print(your_df)

Output (the DataFrame before and after filling):

    your_column
0           0.0
1           1.0
2           2.0
3           NaN
4           4.0
5           6.0
6           NaN
7           NaN
8           7.0
9           8.0
10          NaN
11          9.0
    your_column
0           0.0
1           1.0
2           2.0
3           1.0
4           4.0
5           6.0
6           3.0
7           6.0
8           7.0
9           8.0
10          8.0
11          9.0
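
The counter in the loop can also be replaced by a Python iterator, which hands out the next filler each time a NaN is met. This is a sketch of that variation using the same data; the name fillers is an illustrative choice.

```python
import numpy as np
import pandas as pd

your_df = pd.DataFrame(
    {'your_column': [0, 1, 2, np.nan, 4, 6, np.nan, np.nan, 7, 8, np.nan, 9]}
)
fillers = iter([1, 3, 6, 8])  # one filler per NaN, consumed in order

# Replace each NaN with the next filler; other values pass through unchanged.
your_df['your_column'] = [
    next(fillers) if np.isnan(value) else value
    for value in your_df['your_column']
]
print(your_df)
```

If the list runs out before the NaNs do, next(fillers) fails with an error, so a too-short list is not silently ignored.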



Answer 3:


This may not be the most efficient approach, but it works. First find the index labels of all the NaNs, then replace them in a loop. This assumes the list has at least as many elements as there are NaNs.

import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [np.nan, 1, 2], 'B': [10, np.nan, np.nan], 'C': [[20, 21, 22], [23, 24, 25], np.nan]})
lst = [12, 35, 78]

print(df)  # DataFrame before filling

index = df.index[df['B'].isna()]  # index labels of the NaN rows in column B
for cnt, item in enumerate(index):
    # set_value was removed in pandas 1.0; .at sets a single cell by label
    df.at[item, 'B'] = lst[cnt]  # the n-th NaN gets the n-th value from the list

print(df)

     A     B             C
0  NaN  10.0  [20, 21, 22]
1  1.0   NaN  [23, 24, 25]
2  2.0   NaN           NaN

Output:

     A     B             C
0  NaN  10.0  [20, 21, 22]
1  1.0  12.0  [23, 24, 25]
2  2.0  35.0           NaN
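
When the list is longer than the number of NaNs, a loop is not strictly needed. As a sketch, assuming the first values of the list are the ones wanted, the list can be sliced to the NaN count and assigned in one step:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'B': [10, np.nan, np.nan]})
lst = [12, 35, 78]  # one element more than the two NaNs in "B"

mask = df['B'].isna()
# Keep only as many fillers as there are NaNs, then assign them in row order.
df.loc[mask, 'B'] = lst[:int(mask.sum())]
print(df)
```

The slice is what makes the lengths match; without it, pandas would reject the assignment because three values cannot be placed into two rows.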


Source: https://stackoverflow.com/questions/42167429/replace-nan-values-of-pandas-dataframe-with-values-from-list
