I have a data set with a column called Native Country that contains around 30000 records. Some values are missing and are represented by NaN, so I thoug
```python
import numpy as np
import pandas as pd

print(pd.__version__)  # 1.2.0

df = pd.DataFrame({'Country': [np.nan, 'France', np.nan, 'Spain', 'France'],
                   'Purchased': [np.nan, 'Yes', 'Yes', 'No', np.nan]})
```
| | Country | Purchased |
|---|---|---|
| 0 | NaN | NaN |
| 1 | France | Yes |
| 2 | NaN | Yes |
| 3 | Spain | No |
| 4 | France | NaN |
```python
# df.mode() returns a DataFrame with a single row (index 0), and fillna aligns
# a DataFrame value on the index, so only row 0 is filled
df.fillna(df.mode())
```
| | Country | Purchased |
|---|---|---|
| 0 | France | Yes |
| 1 | France | Yes |
| 2 | NaN | Yes |
| 3 | Spain | No |
| 4 | France | NaN |
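The alignment behaviour can be checked directly. This sketch rebuilds the same frame and confirms that `df.mode()` has a single row labelled `0`, so a DataFrame passed to `fillna` only supplies fill values for that row label; the names `modes` and `filled` are just illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Country': [np.nan, 'France', np.nan, 'Spain', 'France'],
                   'Purchased': [np.nan, 'Yes', 'Yes', 'No', np.nan]})

modes = df.mode()
print(modes.index.tolist())  # [0]: a single row labelled 0

# fillna(DataFrame) aligns on index and columns: only cells whose
# (row, column) labels exist in `modes` get a fill value, i.e. row 0 only.
filled = df.fillna(modes)
print(filled.isna().sum().to_dict())  # {'Country': 1, 'Purchased': 1}
```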
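```python
df = pd.DataFrame({'Country': [np.nan, 'France', np.nan, 'Spain', 'France'],
                   'Purchased': [np.nan, 'Yes', 'Yes', 'No', np.nan]})
# .iloc[0] turns the first mode row into a Series indexed by column name,
# so fillna fills every row of each column with that column's mode
df.fillna(df.mode().iloc[0])
```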
| | Country | Purchased |
|---|---|---|
| 0 | France | Yes |
| 1 | France | Yes |
| 2 | France | Yes |
| 3 | Spain | No |
| 4 | France | Yes |
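The question only needs one column filled. A minimal sketch of the same idea for a single column, assuming it is named `Native Country` as in the question; the sample values below are hypothetical. `Series.mode()` also returns a Series (with several values when there is a tie), so `.iloc[0]` picks the first one.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column name comes from the question.
df = pd.DataFrame({'Native Country': ['France', np.nan, 'Spain', 'Spain',
                                      'France', np.nan]})

# 'France' and 'Spain' are tied here, so mode() returns both values.
modes = df['Native Country'].mode()
print(modes.tolist())

# Take the first mode and fill the gaps; fillna returns a new Series,
# so assign the result back to the column.
df['Native Country'] = df['Native Country'].fillna(modes.iloc[0])
print(df['Native Country'].isna().sum())  # 0
```

If a column is entirely NaN, `mode()` returns an empty Series and `.iloc[0]` raises an `IndexError`, so such a column would need a separate check before filling.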