Right now I have a DataFrame like this:
  Word       Word2 Word3
 Hello         NaN   NaN
    My     My Name   NaN
Yellow  Yellow Bee   Yel
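import functools

import numpy as np
import pandas as pd

def drop_and_roll(col, na_position='last', fillvalue=np.nan):
    # Start from a column of fill values, then copy the non-null entries in.
    result = np.full(len(col), fillvalue, dtype=col.dtype)
    mask = col.notnull()
    N = mask.sum()
    if na_position == 'last':
        # Non-null values first, fill values at the bottom.
        result[:N] = col.loc[mask]
    elif na_position == 'first':
        # Fill values at the top. len(col) - N also handles N == 0,
        # where result[-N:] would select the whole array.
        result[len(col) - N:] = col.loc[mask]
    else:
        raise ValueError('na_position {!r} unrecognized'.format(na_position))
    return result

# A regex separator needs a raw string and the Python parsing engine.
df = pd.read_table('data', sep=r'\s{2,}', engine='python')
print(df.apply(functools.partial(drop_and_roll, fillvalue='')))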
yields
     Word         Word2            Word3
0   Hello       My Name  Yellow Bee Hive
1      My    Yellow Bee
2  Yellow  Golden Gates
3  Golden
4  Yellow
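If you want the empty cells at the top instead, the same function takes na_position='first'. Here is a minimal sketch of that, assuming an inline DataFrame (the name `sample` is made up) in place of the 'data' file, and dtype=object so that np.full accepts the column dtype:

```python
import functools

import numpy as np
import pandas as pd


def drop_and_roll(col, na_position='last', fillvalue=np.nan):
    """Copy of the answer's function, with the 'first' branch written as
    len(col) - n_valid so that an all-NaN column is handled too."""
    result = np.full(len(col), fillvalue, dtype=col.dtype)
    mask = col.notnull()
    n_valid = int(mask.sum())
    if na_position == 'last':
        result[:n_valid] = col.loc[mask]
    elif na_position == 'first':
        result[len(col) - n_valid:] = col.loc[mask]
    else:
        raise ValueError('na_position {!r} unrecognized'.format(na_position))
    return result


# Hypothetical stand-in for the 'data' file used above.
# dtype=object is an assumption: it keeps the columns as plain Python
# objects so that np.full(..., dtype=col.dtype) works.
sample = pd.DataFrame({
    'Word': ['Hello', 'My', 'Yellow', 'Golden', 'Yellow'],
    'Word2': [np.nan, 'My Name', 'Yellow Bee', 'Golden Gates', np.nan],
    'Word3': [np.nan, np.nan, 'Yellow Bee Hive', np.nan, np.nan],
}, dtype=object)

# Push the non-null values to the bottom and pad the top with ''.
pushed_down = sample.apply(
    functools.partial(drop_and_roll, na_position='first', fillvalue=''))
print(pushed_down)
```

With this input, Word is unchanged, Word2 gets two empty cells at the top, and Word3 ends up with its single value in the last row.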
I think you can use this:
df = df.apply(lambda x: pd.Series(x.dropna().values))
For example:
import pandas as pd
import numpy as np
df = pd.DataFrame({
    'Word': ['Hello', 'My', 'Yellow', 'Golden', 'Yellow'],
    'Word2': [np.nan, 'My Name', 'Yellow Bee', 'Golden Gates', np.nan],
    'Word3': [np.nan, np.nan, 'Yellow Bee Hive', np.nan, np.nan]
})
print(df)
Initial dataframe:
     Word         Word2            Word3
0   Hello           NaN              NaN
1      My       My Name              NaN
2  Yellow    Yellow Bee  Yellow Bee Hive
3  Golden  Golden Gates              NaN
4  Yellow           NaN              NaN
and applying this lambda function:
df = df.apply(lambda x: pd.Series(x.dropna().values))
print(df)
gives:
     Word         Word2            Word3
0   Hello       My Name  Yellow Bee Hive
1      My    Yellow Bee              NaN
2  Yellow  Golden Gates              NaN
3  Golden           NaN              NaN
4  Yellow           NaN              NaN
Then you can fill NaN values with empty strings:
df = df.fillna('')
print(df)
     Word         Word2            Word3
0   Hello       My Name  Yellow Bee Hive
1      My    Yellow Bee
2  Yellow  Golden Gates
3  Golden
4  Yellow
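In case it is not obvious why the pd.Series(... .values) wrapper is needed: dropna() on its own keeps the original row labels, so apply puts every value back in its old row. A small sketch of the difference (the names `kept` and `moved` are just for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Word': ['Hello', 'My', 'Yellow', 'Golden', 'Yellow'],
    'Word2': [np.nan, 'My Name', 'Yellow Bee', 'Golden Gates', np.nan],
    'Word3': [np.nan, np.nan, 'Yellow Bee Hive', np.nan, np.nan],
})

# dropna() alone keeps the labels 1, 2, 3 for Word2, so the values are
# aligned back to rows 1-3 and nothing moves.
kept = df.apply(lambda x: x.dropna())

# Wrapping the raw values in a new Series gives them fresh labels 0, 1, 2,
# so they land at the top of the column.
moved = df.apply(lambda x: pd.Series(x.dropna().values))

print(kept['Word2'].isna().tolist())
print(moved['Word2'].isna().tolist())
```

One thing to watch: the result only has as many rows as the longest compacted column, so if every column contains at least one NaN the output will be shorter than the input.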
Since you want the values to move up, one option is to build a new DataFrame from the compacted columns.
I started with -
     Word         Word2
0   Hello           NaN
1      My       My Name
2  Yellow    Yellow Bee
3  Golden  Golden Gates
4  Yellow           NaN
I used the following method -
import numpy as np
import pandas as pd

def get_column_array(df, column):
    expected_length = len(df)
    # Non-null values of the column, in their original order.
    current_array = df[column].dropna().values
    if len(current_array) < expected_length:
        # Pad with empty strings so the column keeps its original length.
        current_array = np.append(current_array, [''] * (expected_length - len(current_array)))
    return current_array

pd.DataFrame({column: get_column_array(df, column) for column in df.columns})
Gives -
     Word         Word2
0   Hello       My Name
1      My    Yellow Bee
2  Yellow  Golden Gates
3  Golden
4  Yellow
You can also edit the existing df with the same function -
for column in df.columns:
    df[column] = get_column_array(df, column)
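To try both variants end to end, here is a self-contained sketch on the two-column frame from above. The function is repeated so the snippet runs on its own, and the name `rebuilt` is only for illustration:

```python
import numpy as np
import pandas as pd


def get_column_array(df, column):
    """Non-null values of the column, padded with '' to the frame's length."""
    expected_length = len(df)
    current_array = df[column].dropna().values
    if len(current_array) < expected_length:
        padding = [''] * (expected_length - len(current_array))
        current_array = np.append(current_array, padding)
    return current_array


df = pd.DataFrame({
    'Word': ['Hello', 'My', 'Yellow', 'Golden', 'Yellow'],
    'Word2': [np.nan, 'My Name', 'Yellow Bee', 'Golden Gates', np.nan],
})

# Variant 1: build a new frame and leave df untouched.
rebuilt = pd.DataFrame(
    {column: get_column_array(df, column) for column in df.columns})

# Variant 2: overwrite the columns of the existing frame one by one.
for column in df.columns:
    df[column] = get_column_array(df, column)

print(df)
```

Both variants should give the same values. Note that the padding is an empty string rather than NaN, so a later dropna() or isna() will not treat those cells as missing.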