df_pairs:
city1 city2
0 sfo yyz
1 sfo yvr
2 sfo dfw
3 sfo ewr
Output of df_pairs.to_dict('records'):
Why don't you simply do this:
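import pandas as pd
# Look up each pair's cities in data_df (how='left' keeps df_pairs' row order), then drop the string key columns
df_city1 = pd.merge(df_pairs['city1'], data_df, left_on='city1', right_on='city', how='left').drop(columns=['city1', 'city'])
df_city2 = pd.merge(df_pairs['city2'], data_df, left_on='city2', right_on='city', how='left').drop(columns=['city2', 'city'])
# Element-wise city2 - city1; a value missing on only one side is treated as 0
diff = df_city2.subtract(df_city1, fill_value=0)
pos_sum = diff[diff >= 0].sum(axis=1)  # per-pair sum of non-negative differences
neg_sum = diff[diff < 0].sum(axis=1)   # per-pair sum of negative differences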
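Here is a self-contained run of the merge-and-subtract idea, so you can check the row alignment yourself. df_pairs is rebuilt from the table above; data_df, its numeric columns `a` and `b`, and their values are made up for illustration (assumed: one row per city, a `city` key column, everything else numeric). The helper `lookup` is hypothetical. Note that the key columns (`city1`/`city2` and `city`) hold strings and have to be dropped before `subtract`, otherwise pandas tries to subtract strings and raises a TypeError.

```python
import pandas as pd

# df_pairs as shown in the question
df_pairs = pd.DataFrame({
    "city1": ["sfo", "sfo", "sfo", "sfo"],
    "city2": ["yyz", "yvr", "dfw", "ewr"],
})

# Hypothetical data_df: one row per city, a 'city' key plus numeric columns.
# Column names 'a'/'b' and all values are invented for this example.
data_df = pd.DataFrame({
    "city": ["sfo", "yyz", "yvr", "dfw", "ewr"],
    "a": [10, 12, 7, 10, 15],
    "b": [5, 3, 8, 1, 5],
})


def lookup(col):
    """Hypothetical helper: data_df's numeric columns, aligned row-by-row with df_pairs[col]."""
    merged = pd.merge(df_pairs[col], data_df, left_on=col, right_on="city", how="left")
    # Drop both key columns so only numeric data is left for the subtraction
    return merged.drop(columns=[col, "city"])


# city2 - city1 for every pair and every data column
diff = lookup("city2").subtract(lookup("city1"), fill_value=0)
pos_sum = diff[diff >= 0].sum(axis=1)  # NaNs from the mask are skipped by sum()
neg_sum = diff[diff < 0].sum(axis=1)   # a row with no negatives sums to 0

print(diff)
print(pos_sum.tolist())  # [2.0, 3.0, 0.0, 5.0]
print(neg_sum.tolist())  # [-2.0, -3.0, -4.0, 0.0]
```

Because both merges use `how='left'` on columns of the same df_pairs, the two results share the same 0..n-1 index, which is what lets `subtract` line them up pair by pair.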
That beats looping over all your columns and merging 2*(number of columns) times, not to mention the indexing and then that complicated bit with np.sign and .clip... Your df_pairs and data_df have a one-to-one correspondence, right?
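If you are not sure about that correspondence, you can make pandas check it instead of trusting it. The sketch below uses a small hypothetical data_df with a duplicated `sfo` row. `Series.is_unique` and the `validate` argument of `pd.merge` are standard pandas; `validate="many_to_one"` is used rather than `"one_to_one"` because a city such as `sfo` legitimately repeats in df_pairs, so only the data_df side needs unique keys.

```python
import pandas as pd

# Hypothetical inputs: 'sfo' appears twice in data_df, which breaks the lookup
df_pairs = pd.DataFrame({"city1": ["sfo"], "city2": ["yyz"]})
data_df = pd.DataFrame({"city": ["sfo", "sfo", "yyz"], "a": [10, 11, 12]})

print(data_df["city"].is_unique)  # False

try:
    # validate="many_to_one" raises MergeError when the right-side keys are not unique
    pd.merge(df_pairs["city1"], data_df, left_on="city1", right_on="city",
             how="left", validate="many_to_one")
    raised = False
except pd.errors.MergeError as exc:
    raised = True
    print("MergeError:", exc)

# Without validate, the duplicate silently adds a row, so df_city1 and df_city2
# would no longer line up pair by pair
unchecked = pd.merge(df_pairs["city1"], data_df, left_on="city1", right_on="city", how="left")
print(len(df_pairs), len(unchecked))  # 1 2
```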