I have two DataFrames in pandas and I'm trying to merge them, but pandas keeps changing the order. I've tried setting indexes and resetting them, but no matter what I do, I can't get th
The fastest way I've found to merge and restore order - if you are merging "left" - is to include the original order as a column in the left dataframe before merging, then use that to restore the order after merging:
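import numpy as np
import pandas

loans = ['a', 'b', 'c']
states = ['OR', 'CA', 'OR']
x = pandas.DataFrame({'loan': loans, 'state': states})
y = pandas.DataFrame({'state': ['CA', 'OR'], 'value': [1, 2]})

# Record the original row position of the left frame before merging
x["Order"] = np.arange(len(x))
# Merge, then select rows by the saved positions to restore the original order
# (.loc is used here because .ix was removed in pandas 1.0)
z = x.merge(y, how='left', on='state').set_index("Order").loc[np.arange(len(x)), :]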
This method is faster than sorting. Here it is as a function:
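def mergeLeftInOrder(x, y, on=None):
    # Work on a copy so the caller's frame does not gain an "Order" column
    x = x.copy()
    x["Order"] = np.arange(len(x))
    # Label-based lookup on the saved positions restores the left frame's order
    z = x.merge(y, how='left', on=on).set_index("Order").loc[np.arange(len(x)), :]
    return z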
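If you would rather get back a plain frame without "Order" left behind as the index, here is a rough sketch of a variant. Note that it swaps the index lookup for a stable sort on the helper column, so it is not the same technique as above and may not be as fast. The function name `merge_left_keep_order` and the temporary column name `_order` are made up for illustration, and it assumes the left frame has no column called `_order` already.

```python
import numpy as np
import pandas as pd


def merge_left_keep_order(left, right, on=None):
    """Left-merge and return rows in the left frame's original order.

    Hypothetical helper: "_order" is an assumed temporary column name.
    """
    tmp = left.copy()
    tmp["_order"] = np.arange(len(tmp))
    merged = tmp.merge(right, how="left", on=on)
    # mergesort is stable, so rows sharing an _order value (duplicate keys
    # in `right`) keep the relative order the merge produced
    merged = merged.sort_values("_order", kind="mergesort")
    # Drop the helper column and hand back a fresh 0..n-1 index
    return merged.drop(columns="_order").reset_index(drop=True)


x = pd.DataFrame({"loan": ["a", "b", "c"], "state": ["OR", "CA", "OR"]})
y = pd.DataFrame({"state": ["CA", "OR"], "value": [1, 2]})
z = merge_left_keep_order(x, y, on="state")
print(z)
```

With the sample data from the answer, the loans come back as a, b, c with values 2, 1, 2, and the helper column is gone from the result.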