I would like to append a new column to the DataFrame "df" using the function get_distance:
def get_distance(x, y):
You cannot apply a plain Python function to Column objects directly, unless the function is written to operate on Column objects / expressions. You need a udf for that:
from pyspark.sql.functions import udf

@udf
def get_distance(x, y):
    ...
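However, you cannot use SQLContext inside a udf (or inside a mapper in general), because that code runs on the executors, where the context is not available.
Use a join instead: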
from pyspark.sql.functions import first

tab = hiveContext.table("tab").groupBy("column1", "column2").agg(first("column3"))
df.join(tab, ["column1", "column2"])