I have a sample table. I want to get the median of the "value" column for each group in the "source" column, where the source column is of String type and the value column is numeric.
For non-integral values you should use the percentile_approx Hive UDAF:
import org.apache.spark.mllib.random.RandomRDDs
import sqlContext.implicits._  // required for toDF

// 1000 samples from a standard normal, in 10 partitions, with seed 1
val df = RandomRDDs.normalRDD(sc, 1000, 10, 1).map(Tuple1(_)).toDF("x")
df.registerTempTable("df")

sqlContext.sql("SELECT percentile_approx(x, 0.5) FROM df").show
// +--------------------+
// | _c0|
// +--------------------+
// |0.035379710486199915|
// +--------------------+
On a side note, you should use GROUP BY, not PARTITION BY. The latter is used with window functions and has a different effect than you expect.
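To make the GROUP BY vs. PARTITION BY distinction concrete, here is a small sketch using Python's sqlite3 with avg (SQLite has no percentile_approx, and the table contents are made up for illustration): a GROUP BY collapses each group to a single row, while an OVER (PARTITION BY ...) window keeps every input row and attaches the aggregate to each one.

```python
import sqlite3

# In-memory table with a hypothetical (source, value) layout like the question's.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE df (source TEXT, value REAL);
    INSERT INTO df VALUES ('a', 1.0), ('a', 3.0), ('b', 10.0);
""")

# GROUP BY: one output row per group.
grouped = conn.execute(
    "SELECT source, avg(value) FROM df GROUP BY source ORDER BY source"
).fetchall()
print(grouped)  # [('a', 2.0), ('b', 10.0)]

# PARTITION BY (window): one output row per input row.
windowed = conn.execute(
    "SELECT source, avg(value) OVER (PARTITION BY source) FROM df ORDER BY source"
).fetchall()
print(windowed)  # [('a', 2.0), ('a', 2.0), ('b', 10.0)]
```

Note that the window query returns three rows for three input rows, which is why using PARTITION BY where you mean GROUP BY gives surprising results.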
SELECT source, percentile_approx(value, 0.5) FROM df GROUP BY source
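To see what the grouped query computes, here is a plain-Python sketch (with made-up sample data) of the exact per-group median that percentile_approx approximates:

```python
from collections import defaultdict
from statistics import median

# Hypothetical (source, value) rows standing in for the question's table.
rows = [("a", 1.0), ("a", 2.0), ("a", 9.0), ("b", 4.0), ("b", 6.0)]

# Group values by source, then take the median of each group --
# the exact counterpart of percentile_approx(value, 0.5) ... GROUP BY source.
groups = defaultdict(list)
for source, value in rows:
    groups[source].append(value)

medians = {source: median(values) for source, values in groups.items()}
print(medians)  # {'a': 2.0, 'b': 5.0}
```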
See also How to find median using Spark