How to find median and quantiles using Spark
Question: How can I find the median of an RDD of integers using a distributed method, IPython, and Spark? The RDD has approximately 700,000 elements and is therefore too large to collect and find the median locally. This question is similar to "How can I calculate exact median with Apache Spark?". However, the answer to that question uses Scala, which I do not know. Following the reasoning of the Scala answer, I am trying to write a similar answer in Python. I know I first want to sort the RDD. I do not