How do I calculate the average salary per location in Spark Scala with the two data sets below?
File1.csv (column 4 is salary):
Ram, 30, Engineer, 40000  
B         
        
I would use the DataFrame API. Assuming File2.csv holds name and location columns, something like this should work:
import spark.implicits._   // for .toDF and the $ column syntax (in scope by default in spark-shell)

val salary = sc.textFile("File1.csv")
               .map(_.split(",").map(_.trim))   // split, then strip the space after each comma
               .map { case Array(name, _, _, salary) => (name, salary.toDouble) }   // split returns Array, not Seq; salary must be numeric for avg
               .toDF("name", "salary")
val location = sc.textFile("File2.csv")
                 .map(_.split(",").map(_.trim))
                 .map { case Array(name, location) => (name, location) }
                 .toDF("name", "location")
import org.apache.spark.sql.functions._
salary
  .join(location, Seq("name"))
  .groupBy($"location")
  .agg(
    avg($"salary").as("avg_salary")
  )
  .coalesce(1)               // one output file; coalesce avoids the full shuffle repartition(1) would do
  .write.csv("output.csv")   // note: this creates a directory named output.csv containing a part file
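
Alternatively, on Spark 2.x+ you can skip the manual splitting and let Spark's CSV reader parse the files. A minimal sketch, assuming neither file has a header row and that File2.csv is (name, location); the app name and output directory are placeholders:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().appName("avg-salary").getOrCreate()  // already defined in spark-shell
import spark.implicits._

// spark.read.csv handles splitting and quoting; toDF assigns column names.
val salaryDF = spark.read.csv("File1.csv")
  .toDF("name", "age", "role", "salary")
val locationDF = spark.read.csv("File2.csv")
  .toDF("name", "location")

salaryDF
  .join(locationDF, Seq("name"))
  .groupBy($"location")
  .agg(avg(trim($"salary").cast("double")).as("avg_salary"))  // trim handles the space after each comma
  .coalesce(1)
  .write.csv("avg_salary_output")  // placeholder output directory

The main advantage of the reader API is that quoting, escaping, and malformed rows are handled for you instead of a raw split(",").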