Question
When I ran my spark code:
val sqlContext = spark.sqlContext
val noact_table = primaryDataProcessor.getTableData(sqlContext, zookeeper, tableName)
println("noact_table.rdd:"+noact_table.rdd.partitions.size)
val tmp = noact_table.rdd
println(tmp.partitions.size)
val out = tmp.map(x => x(0) + "," + x(1))
HdfsOperator.writeHdfsFile(out, "/tmp/test/push")
getTableData:
def getTableData(sqlContext: SQLContext, zkUrl: String, tableName: String): DataFrame = {
val tableData = sqlContext.read.format("org.apache.phoenix.spark")
.option("table", tableName)
.option("zkUrl", zkUrl).load()
tableData
}
My problem is that the table has about 2,000 rows of data, but the resulting RDD has only 1 partition.
Then I continue with:

val push_res = cookieRdd.keyBy(_._2._2).join(tmp).map(x => (x._2._1._1, x._1, x._2._2._2, x._2._2._3, x._2._2._4, x._2._2._5, nexthour))

cookieRdd has 96 partitions and tmp has only 1, and the resulting push_res also ends up with 1 partition.

Could anyone explain why this happens? Why do both tmp and push_res have only 1 partition?
Source: https://stackoverflow.com/questions/52813180/read-use-spark-phoenix-from-table-to-rdd-partition-number-is-1