Question
When I ran my spark code:
val sqlContext = spark.sqlContext
val noact_table = primaryDataProcessor.getTableData(sqlContext, zookeeper, tableName)
println("noact_table.rdd:"+noact_table.rdd.partitions.size)
val tmp = noact_table.rdd
println(tmp.partitions.size)
val out = tmp.map(x => x(0) + "," + x(1))
HdfsOperator.writeHdfsFile(out, "/tmp/test/push")
getTableData:
def getTableData(sqlContext: SQLContext, zkUrl: String, tableName: String): DataFrame = {
val tableData = sqlContext.read.format("org.apache.phoenix.spark")
.option("table", tableName)
.option("zkUrl", zkUrl).load()
tableData
}
My problem is that the table has about 2,000 rows of data, but the resulting RDD has only 1 partition.
Then I continue with:

val push_res = cookieRdd.keyBy(_._2._2).join(tmp).map(x => (x._2._1._1, x._1, x._2._2._2, x._2._2._3, x._2._2._4, x._2._2._5, nexthour))

cookieRdd has 96 partitions and tmp has only 1, and the resulting push_res also ends up with 1 partition.

Could anyone explain why this happens? Why do both tmp and push_res have only 1 partition?
Source: https://stackoverflow.com/questions/52813180/read-use-spark-phoenix-from-table-to-rdd-partition-number-is-1