Spark JoinWithCassandraTable on TimeStamp partition key STUCK

给你一囗甜甜゛ posted on 2019-11-29 04:34:16

By using:

sc.parallelize(startDate to endDate)

where startDate and endDate are Longs generated from Dates in the format:

("yyyy-MM-dd HH:mm:ss")

I made Spark build a huge array (100,000+ objects) to join with the C* table. It was not actually stuck at all: C* was working hard to perform the join and return the data.

Finally, I replaced the range with an explicit list of keys:

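import com.datastax.spark.connector._  // brings joinWithCassandraTable into scope

// The field name created_dh is mapped to the table column of the same name
case class TableKey(created_dh: String)
val data = Array("2015-10-29 12:00:00", "2015-10-29 13:00:00", "2015-10-29 14:00:00", "2015-10-29 15:00:00")
val snapshotsFiltered = sc.parallelize(data, 2).map(TableKey(_)).joinWithCassandraTable("listener", "snapshots_tnew")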

The join works fine now.
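For longer time spans, typing the hourly keys by hand gets tedious. The following is a minimal sketch of how such a list could be generated, assuming the keys are exactly one hour apart and use the "yyyy-MM-dd HH:mm:ss" format shown above. The HourlyKeys object and its between method are hypothetical helpers, not part of the original answer or of the Spark Cassandra Connector.

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter
import java.time.temporal.ChronoUnit

// Hypothetical helper (an assumption for illustration): builds one key per hour
// between two timestamps, inclusive of both ends.
object HourlyKeys {
  private val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

  def between(start: String, end: String): Seq[String] = {
    val from = LocalDateTime.parse(start, fmt)
    val to = LocalDateTime.parse(end, fmt)
    // Number of whole hours between the two timestamps
    val hours = ChronoUnit.HOURS.between(from, to)
    // Step by hour instead of enumerating every Long value in the range
    (0L to hours).map(h => from.plusHours(h).format(fmt))
  }
}
```

The result could then be passed to sc.parallelize in place of the hand-written array, for example sc.parallelize(HourlyKeys.between("2015-10-29 12:00:00", "2015-10-29 15:00:00"), 2), so that the join only looks up one key per hour rather than one key per Long value in the range.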
