Spark Streaming write to HBase with Python blocked on saveAsNewAPIHadoopDataset
Question

I'm using Spark Streaming with Python to read from Kafka and write to HBase. The job gets blocked very easily at the saveAsNewAPIHadoopDataset stage. In one case the duration of this stage reached 8 hours. Does Spark write the data through the HBase API, or does it write directly through the HDFS API?

Answer 1:

A bit late, but here is a similar example of saving an RDD to HBase.

Consider an RDD containing a single line:

{"id":3,"name":"Moony","color":"grey","description":"Monochrome kitty"}
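
One way such a line could be written to HBase is sketched below. This is a minimal sketch, not necessarily the code the answer went on to show. It assumes the output format is org.apache.hadoop.hbase.mapreduce.TableOutputFormat, in which case each record becomes an HBase Put sent through the HBase client rather than a file written through the HDFS API. The table name "cats", the column family "cf", the ZooKeeper quorum "localhost" and the helper names are hypothetical. The two converter classes are the ones shipped in the Spark examples jar of some Spark versions, so that jar (and the HBase client jars) would need to be on the classpath.

```python
import json

# Hypothetical values for illustration; replace with your own cluster settings.
ZK_QUORUM = "localhost"
TABLE = "cats"
COLUMN_FAMILY = "cf"

# Converter classes from the Spark examples jar (assumed to be on the classpath).
KEY_CONVERTER = (
    "org.apache.spark.examples.pythonconverters."
    "StringToImmutableBytesWritableConverter"
)
VALUE_CONVERTER = (
    "org.apache.spark.examples.pythonconverters.StringListToPutConverter"
)


def to_hbase_cells(line, row_key_field="id", column_family=COLUMN_FAMILY):
    """Turn one JSON line into (row_key, [row_key, family, qualifier, value])
    pairs, the shape StringListToPutConverter expects (one pair per cell)."""
    record = json.loads(line)
    row_key = str(record[row_key_field])
    return [
        (row_key, [row_key, column_family, field, str(value)])
        for field, value in record.items()
        if field != row_key_field
    ]


def hbase_output_conf(zk_quorum=ZK_QUORUM, table=TABLE):
    """Hadoop job configuration that routes output to an HBase table."""
    return {
        "hbase.zookeeper.quorum": zk_quorum,
        "hbase.mapred.outputtable": table,
        "mapreduce.outputformat.class":
            "org.apache.hadoop.hbase.mapreduce.TableOutputFormat",
        "mapreduce.job.output.key.class":
            "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
        "mapreduce.job.output.value.class":
            "org.apache.hadoop.io.Writable",
    }


def save_to_hbase(rdd):
    """Write an RDD of JSON strings to HBase, one cell per non-key field."""
    rdd.flatMap(to_hbase_cells).saveAsNewAPIHadoopDataset(
        conf=hbase_output_conf(),
        keyConverter=KEY_CONVERTER,
        valueConverter=VALUE_CONVERTER,
    )
```

In a streaming job, the same function could be applied to every micro-batch with something like stream.foreachRDD(save_to_hbase), where stream is a DStream of JSON strings read from Kafka.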