Spark HBase/BigTable - Wide/sparse dataframe persistence

Submitted by 不羁的心 on 2021-01-28 08:03:36

Question


I want to persist to BigTable a very wide Spark DataFrame (>100,000 columns) that is sparsely populated (>99% of values are null), while keeping only the non-null values (to avoid storage costs).

Is there a way to specify in Spark to ignore nulls when writing?

Thanks!
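No answer is recorded in this snapshot, but a common pattern for this problem (a sketch of the general idea, not a feature of any particular Spark-BigTable connector) is to "melt" each wide row into per-cell `(row_key, column, value)` tuples and drop the nulls before writing, so only populated cells ever reach storage. The function name `to_sparse_cells` and the field names below are hypothetical; in plain Python the core transformation might look like:

```python
def to_sparse_cells(row: dict, row_key_field: str):
    """Convert one wide, sparsely populated record into (row_key, column, value)
    cell tuples, dropping nulls so only real values are persisted."""
    row_key = row[row_key_field]
    return [
        (row_key, col, val)
        for col, val in row.items()
        if col != row_key_field and val is not None
    ]


# Example: a record with 4 fields where half the values are null.
record = {"id": "r1", "a": 1, "b": None, "c": "x"}
cells = to_sparse_cells(record, "id")
# cells == [("r1", "a", 1), ("r1", "c", "x")]
```

In Spark, the same idea would be applied per partition (e.g. via `rdd.mapPartitions` or a flattening `select`/`explode` into long format with a null filter), with each resulting cell emitted as one HBase/BigTable mutation; BigTable's sparse storage model then only charges for the cells actually written.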

Source: https://stackoverflow.com/questions/65647574/spark-hbase-bigtable-wide-sparse-dataframe-persistence
