Saving data back into Cassandra as RDD

不打扰是莪最后的温柔 posted on 2019-12-06 07:14:19

You're using the Spark Cassandra Connector by DataStax, which doesn't support Python at the RDD / DStream level. Only DataFrames are supported. See the docs for more information.
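Since only DataFrames are supported from Python, a write would go through the DataFrame writer rather than through an RDD method. The following is a minimal sketch, assuming the connector JAR is already on the Spark classpath; the helper names `cassandra_options` and `save_to_cassandra` and the keyspace/table values are made up for illustration and are not part of the connector.

```python
# Data source name registered by the DataStax Spark Cassandra Connector.
CASSANDRA_FORMAT = "org.apache.spark.sql.cassandra"


def cassandra_options(keyspace, table):
    """Build the options that tell the data source which table to target."""
    return {"keyspace": keyspace, "table": table}


def save_to_cassandra(df, keyspace, table, mode="append"):
    """Hypothetical helper: write a DataFrame to an existing Cassandra table.

    `df` is expected to be a pyspark.sql.DataFrame whose column names
    match the columns of the target table.
    """
    (df.write
       .format(CASSANDRA_FORMAT)
       .options(**cassandra_options(keyspace, table))
       .mode(mode)
       .save())
```

With a running session this could be called as `save_to_cassandra(spark.createDataFrame(rows), "my_keyspace", "my_table")`, where the keyspace and table are placeholders for ones that already exist.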

I've authored a wrapper around the aforementioned connector: PySpark Cassandra. It is not feature complete with respect to the DataStax connector, but a lot of the functionality is there. Also, if performance is important, it may be worthwhile to investigate the performance hit.

Finally, Spark ships with a Python example of using the CqlInput/OutputFormat from Hadoop MapReduce. Not a very developer-friendly option in my opinion, but it's there.

Looking at your code and reading through your issue description, it doesn't appear that you're using any Cassandra connector. Spark doesn't come with Cassandra support out of the box, so the RDD and DStream data types don't have a saveToCassandra method. You need to import an external Spark-Cassandra connector that extends the RDD and DStream types to support Cassandra integration.

This is why you're getting the error: Python can't find any function saveToCassandra on the DStream type because none currently exists.
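The failure mode can be reproduced without Spark at all. The sketch below uses a stand-in `DStream` class (not the real `pyspark.streaming.DStream`) to show the AttributeError, and then shows one way a Python package could extend a type by attaching a method to it. The attached function is a placeholder that only collects rows; it is an assumption for illustration, not how any particular connector is implemented.

```python
class DStream:
    """Stand-in for pyspark.streaming.DStream; not the real class."""

    def __init__(self, batches):
        self.batches = batches


stream = DStream([[("k", 1)]])

# Without a connector, the method simply does not exist on the type.
try:
    stream.saveToCassandra("ks", "tbl")
except AttributeError as err:
    error_message = str(err)


def save_to_cassandra(self, keyspace, table):
    """Placeholder: a real connector would write the rows to Cassandra."""
    return [(keyspace, table, row) for batch in self.batches for row in batch]


# Attaching the function to the class makes it available on every
# instance, including ones created before the assignment.
DStream.saveToCassandra = save_to_cassandra
written = stream.saveToCassandra("ks", "tbl")
```

After the assignment the same call succeeds, which is the effect importing a connector package is meant to have on the real types.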

You'll need to get the DataStax connector or some other connector to extend the DStream type with saveToCassandra.
