How can you push down predicates to Cassandra or limit requested data when using PySpark / DataFrames?
Question: For example, on docs.datastax.com we mention:

table1 = sqlContext.read.format("org.apache.spark.sql.cassandra").options(table="kv", keyspace="ks").load()

This is the only way I know, but let's say that I want to load only the last one million entries from this table. I don't want to load the whole table into memory every time, especially if the table has, for example, over 10 million entries. Thanks!

Answer 1: While you can't load the data any faster, you can load portions of the data or terminate early.
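For concreteness, here is a minimal sketch of both approaches with the Spark Cassandra Connector. It assumes a hypothetical timestamp clustering column ts on the kv table and a local Cassandra node at 127.0.0.1; neither of these is in the original question. Filters on partition and clustering key columns can be pushed down into the CQL query, and limit() lets Spark stop fetching once it has enough rows:

```python
from pyspark.sql import SparkSession

# Hypothetical setup: adjust the host to point at your cluster.
spark = (SparkSession.builder
         .appName("cassandra-pushdown-sketch")
         .config("spark.cassandra.connection.host", "127.0.0.1")
         .getOrCreate())

df = (spark.read.format("org.apache.spark.sql.cassandra")
      .options(table="kv", keyspace="ks")
      .load())

# Predicate pushdown: a filter on a key column (here the assumed
# clustering column "ts") is translated to CQL and executed by
# Cassandra, so only matching rows are read, not the whole table.
recent = df.filter("ts > '2023-01-01 00:00:00'")

# Early termination: with limit(), Spark can stop fetching once
# 1,000,000 rows are available instead of materializing the table.
first_million = df.limit(1000000)

# Inspect the physical plan; predicates the connector accepted
# appear under "PushedFilters".
recent.explain()
```

Note that only predicates the connector can translate to CQL (typically on partition keys and clustering columns) are pushed down; a filter on an ordinary column is still applied by Spark after reading the data. Getting strictly the "last" million entries also requires a clustering column to order by, since Cassandra tables have no global row order.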