Why does Spark SQL consider the support of indexes unimportant?
Quoting the Spark DataFrames, Datasets and SQL manual: "A handful of Hive optimizations are not yet included in Spark. Some of these (such as indexes) are less important due to Spark SQL’s in-memory computational model. Others are slotted for future releases of Spark SQL."

Being new to Spark, I'm a bit baffled by this for two reasons:

First, Spark SQL is designed to process Big Data, and at least in my use case the data size far exceeds the size of the available memory. Assuming this is not uncommon, what is meant by "Spark SQL’s in-memory computational model"? Is Spark SQL recommended only for cases where the data fits in memory?
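To make the scenario concrete, here is a minimal sketch of the kind of workload I have in mind (the path, column name, and app name are hypothetical, and I'm assuming a standard Spark setup): a dataset much larger than cluster memory, queried with a simple equality filter that an index would normally accelerate:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("index-question-sketch").getOrCreate()
    import spark.implicits._

    // Hypothetical dataset, far larger than the cluster's aggregate memory.
    val events = spark.read.parquet("/data/events")

    // cache() keeps computed partitions in memory where they fit and spills or
    // recomputes the rest -- presumably the "in-memory computational model"
    // the manual refers to.
    events.cache()

    // A simple point lookup. With no index support, Spark SQL evaluates this
    // by scanning every partition rather than seeking directly to matching rows.
    events.filter($"userId" === 42L).show()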