How to cache a DataFrame in Apache Ignite

Submitted by 大兔子大兔子 on 2019-12-07 23:14:09

Question


I am writing code to cache RDBMS data using a Spark SQLContext JDBC connection. Once the DataFrame is created, I want to cache that result set using Apache Ignite, so that other applications can make use of it. Here is the code snippet.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.ignite.spark.IgniteContext

object test {

  def main(args: Array[String]): Unit = {

    val config = "src/main/scala/config.xml"

    val sparkConf = new SparkConf().setAppName("test").setMaster("local[*]")
    val sc = new SparkContext(sparkConf)
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)

    val sql_dump1 = sqlContext.read.format("jdbc")
      .option("url", "jdbc URL")
      .option("driver", "com.mysql.jdbc.Driver")
      .option("dbtable", mysql_table_statement)
      .option("user", "username")
      .option("password", "pass")
      .load()

    val ic = new IgniteContext[Integer, Integer](sc, config)

    val sharedrdd = ic.fromCache("hbase_metadata")

    // How to cache the sql_dump1 DataFrame?

  }
}

Now the question is how to cache a DataFrame. IgniteRDD has a savePairs method, but it accepts the key and value as RDD[Integer], while I have a DataFrame; even if I convert it to an RDD, I only get RDD[Row]. The savePairs method taking an RDD of Integer seems too specific: what if I have an RDD of String as the value? Is caching the DataFrame a good idea, or is there a better approach to caching the result set?


Answer 1


There is no reason to store a DataFrame in an Ignite cache (shared RDD), since you won't benefit from it much: at the very least, you won't be able to execute Ignite SQL over the DataFrame.

I would suggest doing the following:

  • Provide a CacheStore implementation for the hbase_metadata cache that preloads all the data from your underlying database. You can then load everything into the cache using the Ignite.loadCache method. Here you may find an example of how to use JDBC persistent stores along with an Ignite cache (shared RDD).

  • Use the Ignite shared RDD SQL API to query over the cached data.
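The CacheStore approach from the first bullet could be sketched as follows. This is only an outline, not the answer's actual code: the MetadataStore class, the SELECT statement, the column names, and the Integer/String key-value types are all illustrative assumptions; the real contract is CacheStoreAdapter from org.apache.ignite.cache.store, and the store must be registered on the hbase_metadata cache (e.g. via CacheConfiguration.setCacheStoreFactory in config.xml) before loadCache is called.

```scala
import java.sql.DriverManager
import javax.cache.Cache
import org.apache.ignite.Ignition
import org.apache.ignite.cache.store.CacheStoreAdapter
import org.apache.ignite.lang.IgniteBiInClosure

// Hypothetical store that preloads rows from the underlying MySQL table.
class MetadataStore extends CacheStoreAdapter[Integer, String] {

  override def loadCache(clo: IgniteBiInClosure[Integer, String], args: AnyRef*): Unit = {
    // Open a JDBC connection and push every row into the cache.
    val conn = DriverManager.getConnection("jdbc URL", "username", "pass")
    try {
      val rs = conn.createStatement().executeQuery("SELECT id, value FROM my_table")
      while (rs.next())
        clo.apply(rs.getInt("id"), rs.getString("value"))
    } finally conn.close()
  }

  // Read-through / write-through are not needed for a one-off preload.
  override def load(key: Integer): String = null
  override def write(e: Cache.Entry[_ <: Integer, _ <: String]): Unit = ()
  override def delete(key: AnyRef): Unit = ()
}

object Preload {
  def main(args: Array[String]): Unit = {
    // With the store configured on the cache, one call triggers the preload.
    val ignite = Ignition.start("src/main/scala/config.xml")
    ignite.cache[Integer, String]("hbase_metadata").loadCache(null)
  }
}
```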

Alternatively, you can get sql_dump1 as you're doing, iterate over each row, and store each row individually in the shared RDD using the IgniteRDD.savePairs method. Once this is done, you can query the data using the same Ignite shared RDD SQL mentioned above.
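The alternative could look roughly like this, continuing the snippet from the question. It is a sketch under stated assumptions: the source table is assumed to have an integer key in column 0 and a string value in column 1, the IgniteContext is re-declared with String values so that savePairs accepts the pairs, and the trailing SQL query assumes the cache configuration declares String as a queryable value type.

```scala
// Re-declare the context with the value type you actually need.
val ic = new IgniteContext[Integer, String](sc, config)
val sharedRdd = ic.fromCache("hbase_metadata")

// Turn each Row into a (key, value) pair; the column positions are assumptions.
val pairs = sql_dump1.rdd.map(row => (Integer.valueOf(row.getInt(0)), row.getString(1)))
sharedRdd.savePairs(pairs)

// Query the cached data through Ignite shared RDD SQL.
val result = sharedRdd.sql("select _val from String where _key > ?", Integer.valueOf(10))
result.show()
```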



Source: https://stackoverflow.com/questions/37180715/how-to-cache-dataframe-in-apache-ignite
