Spark query running very slow

野性不改 2021-02-20 07:09

I have a cluster on AWS with 2 slaves and 1 master. All instances are of type m1.large. I'm running Spark version 1.4. I'm benchmarking the performance of Spark over 4m data c

2 answers
  • 2021-02-20 07:49

    This is normal; don't expect Spark to run in a few milliseconds like MySQL or PostgreSQL do. Spark is low latency compared to other big data solutions such as Hive or Impala, but you cannot compare it with a classic database: Spark is not a database where data is indexed!

    Watch this video: https://www.youtube.com/watch?v=8E0cVWKiuhk

    They clearly put Spark here:

    Did you try Apache Drill? I found it a bit faster (I use it for small JSON files on HDFS, 2 to 3 GB, where it is much faster than Spark for SQL queries).

  • 2021-02-20 08:02
    1. Set spark.default.parallelism to 2
    2. Start Spark with 8 executor cores (--executor-cores 8 on spark-submit, or --total-executor-cores 8 on a standalone cluster)
    3. Modify this part

    df.registerTempTable('test')
    d = sqlContext.sql("""...

    to

    df.registerTempTable('test')
    sqlContext.cacheTable("test")
    d = sqlContext.sql("""...
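
    To check whether caching actually helps, one option is to time the same query several times after the table is cached. The helper below is only a sketch: cache_and_time is a hypothetical name, and it assumes sql_context and df are a Spark 1.x SQLContext and DataFrame like the ones in the snippet above. The first run may still pay the cost of reading the data and building the cache, so later runs are the ones to compare against the uncached timing.

```python
import time


def cache_and_time(sql_context, df, table_name, query, runs=2):
    """Register df as a temp table, cache it, and time repeated runs of query.

    Hypothetical helper: sql_context and df are assumed to be a Spark 1.x
    SQLContext and DataFrame (registerTempTable / cacheTable / sql).
    """
    df.registerTempTable(table_name)    # expose the DataFrame to SQL
    sql_context.cacheTable(table_name)  # ask Spark to keep the table in memory

    timings = []
    for _ in range(runs):
        start = time.time()
        # count() is an action, so it forces the query to execute
        sql_context.sql(query).count()
        timings.append(time.time() - start)
    return timings
```

    It could be called as cache_and_time(sqlContext, df, 'test', query), where query is the SQL string from the original question. If the later timings are not clearly lower than the first, the table may not fit in executor memory on instances this small.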
