apache-spark-2.2

Spark serializes variable value as null instead of its real value

匆匆过客 submitted on 2020-06-13 05:37:44
Question: My understanding of how Spark distributes code to the nodes that run it is only cursory, and I cannot get my code to run successfully inside Spark's mapPartitions API when I want to instantiate a class once per partition with a constructor argument. The code below worked perfectly, up until I evolved the class MyWorkerClass to require an argument: val result : DataFrame = inputDF.as[Foo].mapPartitions(sparkIterator => { // (1) initialize heavy class instance once per partition val
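
A minimal sketch of the pattern the question is after, assuming hypothetical types Foo, Bar and MyWorkerClass(someArg) standing in for the asker's own. The point is that only a plain serializable value is captured from the driver, and the heavy object is constructed inside mapPartitions on the executor:

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

// Illustrative stand-ins for the asker's types.
case class Foo(id: Long, payload: String)
case class Bar(id: Long, result: String)

// Heavy class that now takes a constructor argument.
class MyWorkerClass(someArg: String) {
  def process(foo: Foo): Bar = Bar(foo.id, s"$someArg:${foo.payload}")
}

object PerPartitionWorker {
  def run(spark: SparkSession, inputDF: DataFrame): Dataset[Bar] = {
    import spark.implicits._

    // Copy the argument into a local val so only this String is serialized,
    // not an enclosing object whose field might otherwise arrive as null.
    val someArg: String = "config-value"

    inputDF.as[Foo].mapPartitions { sparkIterator =>
      // (1) initialize the heavy class instance once per partition, on the executor
      val worker = new MyWorkerClass(someArg)
      // (2) reuse it for every row in the partition
      sparkIterator.map(worker.process)
    }
  }
}
```

If the argument really is a field of an enclosing class, capturing a local copy as above is the usual way to avoid serializing the whole outer instance, which is a common cause of values showing up as null on the executors.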

Spark 2.x - How to generate simple Explain/Execution Plan

自闭症网瘾萝莉.ら submitted on 2020-04-11 12:38:44
Question: I am hoping to generate an explain/execution plan in Spark 2.2 for some actions on a dataframe. The goal is to ensure that partition pruning occurs as expected before I kick off the job and consume cluster resources. I searched the Spark documentation and SO but couldn't find a syntax that worked for my situation. Here is a simple example that works as expected: scala> List(1, 2, 3, 4).toDF.explain == Physical Plan == LocalTableScan [value#42] Here's an example
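
A minimal sketch of how the plan can be checked before any action runs; the table and partition column names here are assumptions. In Spark 2.2, explain() prints the physical plan and explain(extended = true) prints the parsed, analyzed, optimized and physical plans; a PartitionFilters entry (or a reduced partition count) in the scan node is what confirms pruning:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("explain-demo").getOrCreate()

// Hypothetical table partitioned by a `dt` column.
val pruned = spark.table("events")
  .where("dt = '20180405'")          // predicate on the partition column
  .select("user_id", "dt")

pruned.explain()                     // physical plan only, no job is triggered
pruned.explain(extended = true)      // all four plans, still no job triggered
```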

Why does elasticsearch-spark 5.5.0 fail with AbstractMethodError when submitting to YARN cluster?

纵饮孤独 submitted on 2020-01-02 21:50:24
Question: I wrote a Spark job whose main goal is to write into ES, and submitted it. The issue is that when I submit it to the Spark cluster, Spark gives back: [ERROR][org.apache.spark.deploy.yarn.ApplicationMaster] User class threw exception: java.lang.AbstractMethodError: org.elasticsearch.spark.sql.DefaultSource.createRelation(Lorg/apache/spark/sql/SQLContext;Lorg/apache/spark/sql/SaveMode;Lscala/collection/immutable/Map;Lorg/apache/spark/sql/Dataset;)Lorg/apache/spark/sql/sources/BaseRelation; java.lang
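
An AbstractMethodError at DefaultSource.createRelation usually points to a binary mismatch: the elasticsearch-spark artifact on the classpath was built against a different Spark DataSource API than the Spark running on the cluster. A hedged sketch of the write path, assuming the Spark 2.x flavour of the connector (elasticsearch-spark-20, version 5.5.0) and illustrative host and index names:

```scala
// build.sbt (sketch): pick the Spark 2.x connector and mark Spark as "provided"
// so the assembly jar does not drag a second, conflicting Spark onto the cluster.
//   "org.apache.spark"  %% "spark-sql"              % "2.2.0" % "provided",
//   "org.elasticsearch" %% "elasticsearch-spark-20" % "5.5.0"

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("write-to-es").getOrCreate()
import spark.implicits._

val df = Seq(("1", "hello"), ("2", "world")).toDF("id", "text")   // sample data

df.write
  .format("org.elasticsearch.spark.sql")
  .option("es.nodes", "es-host:9200")     // assumed Elasticsearch host
  .save("myindex/doc")                    // assumed index/type
```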

Scala String Interpolation with Underscore

夙愿已清 submitted on 2019-12-11 04:25:43
Question: I am new to Scala, so feel free to point me to documentation, but I was not able to find an answer to this question in my research. I am using Scala 2.11.8 with Spark 2.2 and am trying to create a dynamic string containing dateString1_dateString2 (with underscores) using interpolation, but I am having some issues. val startDt = "20180405" val endDt = "20180505" This seems to work: s"$startDt$endDt" res62: String = 2018040520180505 But this fails: s"$startDt_$endDt" <console>:27: error:
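
The usual fix is to delimit the interpolated name with braces: an underscore is a legal identifier character, so $startDt_ is parsed as a reference to a (nonexistent) variable named startDt_:

```scala
val startDt = "20180405"
val endDt   = "20180505"

// Braces make the identifier boundary explicit.
val range = s"${startDt}_${endDt}"   // "20180405_20180505"
```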