scala

Scala Future/Promise fast-fail pipeline

风流意气都作罢 submitted on 2021-02-07 13:58:16
Question: I want to launch two or more Futures/Promises in parallel and fail as soon as any one of them fails, without waiting for the rest to complete. What is the most idiomatic way to compose this pipeline in Scala? EDIT: more contextual information. I have to launch two external processes, one writing to a FIFO file and another reading from it. If the writer process fails, the reader thread might hang forever waiting for input from the file. So I would want to launch both
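A minimal sketch of one idiomatic answer, using only the standard scala.concurrent API (the helper name failFast is mine, not from the question): complete a Promise with the first failure of any input Future, and let Future.sequence supply the all-succeeded case.

import scala.concurrent.{ExecutionContext, Future, Promise}

// Fails as soon as any input Future fails; succeeds with all results once every input succeeds.
// tryFailure is a no-op if the promise is already completed, so later failures are ignored.
def failFast[T](futures: Seq[Future[T]])(implicit ec: ExecutionContext): Future[Seq[T]] = {
  val promise = Promise[Seq[T]]()
  futures.foreach(_.failed.foreach(promise.tryFailure))
  promise.completeWith(Future.sequence(futures))
  promise.future
}

For the writer/reader scenario above, failFast(Seq(writerFuture, readerFuture)) (hypothetical names) would surface the writer's failure immediately, so the caller can kill the hung reader process instead of waiting on it.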

When should I use Scala's Array instead of one of the other collections?

做~自己de王妃 submitted on 2021-02-07 12:28:44
Question: This is more a question of style and preference, but here goes: when should I use scala.Array? I use List all the time and occasionally run into Seq, Map and the like, but I've never used nor seen Array in the wild. Is it just there for Java compatibility? Am I missing a common use case? Answer 1: First of all, let's make a disclaimer here. Scala 2.7's Array tries to be a Java array and a Scala collection at the same time. It mostly succeeds, but fails at both in some corner cases. Unfortunately,
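For a concrete feel of the trade-off, here is a small illustration (mine, not from the answer): Array gives you a fixed-size, mutable, primitive-backed buffer with constant-time indexed update, which the general-purpose immutable collections deliberately do not.

// Backed by a JVM int[], so no boxing; mutable in place; size fixed at creation.
val buckets = new Array[Int](10)   // ten zeros
buckets(3) += 1                    // O(1) in-place update

// The usual Scala default is an immutable collection; convert once mutation is done.
val asList: List[Int] = buckets.toList

Outside hot loops, interop with Java APIs that expect arrays, and varargs, List/Seq/Vector remain the more idiomatic choice.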

Spark + Hive : Number of partitions scanned exceeds limit (=4000)

有些话、适合烂在心里 submitted on 2021-02-07 11:03:50
Question: We upgraded our Hadoop platform (Spark: 2.3.0, Hive: 3.1), and I'm facing this exception when reading some Hive tables in Spark: "Number of partitions scanned on table 'my_table' exceeds limit (=4000)". Tables we are working on: table1: external table with a total of ~12300 partitions, partitioned by (col1: String, date1: String) (ORC, ZLIB compressed); table2: external table with a total of 4585 partitions, partitioned by (col21: String, date2: Date, col22: String) (ORC, uncompressed) [A]
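Two remedies are commonly suggested for this error; both are sketched below on the assumption that the limit comes from the Hive metastore property hive.metastore.limit.partition.request (verify the exact property name against your Hive 3.1 install).

// 1) Query-side: filter on the partition columns so Spark can push the predicate to the
//    metastore and only request the matching partitions (date1 is table1's partition column).
import spark.implicits._
val df = spark.table("table1")
  .where($"date1" >= "2021-01-01" && $"date1" < "2021-02-01")

// 2) Metastore-side: raise or disable the limit in hive-site.xml, e.g.
//    hive.metastore.limit.partition.request = -1   (or any value above ~12300)

The first option is usually preferable: scanning all ~12300 partitions defeats the point of partitioning in the first place.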

How to convert an Iterable to an RDD

戏子无情 submitted on 2021-02-07 10:45:26
Question: To be more specific, how can I convert a scala.Iterable to an org.apache.spark.rdd.RDD? I have an RDD of (String, Iterable[(String, Integer)]) and I want this to be converted into an RDD of (String, RDD[String, Integer]), so that I can apply a reduceByKey function to the internal RDD. E.g. I have an RDD where the key is the 2-letter prefix of a person's name and the value is a List of pairs of person name and hours they spent in an event. My RDD is: ("To", List(("Tom",50),("Tod","30"),("Tom",70
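Nested RDDs (an RDD whose values are themselves RDDs) are not supported by Spark, so the usual answer is to flatten to a composite key and reduce there; a sketch under that assumption (the name sumHours and the use of Int for hours are illustrative):

import org.apache.spark.rdd.RDD

// Flatten RDD[(prefix, Iterable[(name, hours)])] into RDD[((prefix, name), hours)],
// reduce on the flat pair RDD (the part that scales), then group back by prefix.
def sumHours(rdd: RDD[(String, Iterable[(String, Int)])]): RDD[(String, Iterable[(String, Int)])] =
  rdd
    .flatMap { case (prefix, people) => people.map { case (name, hours) => ((prefix, name), hours) } }
    .reduceByKey(_ + _)
    .map { case ((prefix, name), total) => (prefix, (name, total)) }
    .groupByKey()

On the example data, ("To", List(("Tom",50),("Tod",30),("Tom",70))) would come back as ("To", Iterable(("Tom",120),("Tod",30))).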

Escaping quotes is not working in Spark 2.2.0 while reading CSV

断了今生、忘了曾经 submitted on 2021-02-07 10:34:18
Question: I am trying to read my delimited file, which is tab-separated, but I am not able to read all the records. Here are my input records: head1 head2 head3 a b c a2 a3 a4 a1 "b1 "c1 My code: var inputDf = sparkSession.read .option("delimiter","\t") .option("header", "true") // .option("inferSchema", "true") .option("nullValue", "") .option("escape","\"") .option("multiLine", true) .option("nullValue", null) .option("nullValue", "NULL") .schema(finalSchema) .csv("file:///C:/Users/prhasija/Desktop
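Two things stand out in that snippet: the three nullValue options overwrite one another (only the last takes effect), and bare double quotes inside fields are usually easier to handle by disabling quote processing than by escaping. A hedged sketch of the read, assuming Spark 2.2's CSV reader (the input path is a placeholder, since the original is cut off):

val inputDf = sparkSession.read
  .option("delimiter", "\t")
  .option("header", "true")
  .option("nullValue", "NULL")       // keep a single null marker
  .option("quote", "\u0000")         // treat '"' as an ordinary character, not a quote char
  .schema(finalSchema)
  .csv("file:///path/to/input.tsv")  // placeholder; the real path is truncated in the question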

Drawing an SVG with d3.js while the tab is in the background

流过昼夜 submitted on 2021-02-07 10:27:00
Question: Context: I am working on a webapp that has to display some fairly complicated and constantly updating (multiple times per second) SVG images. The updates stem from a separate server, and the SVG is updated as soon as an update is received by the web frontend. The webapp is written in Scala.js and the image is created using the d3js library (see also: scala-js-d3). We currently only support Google Chrome. Problem: Once the webapp has been in a background tab for a while, the whole site gets
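For context on the sketch below: Chrome throttles setTimeout/setInterval and suspends requestAnimationFrame for background tabs, so one common mitigation is to keep buffering incoming updates but only redraw while the page is visible, with a single catch-up render when it becomes visible again. This is a hedged Scala.js sketch using the Page Visibility API as exposed by scala-js-dom; redraw() stands in for the app's d3 rendering call.

import org.scalajs.dom

object BackgroundSafeRendering {
  private var pendingUpdate = false

  private def redraw(): Unit = ()   // placeholder for the actual d3/SVG rendering code

  // Called whenever an update arrives from the server.
  def onServerUpdate(): Unit = {
    pendingUpdate = true
    if (!dom.document.hidden)       // only schedule animation frames while the tab is visible
      dom.window.requestAnimationFrame { (_: Double) =>
        if (pendingUpdate) { pendingUpdate = false; redraw() }
      }
  }

  // One catch-up render when the tab returns to the foreground.
  dom.document.addEventListener("visibilitychange", { (_: dom.Event) =>
    if (!dom.document.hidden && pendingUpdate) { pendingUpdate = false; redraw() }
  })
}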

Spark: Accumulators do not work properly when I use them in Range

Deadly submitted on 2021-02-07 10:10:31
Question: I don't understand why my accumulator hasn't been updated properly by Spark. object AccumulatorsExample extends App { val acc = sc.accumulator(0L, "acc") sc range(0, 20000, step = 25) map { _ => acc += 1 } count() assert(acc.value == 800) // not equals } My Spark config: setMaster("local[*]") // should use 8 CPU cores I'm not sure if Spark distributes the accumulator computation across every core and maybe that's the problem. My question is how can I aggregate all acc values in one single sum and
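The usual explanation (hedged, since only an excerpt is shown): the object extends scala.App, and Spark's documentation warns that applications should define a main() method rather than extend App, because App's delayed initialization can leave fields such as acc behaving oddly in closures shipped to executors. A sketch of the rewrite, also switching to the Spark 2.x longAccumulator and updating it inside an action (foreach) so recomputed stages cannot double-count:

import org.apache.spark.{SparkConf, SparkContext}

object AccumulatorsExample {
  def main(args: Array[String]): Unit = {
    val sc  = new SparkContext(new SparkConf().setMaster("local[*]").setAppName("acc-example"))
    val acc = sc.longAccumulator("acc")

    // range(0, 20000, step = 25) has 20000 / 25 = 800 elements; foreach is an action,
    // so each element updates the accumulator exactly once.
    sc.range(0, 20000, step = 25).foreach(_ => acc.add(1L))

    assert(acc.value == 800)
    sc.stop()
  }
}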

Type-safety with ADT and Aux pattern

北城余情 submitted on 2021-02-07 09:51:53
Question: I'm designing type-safe code with an ADT and the Aux pattern and cannot get rid of some asInstanceOf. Here is the example: sealed trait Source case object FileSystem extends Source case object Network extends Source sealed trait Data { type S <: Source } object Data { type Aux[T <: Source] = Data { type S = T } } case class RegularFile(path: String) extends Data { type S = FileSystem.type } case class Directory(path: String) extends Data { type S = FileSystem.type } case class UnixDevice(path:
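Since the example above is cut off, here is a hedged sketch of how the Aux alias is typically put to work: APIs accept Data.Aux[S] so the compiler tracks the Source statically, which is what removes the need for asInstanceOf at call sites (the Remote case class and the mount method are illustrative additions, not the asker's code).

sealed trait Source
case object FileSystem extends Source
case object Network extends Source

sealed trait Data { type S <: Source }
object Data { type Aux[T <: Source] = Data { type S = T } }

case class RegularFile(path: String) extends Data { type S = FileSystem.type }
case class Remote(url: String)       extends Data { type S = Network.type }   // illustrative extra case

object AuxDemo {
  // Accepts only Data whose Source is the file system; checked at compile time, no casts.
  def mount(data: Data.Aux[FileSystem.type]): Unit = println(s"mounting $data")

  def main(args: Array[String]): Unit = {
    mount(RegularFile("/etc/hosts"))    // compiles
    // mount(Remote("http://example"))  // does not compile: its S is Network.type
  }
}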