scala

Scala Future/Promise fast-fail pipeline

风流意气都作罢 submitted on 2021-02-07 13:58:16
Question: I want to launch two or more Futures/Promises in parallel and fail as soon as any one of them fails, without waiting for the rest to complete. What is the most idiomatic way to compose this pipeline in Scala? EDIT: more contextual information. I have to launch two external processes, one writing to a FIFO file and another reading from it. If the writer process fails, the reader thread might hang forever waiting for input from the file. So I would want to launch both
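A minimal sketch of one idiomatic answer, using only the standard scala.concurrent API (the helper name failFast is mine, not from the question): complete a Promise with the first failure of any input Future, and let Future.sequence supply the all-succeeded case.

import scala.concurrent.{ExecutionContext, Future, Promise}

// Fails as soon as any input Future fails; succeeds with all results once every input succeeds.
// tryFailure is a no-op if the promise is already completed, so later failures are ignored.
def failFast[T](futures: Seq[Future[T]])(implicit ec: ExecutionContext): Future[Seq[T]] = {
  val promise = Promise[Seq[T]]()
  futures.foreach(_.failed.foreach(promise.tryFailure))
  promise.completeWith(Future.sequence(futures))
  promise.future
}

For the writer/reader scenario above, failFast(Seq(writerFuture, readerFuture)) (hypothetical names) would surface the writer's failure immediately, so the caller can kill the hung reader process instead of waiting on it.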

When should I use Scala's Array instead of one of the other collections?

做~自己de王妃 submitted on 2021-02-07 12:28:44
Question: This is more a question of style and preference, but here goes: when should I use scala.Array? I use List all the time and occasionally run into Seq, Map and the like, but I've never used nor seen Array in the wild. Is it just there for Java compatibility? Am I missing a common use case? Answer 1: First of all, let's make a disclaimer here. Scala 2.7's Array tries to be a Java array and a Scala collection at the same time. It mostly succeeds, but fails at both in some corner cases. Unfortunately,
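For a concrete feel of the trade-off, here is a small illustration (mine, not from the answer): Array gives you a fixed-size, mutable, primitive-backed buffer with constant-time indexed update, which the general-purpose immutable collections deliberately do not.

// Backed by a JVM int[], so no boxing; mutable in place; size fixed at creation.
val buckets = new Array[Int](10)   // ten zeros
buckets(3) += 1                    // O(1) in-place update

// The usual Scala default is an immutable collection; convert once mutation is done.
val asList: List[Int] = buckets.toList

Outside hot loops, interop with Java APIs that expect arrays, and varargs, List/Seq/Vector remain the more idiomatic choice.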

Spark + Hive : Number of partitions scanned exceeds limit (=4000)

有些话、适合烂在心里 submitted on 2021-02-07 11:03:50
Question: We upgraded our Hadoop platform (Spark: 2.3.0, Hive: 3.1), and I'm facing this exception when reading some Hive tables in Spark: "Number of partitions scanned on table 'my_table' exceeds limit (=4000)". Tables we are working on: table1: external table with a total of ~12300 partitions, partitioned by (col1: String, date1: String) (ORC, ZLIB compressed); table2: external table with a total of 4585 partitions, partitioned by (col21: String, date2: Date, col22: String) (ORC, uncompressed) [A]
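Two remedies are commonly suggested for this error; both are sketched below on the assumption that the limit comes from the Hive metastore property hive.metastore.limit.partition.request (verify the exact property name against your Hive 3.1 install).

// 1) Query-side: filter on the partition columns so Spark can push the predicate to the
//    metastore and only request the matching partitions (date1 is table1's partition column).
import spark.implicits._
val df = spark.table("table1")
  .where($"date1" >= "2021-01-01" && $"date1" < "2021-02-01")

// 2) Metastore-side: raise or disable the limit in hive-site.xml, e.g.
//    hive.metastore.limit.partition.request = -1   (or any value above ~12300)

The first option is usually preferable: scanning all ~12300 partitions defeats the point of partitioning in the first place.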

How to convert an Iterable to an RDD

戏子无情 submitted on 2021-02-07 10:45:26
Question: To be more specific, how can I convert a scala.Iterable to an org.apache.spark.rdd.RDD? I have an RDD of (String, Iterable[(String, Integer)]) and I want this to be converted into an RDD of (String, RDD[String, Integer]), so that I can apply a reduceByKey function to the internal RDD. E.g. I have an RDD where the key is the 2-letter prefix of a person's name and the value is a List of pairs of person name and hours they spent in an event. My RDD is: ("To", List(("Tom",50),("Tod","30"),("Tom",70
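Nested RDDs (an RDD whose values are themselves RDDs) are not supported by Spark, so the usual answer is to flatten to a composite key and reduce there; a sketch under that assumption (the name sumHours and the use of Int for hours are illustrative):

import org.apache.spark.rdd.RDD

// Flatten RDD[(prefix, Iterable[(name, hours)])] into RDD[((prefix, name), hours)],
// reduce on the flat pair RDD (the part that scales), then group back by prefix.
def sumHours(rdd: RDD[(String, Iterable[(String, Int)])]): RDD[(String, Iterable[(String, Int)])] =
  rdd
    .flatMap { case (prefix, people) => people.map { case (name, hours) => ((prefix, name), hours) } }
    .reduceByKey(_ + _)
    .map { case ((prefix, name), total) => (prefix, (name, total)) }
    .groupByKey()

On the example data, ("To", List(("Tom",50),("Tod",30),("Tom",70))) would come back as ("To", Iterable(("Tom",120),("Tod",30))).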

Escaping quotes is not working in Spark 2.2.0 while reading CSV

断了今生、忘了曾经 submitted on 2021-02-07 10:34:18
Question: I am trying to read my delimited file, which is tab-separated, but I am not able to read all the records. Here are my input records: head1 head2 head3 a b c a2 a3 a4 a1 "b1 "c1 My code: var inputDf = sparkSession.read .option("delimiter","\t") .option("header", "true") // .option("inferSchema", "true") .option("nullValue", "") .option("escape","\"") .option("multiLine", true) .option("nullValue", null) .option("nullValue", "NULL") .schema(finalSchema) .csv("file:///C:/Users/prhasija/Desktop
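Two things stand out in that snippet: the three nullValue options overwrite one another (only the last takes effect), and bare double quotes inside fields are usually easier to handle by disabling quote processing than by escaping. A hedged sketch of the read, assuming Spark 2.2's CSV reader (the input path is a placeholder, since the original is cut off):

val inputDf = sparkSession.read
  .option("delimiter", "\t")
  .option("header", "true")
  .option("nullValue", "NULL")       // keep a single null marker
  .option("quote", "\u0000")         // treat '"' as an ordinary character, not a quote char
  .schema(finalSchema)
  .csv("file:///path/to/input.tsv")  // placeholder; the real path is truncated in the question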

Drawing an SVG with d3.js while the tab is in the background

流过昼夜 submitted on 2021-02-07 10:27:00
Question: Context: I am working on a webapp that has to display some fairly complicated and constantly updating (multiple times per second) SVG images. The updates stem from a separate server, and the SVG is updated as soon as an update is received by the web frontend. The webapp is written in Scala.js and the image is created using the d3js library (see also: scala-js-d3). We currently only support Google Chrome. Problem: Once the webapp has been in a background tab for a while, the whole site gets
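For context on the sketch below: Chrome throttles setTimeout/setInterval and suspends requestAnimationFrame for background tabs, so one common mitigation is to keep buffering incoming updates but only redraw while the page is visible, with a single catch-up render when it becomes visible again. This is a hedged Scala.js sketch using the Page Visibility API as exposed by scala-js-dom; redraw() stands in for the app's d3 rendering call.

import org.scalajs.dom

object BackgroundSafeRendering {
  private var pendingUpdate = false

  private def redraw(): Unit = ()   // placeholder for the actual d3/SVG rendering code

  // Called whenever an update arrives from the server.
  def onServerUpdate(): Unit = {
    pendingUpdate = true
    if (!dom.document.hidden)       // only schedule animation frames while the tab is visible
      dom.window.requestAnimationFrame { (_: Double) =>
        if (pendingUpdate) { pendingUpdate = false; redraw() }
      }
  }

  // One catch-up render when the tab returns to the foreground.
  dom.document.addEventListener("visibilitychange", { (_: dom.Event) =>
    if (!dom.document.hidden && pendingUpdate) { pendingUpdate = false; redraw() }
  })
}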

Spark: Accumulators do not work properly when I use them in Range

Deadly submitted on 2021-02-07 10:10:31
Question: I don't understand why my accumulator hasn't been updated properly by Spark. object AccumulatorsExample extends App { val acc = sc.accumulator(0L, "acc") sc range(0, 20000, step = 25) map { _ => acc += 1 } count() assert(acc.value == 800) // not equals } My Spark config: setMaster("local[*]") // should use 8 CPU cores I'm not sure if Spark distributes the accumulator computation across every core and maybe that's the problem. My question is how can I aggregate all acc values in one single sum and
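The usual explanation (hedged, since only an excerpt is shown): the object extends scala.App, and Spark's documentation warns that applications should define a main() method rather than extend App, because App's delayed initialization can leave fields such as acc behaving oddly in closures shipped to executors. A sketch of the rewrite, also switching to the Spark 2.x longAccumulator and updating it inside an action (foreach) so recomputed stages cannot double-count:

import org.apache.spark.{SparkConf, SparkContext}

object AccumulatorsExample {
  def main(args: Array[String]): Unit = {
    val sc  = new SparkContext(new SparkConf().setMaster("local[*]").setAppName("acc-example"))
    val acc = sc.longAccumulator("acc")

    // range(0, 20000, step = 25) has 20000 / 25 = 800 elements; foreach is an action,
    // so each element updates the accumulator exactly once.
    sc.range(0, 20000, step = 25).foreach(_ => acc.add(1L))

    assert(acc.value == 800)
    sc.stop()
  }
}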

Type-safety with ADT and Aux pattern

北城余情 submitted on 2021-02-07 09:51:53
Question: I'm designing type-safe code with an ADT and the Aux pattern and cannot get rid of some asInstanceOf. Here is the example: sealed trait Source case object FileSystem extends Source case object Network extends Source sealed trait Data { type S <: Source } object Data { type Aux[T <: Source] = Data { type S = T } } case class RegularFile(path: String) extends Data { type S = FileSystem.type } case class Directory(path: String) extends Data { type S = FileSystem.type } case class UnixDevice(path:
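Since the example above is cut off, here is a hedged sketch of how the Aux alias is typically put to work: APIs accept Data.Aux[S] so the compiler tracks the Source statically, which is what removes the need for asInstanceOf at call sites (the Remote case class and the mount method are illustrative additions, not the asker's code).

sealed trait Source
case object FileSystem extends Source
case object Network extends Source

sealed trait Data { type S <: Source }
object Data { type Aux[T <: Source] = Data { type S = T } }

case class RegularFile(path: String) extends Data { type S = FileSystem.type }
case class Remote(url: String)       extends Data { type S = Network.type }   // illustrative extra case

object AuxDemo {
  // Accepts only Data whose Source is the file system; checked at compile time, no casts.
  def mount(data: Data.Aux[FileSystem.type]): Unit = println(s"mounting $data")

  def main(args: Array[String]): Unit = {
    mount(RegularFile("/etc/hosts"))    // compiles
    // mount(Remote("http://example"))  // does not compile: its S is Network.type
  }
}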