scala

Logger is not working inside spark UDF on cluster

Submitted by 陌路散爱 on 2021-02-10 15:54:51
Question: I have placed log.info statements inside my UDF, but the job fails on the cluster; locally it works fine. Here is the snippet:

    def relType = udf((colValue: String, relTypeV: String) => {
      var relValue = "NA"
      val relType = relTypeV.split(",").toList
      val relTypeMap = relType.map { col =>
        val split = col.split(":")
        (split(0), split(1))
      }.toMap
      // val keySet = relTypeMap
      relTypeMap.foreach { x =>
        if ((x._1 != null || colValue != null || x._1.trim() != "" || colValue.trim() != "") && colValue
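
A frequent cause of this failure is that the logger is instantiated on the driver and captured by the UDF closure: most logger implementations are not serializable, so the task fails when Spark ships the closure to the executors (and even when it works, the output lands in the executor logs rather than on the driver console). A minimal sketch of one workaround, assuming Log4j 1.x on the classpath; the holder object name is illustrative:

    import org.apache.log4j.Logger
    import org.apache.spark.sql.functions.udf

    // a @transient lazy val in an object is re-initialized lazily in each
    // executor JVM instead of being serialized from the driver
    object LogHolder extends Serializable {
      @transient lazy val log: Logger = Logger.getLogger(getClass.getName)
    }

    val relType = udf { (colValue: String, relTypeV: String) =>
      LogHolder.log.info(s"relType called with colValue=$colValue")
      // ... the mapping logic from the question ...
      "NA"
    }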

HList foldLeft with tuple as zero

Submitted by 僤鯓⒐⒋嵵緔 on 2021-02-10 15:46:48
Question: I'm trying to foldLeft on an HList with an accumulator of type (HL, Int), where HL is an HList. The program below does not compile. However, if I switch to a simpler accumulator of type HL (by just swapping the commented lines with the ones above them), it compiles and works. Wrapping the HList in a tuple breaks the implicit resolution for the LeftFolder. What am I missing?

    package foo.bar

    import shapeless.{:+:, ::, CNil, Coproduct, Generic, HList, HNil, Lazy, Poly2}
    import shapeless.ops.hlist.
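
For comparison, the tuple-accumulator shape does resolve when the Poly2 case is written directly over the pair and the zero is ascribed to HNil rather than left as HNil.type. A minimal sketch, assuming shapeless 2.3.x; the poly name is illustrative:

    import shapeless._

    // the accumulator is a pair: the HList built so far plus a counter
    object collect extends Poly2 {
      implicit def default[Acc <: HList, T] =
        at[(Acc, Int), T] { case ((acc, n), t) => (t :: acc, n + 1) }
    }

    // the (HNil: HNil) ascription matters: a zero typed as HNil.type can
    // derail implicit resolution of the LeftFolder
    val (reversed, count) =
      (1 :: "a" :: true :: HNil).foldLeft((HNil: HNil, 0))(collect)
    // reversed: Boolean :: String :: Int :: HNil, count: 3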

Mutual Authentication in Scala with Akka

Submitted by 只谈情不闲聊 on 2021-02-10 15:14:12
Question: I would like to create a TLS session in Scala using Akka, with mutual authentication between a client and a server. I have created two CA certificates, each of which has to trust the certificate presented by the other party. Could you give me an example of how to implement this? Thank you.

Answer 1: I created a GitHub project which demonstrates mutual authentication with different kinds of clients, including Akka. Please have a look here: https://github.com/Hakky54/mutual-tls-ssl It contains a full example
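
To sketch the shape of the Akka side (a sketch under assumptions, not the linked project's code: Akka HTTP 10.2.x, PKCS12 stores, illustrative file names and password):

    import java.io.FileInputStream
    import java.security.KeyStore
    import javax.net.ssl.{KeyManagerFactory, SSLContext, TrustManagerFactory}
    import akka.http.scaladsl.ConnectionContext

    // build an SSLContext from a keystore (own certificate + key) and a
    // truststore (the CA that signed the other party's certificate)
    def buildSslContext(keyStore: String, trustStore: String, password: Array[Char]): SSLContext = {
      val ks = KeyStore.getInstance("PKCS12")
      ks.load(new FileInputStream(keyStore), password)
      val kmf = KeyManagerFactory.getInstance(KeyManagerFactory.getDefaultAlgorithm)
      kmf.init(ks, password)

      val ts = KeyStore.getInstance("PKCS12")
      ts.load(new FileInputStream(trustStore), password)
      val tmf = TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm)
      tmf.init(ts)

      val ctx = SSLContext.getInstance("TLS")
      ctx.init(kmf.getKeyManagers, tmf.getTrustManagers, null)
      ctx
    }

    // server side: setNeedClientAuth(true) is what makes the handshake mutual
    val serverContext = ConnectionContext.httpsServer { () =>
      val engine = buildSslContext("server.p12", "server-truststore.p12", "changeit".toCharArray)
        .createSSLEngine()
      engine.setUseClientMode(false)
      engine.setNeedClientAuth(true)
      engine
    }

The client side is symmetric: build its context from the client keystore and a truststore containing the server's CA, and pass it to Http().singleRequest via its connectionContext parameter.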

Spark: subtract dataframes but preserve duplicate values

Submitted by 南笙酒味 on 2021-02-10 14:51:08
Question: Suppose I have two Spark SQL DataFrames A and B. I want to subtract the items in B from the items in A while preserving duplicates from A. I followed the instructions to use DataFrame.except() that I found in another StackOverflow question ("Spark: subtract two DataFrames"), but that function removes all duplicates from the original DataFrame A. As a conceptual example, if I have two dataframes:

    words     = [the, quick, fox, a, brown, fox]
    stopWords = [the, a]

then I want the output to be, in
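
On Spark 2.4 and later, exceptAll is the duplicate-preserving (multiset) variant of except; on older versions a left_anti join gives the same result for this use case. A short sketch, assuming both DataFrames have the single column value, as produced by toDF on a Seq[String]:

    // Spark 2.4+: multiset difference, keeps A's duplicates
    val result = words.exceptAll(stopWords)

    // pre-2.4: keep every row of A whose value never occurs in B
    val result2 = words.join(stopWords, Seq("value"), "left_anti")

For the example above, both produce [quick, fox, brown, fox].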

In Scala - How to get the day of the week?

Submitted by 谁说我不能喝 on 2021-02-10 14:43:55
Question: Suppose my date format is 21/05/2017; then the output should be SUN. How can I get the day of the week for a given date?

Answer 1:

    import java.time.LocalDate
    import java.time.format.DateTimeFormatter

    val df = DateTimeFormatter.ofPattern("dd/MM/yyyy")
    val dayOfWeek = LocalDate.parse("21/05/2017", df).getDayOfWeek

Answer 2: You can use SimpleDateFormat as illustrated below:

    import java.util.Calendar
    import java.text.SimpleDateFormat

    val now = Calendar.getInstance.getTime
    val date = new SimpleDateFormat("yyyy-MM-dd")
    date
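
To get the three-letter form asked for (SUN), the DayOfWeek from the first answer can be rendered via its display name; a small sketch:

    import java.time.LocalDate
    import java.time.format.{DateTimeFormatter, TextStyle}
    import java.util.Locale

    val fmt = DateTimeFormatter.ofPattern("dd/MM/yyyy")
    val day = LocalDate.parse("21/05/2017", fmt).getDayOfWeek
    // getDisplayName yields "Sun"; upper-case it to get "SUN"
    val short = day.getDisplayName(TextStyle.SHORT, Locale.ENGLISH).toUpperCase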

How can I configure spark so that it creates “_$folder$” entries in S3?

Submitted by 青春壹個敷衍的年華 on 2021-02-10 14:39:47
Question: When I write my DataFrame to S3 using

    df.write
      .format("parquet")
      .mode("overwrite")
      .partitionBy("year", "month", "day", "hour", "gen", "client")
      .option("compression", "gzip")
      .save("s3://xxxx/yyyy")

I get the following in S3:

    year=2018
    year=2019

but I would like to have this instead:

    year=2018
    year=2018_$folder$
    year=2019
    year=2019_$folder$

The scripts that read from that S3 location depend on the *_$folder$ entries, but I haven't found a way to configure Spark/Hadoop to generate
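
For context, the _$folder$ markers are written by the legacy s3n connector (NativeS3FileSystem) and by EMRFS, not by s3a, so on an s3a setup there may be no switch for this at all. One possible workaround is to create the marker objects explicitly after the save; a sketch assuming the AWS SDK for Java v1 is on the classpath, with the bucket/prefix placeholders from the question:

    import com.amazonaws.services.s3.AmazonS3ClientBuilder

    // the markers are plain zero-byte objects named "<partition>_$folder$",
    // so ordinary PutObject calls after the Spark job can recreate them
    val s3 = AmazonS3ClientBuilder.defaultClient()
    Seq("year=2018", "year=2019").foreach { part =>
      s3.putObject("xxxx", s"yyyy/${part}_$$folder$$", "")
    }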

Scala cannot infer

Submitted by ☆樱花仙子☆ on 2021-02-10 14:33:30
Question: I have a very simple snippet of Spark code which worked on Scala 2.11 and stopped compiling after moving to 2.12:

    import spark.implicits._

    val ds = Seq("val").toDF("col1")
    ds.foreachPartition(part => {
      part.foreach(println)
    })

It fails with the error:

    Error:(22, 12) value foreach is not a member of Object
        part.foreach(println)

The workaround is to help the compiler with code like this:

    import spark.implicits._

    val ds = Seq("val").toDF("col1")
    println(ds.getClass)
    ds.foreachPartition((part: Iterator[Row])
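
For context: under 2.12 the lambda is SAM-convertible to both overloads of foreachPartition (Scala's Iterator[Row] => Unit and Java's ForeachPartitionFunction[Row]), so the compiler cannot pick one and the parameter ends up typed as Object. Annotating the parameter selects the Scala overload; a completed sketch, assuming spark is the usual SparkSession:

    import org.apache.spark.sql.Row
    import spark.implicits._

    val ds = Seq("val").toDF("col1")
    // the explicit Iterator[Row] annotation resolves the overload ambiguity
    ds.foreachPartition((part: Iterator[Row]) => part.foreach(println))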

Thread Safety in Scala reflection with type matching

Submitted by 无人久伴 on 2021-02-10 14:16:32
Question: Working with Scala 2.11.12 and JDK 1.8.0_131, I have been able to replicate a thread-safety bug observed in Apache Spark with the following code, in which I repeatedly check from multiple threads whether Option[Int] can be matched via <:< to Option[_]:

    package stuff

    import java.util.concurrent.{Executors, Future}
    import scala.collection.mutable.ListBuffer

    object Main {
      val universe: scala.reflect.runtime.universe.type = scala.reflect.runtime.universe
      import universe._
      def mirror: universe.Mirror
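
A condensed sketch of the same race (executor setup simplified to Futures; thread and iteration counts are illustrative): many concurrent <:< checks that should all trivially be true, run without external synchronization:

    import scala.concurrent.{Await, Future}
    import scala.concurrent.ExecutionContext.Implicits.global
    import scala.concurrent.duration._
    import scala.reflect.runtime.universe._

    // every one of these subtype checks should be true
    val checks = Future.traverse((1 to 1000).toList) { _ =>
      Future(typeOf[Option[Int]] <:< typeOf[Option[_]])
    }
    val spurious = Await.result(checks, 1.minute).count(_ == false)
    // on affected 2.11/2.12 versions this is intermittently non-zero; a
    // coarse workaround is to serialize checks with universe.synchronized
    println(s"spurious false results: $spurious")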

Inherit from a class parametrized by an inner type

Submitted by 末鹿安然 on 2021-02-10 14:14:35
Question: I would like to have a class B that inherits from a generic class A parametrized by an inner type of B. Specifically, I would like this (minimized example):

    class A[T]

    class B extends A[T] {
      class T
    }

Written like this, the compiler does not accept it. Is there any way to specify this inheritance relationship (using some different syntax, or some trick)? If not, what would be an official reference documenting that this is not possible? Notes: Yes, I want T to be an inner class. I
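
One common workaround (a sketch, not an official reference): replace the type parameter with an abstract type member, which a subclass may then bind to its own inner class:

    // A exposes the type as a member instead of a parameter
    class A { type T }

    class B extends A {
      class Inner
      type T = Inner // B binds A's member to its own inner class
    }

With a type parameter this cannot work directly, because T is not in scope at the point of the extends clause: the superclass part of B is elaborated before B's body introduces the inner class.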