scala

Scala Kleisli throws an error in IntelliJ

我的未来我决定 submitted on 2021-02-13 12:21:02
Question: I'm trying to implement the Kleisli category for a made-up Partial type in Scala (reading Bartosz Milewski's "Category Theory for Programmers"; this is the exercise for chapter 4): object Kleisli { type Partial[A, B] = A => Option[B] implicit class KleisliOps[A, B](f1: Partial[A, B]) { def >=>[C](f2: Partial[B, C]): Partial[A, C] = (x: A) => for { y <- f1(x) z <- f2(y) } yield z def identity(f: Partial[A, B]): Partial[A, B] = x => f(x) } val safeRecip: Partial[Double, Double] = { case 0d => None case x =>
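For reference, a minimal sketch of the Kleisli composition the excerpt is building. The excerpt is cut off mid-definition, so the completed safeRecip case, the pure identity arrow, and the safeRoot example below are my assumptions about how the exercise continues, not the asker's code:

```scala
object KleisliSketch {
  // A "partial" computation: may fail to produce a result.
  type Partial[A, B] = A => Option[B]

  implicit class KleisliOps[A, B](f1: Partial[A, B]) {
    // Kleisli composition ("fish" operator): run f1, then feed its result to f2.
    def >=>[C](f2: Partial[B, C]): Partial[A, C] =
      (x: A) => f1(x).flatMap(f2)
  }

  // Identity arrow of the Kleisli category: always succeeds with its input.
  def pure[A]: Partial[A, A] = (x: A) => Some(x)

  // Assumed completion of the truncated safeRecip shown in the question.
  val safeRecip: Partial[Double, Double] = {
    case 0d => None
    case x  => Some(1.0 / x)
  }

  val safeRoot: Partial[Double, Double] =
    x => if (x >= 0) Some(math.sqrt(x)) else None

  // Composing two partial functions with the fish operator.
  val safeRootRecip: Partial[Double, Double] = safeRecip >=> safeRoot
}
```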

When calling a Scala function with a compile-time macro, how to fail over smoothly when it causes compilation errors?

一世执手 submitted on 2021-02-13 05:43:30
Question: Assume I intend to use the singleton/literal type feature in a Scala program. This feature is provided by the shapeless library in Scala 2.12 (Scala 2.13 supports native literal types, but let's use shapeless as an example). In shapeless, a literal type is represented as a path-dependent inner type of a Witness object, which can be implicitly converted from a Scala literal/constant: import com.tribbloids.spike.BaseSpec import shapeless.Witness import scala.util.Random val w: Witness.Lt[Int] = 3 val
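As a point of reference, here is a minimal sketch of the Witness behaviour the excerpt describes; the commented-out line illustrates, as an assumption, the kind of non-literal argument that triggers the compile-time error the question asks how to recover from:

```scala
import shapeless.Witness

object WitnessSketch {
  // A compile-time literal is implicitly converted (by a macro) to a Witness
  // whose type member T is the singleton type of that literal.
  val w: Witness.Lt[Int] = 3
  val x: Int = w.value // the runtime value carried by the witness

  // The following would NOT compile, because the argument is not a literal
  // known at compile time -- this is the failure mode the question is about:
  // val r: Witness.Lt[Int] = scala.util.Random.nextInt(3)
}
```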

Big Data Development - Is join a wide or narrow dependency? A look at the cogroup implementation

倾然丶 夕夏残阳落幕 submitted on 2021-02-12 22:27:19
A previous article, 大数据开发-Spark Join原理详解 (Spark join principles explained), covered how Spark joins work; this article looks at the cogroup-based join implementation from the source-code perspective. 1. Analyze the following code: import org.apache.spark.rdd.RDD import org.apache.spark.{SparkConf, SparkContext} object JoinDemo { def main(args: Array[String]): Unit = { val conf = new SparkConf().setAppName(this.getClass.getCanonicalName.init).setMaster("local[*]") val sc = new SparkContext(conf) sc.setLogLevel("WARN") val random = scala.util.Random val col1 = Range(1, 50).map(idx => (random.nextInt(10), s"user$idx")) val col2 = Array((0, "BJ"), (1, "SH"), (2, "GZ"), (3, "SZ"), (4, "TJ"), (5, "CQ"), (6, "HZ"), (7, "NJ"), (8, "WH"), (0, "CD")) val rdd1: RDD[
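The full listing is cut off above; the following is a condensed sketch, using made-up data in the same spirit, of how the dependency type of a cogroup-based join can be inspected to tell a wide (shuffle) dependency from a narrow one:

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

object JoinDepSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("JoinDepSketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    sc.setLogLevel("WARN")

    val random = scala.util.Random
    val rdd1: RDD[(Int, String)] =
      sc.makeRDD(Range(1, 50).map(idx => (random.nextInt(10), s"user$idx")))
    val rdd2: RDD[(Int, String)] =
      sc.makeRDD(Seq((0, "BJ"), (1, "SH"), (2, "GZ"), (3, "SZ"), (4, "TJ")))

    // join is built on cogroup; with no common partitioner, cogroup needs a
    // shuffle, so its dependencies are ShuffleDependency (wide).
    rdd1.cogroup(rdd2).dependencies.foreach(d => println(d.getClass.getSimpleName))

    // If both parents are already partitioned with the partitioner cogroup uses,
    // the dependencies become OneToOneDependency (narrow) and no extra shuffle runs.
    val p = new HashPartitioner(4)
    rdd1.partitionBy(p).cogroup(rdd2.partitionBy(p), p)
      .dependencies.foreach(d => println(d.getClass.getSimpleName))

    sc.stop()
  }
}
```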

FTP timing out on Heroku

爱⌒轻易说出口 submitted on 2021-02-11 18:11:23
Question: Using Apache Commons FTPClient in a Scala application works as expected on my local machine, but it always times out when running on Heroku. Relevant code: val ftp = new FTPClient ftp.connect(hostname) val success = ftp.login(username, pw) if (success) { ftp.changeWorkingDirectory(path) //a logging statement here WILL print val engine = ftp.initiateListParsing(ftp.printWorkingDirectory) //a logging statement here will NOT print while (engine.hasNext) { val files = engine.getNext(5) //do stuff
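The excerpt cuts off inside the listing loop. A common cause of exactly this symptom (control connection works, data connection for the listing hangs) is active-mode FTP being blocked on the dyno's network; the sketch below, an assumption rather than a confirmed fix for this question, switches the Apache Commons FTPClient to passive mode and sets an explicit connect timeout:

```scala
import org.apache.commons.net.ftp.FTPClient

object FtpPassiveSketch {
  def listDirectory(hostname: String, username: String, pw: String, path: String): Unit = {
    val ftp = new FTPClient
    ftp.setConnectTimeout(30000) // fail fast instead of hanging on connect
    ftp.connect(hostname)
    ftp.enterLocalPassiveMode()  // the client opens the data connection itself,
                                 // which restrictive outbound-only networks allow
    if (ftp.login(username, pw)) {
      ftp.changeWorkingDirectory(path)
      val engine = ftp.initiateListParsing(ftp.printWorkingDirectory)
      while (engine.hasNext) {
        val files = engine.getNext(5)
        files.foreach(f => println(f.getName)) // placeholder for "do stuff"
      }
      ftp.logout()
    }
    ftp.disconnect()
  }
}
```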

Spark libraryDependencies error in build.sbt (IntelliJ)

十年热恋 submitted on 2021-02-11 16:53:30
Question: I am trying to learn Scala with Spark. I am following a tutorial, but I get an error when I try to import the Spark library dependency: libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.3" I am getting the following error, and I have 3 unknown artifacts. What could be the problem here? My code is very simple; it is just a Hello World. Answer 1: Probably you need to add to your build.sbt: resolvers += "spark-core" at "https://mvnrepository.com/artifact/org.apache.spark
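spark-core 2.4.3 is available on Maven Central (sbt's default resolver) and is published only for Scala 2.11 and 2.12, so "unknown artifact" warnings in IntelliJ are often a Scala version mismatch rather than a missing resolver. A minimal build.sbt sketch under that assumption:

```scala
// build.sbt -- minimal sketch; names and versions other than spark-core 2.4.3
// are illustrative choices, not taken from the question.
name := "spark-hello-world"
version := "0.1"

// Must be a Scala version that spark-core 2.4.3 was published for (2.11.x or 2.12.x);
// with, say, 2.13.x the %% expansion points at an artifact that does not exist.
scalaVersion := "2.12.10"

libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.3"
```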

Error when connecting Spark Structured Streaming + Kafka

别说谁变了你拦得住时间么 submitted on 2021-02-11 15:45:49
Question: I'm trying to connect my Spark 2.4.5 Structured Streaming application to Kafka, but every time I try, this Data Source Provider error appears. Here are my Scala code and my sbt build: import org.apache.spark.sql._ import org.apache.spark.sql.types._ import org.apache.spark.sql.functions._ import org.apache.spark.sql.streaming.Trigger object streaming_app_demo { def main(args: Array[String]): Unit = { println("Spark Structured Streaming with Kafka Demo Application Started ...") val KAFKA_TOPIC
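The code and build are truncated above, but a "Failed to find data source: kafka" style Data Source Provider error usually means the Kafka connector artifact is not on the classpath. A minimal sketch under that assumption (broker address and topic name are made up):

```scala
// build.sbt (sketch):
// libraryDependencies ++= Seq(
//   "org.apache.spark" %% "spark-sql"            % "2.4.5",
//   "org.apache.spark" %% "spark-sql-kafka-0-10" % "2.4.5"
// )

import org.apache.spark.sql.SparkSession

object StreamingKafkaSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("streaming_app_demo")
      .master("local[*]")
      .getOrCreate()

    val df = spark.readStream
      .format("kafka") // requires spark-sql-kafka-0-10 on the classpath
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "test-topic")
      .load()

    val query = df
      .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
      .writeStream
      .format("console")
      .start()

    query.awaitTermination()
  }
}
```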

Spark 3 Typed User Defined Aggregate Function over Window

*爱你&永不变心* submitted on 2021-02-11 15:12:56
Question: I am trying to use a custom user-defined aggregator over a window. When I use an untyped aggregator, the query works. However, I am unable to use a typed UDAF as a window function; I get an error stating "The query operator `Project` contains one or more unsupported expression types Aggregate, Window or Generate". The following basic program showcases the problem. I think it could work using UserDefinedAggregateFunction rather than Aggregator, but the former is deprecated. import scala
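The program is cut off at the imports. One route commonly suggested for Spark 3, sketched below as an assumption rather than a verified answer for this exact query plan, is to keep the typed Aggregator but register it through functions.udaf, which yields an untyped UserDefinedFunction that can be applied over a window:

```scala
import org.apache.spark.sql.{Encoder, Encoders, SparkSession, functions}
import org.apache.spark.sql.expressions.{Aggregator, Window}

object TypedUdafOverWindowSketch {
  // A trivial typed Aggregator that sums Double values.
  val sumAgg: Aggregator[Double, Double, Double] =
    new Aggregator[Double, Double, Double] {
      def zero: Double = 0.0
      def reduce(b: Double, a: Double): Double = b + a
      def merge(b1: Double, b2: Double): Double = b1 + b2
      def finish(reduction: Double): Double = reduction
      def bufferEncoder: Encoder[Double] = Encoders.scalaDouble
      def outputEncoder: Encoder[Double] = Encoders.scalaDouble
    }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("udaf-window").master("local[*]").getOrCreate()
    import spark.implicits._

    // Register the typed Aggregator as an (untyped) UDAF.
    val sumUdaf = functions.udaf(sumAgg, Encoders.scalaDouble)

    val df = Seq(("a", 1.0), ("a", 2.0), ("b", 3.0)).toDF("k", "v")
    val w = Window.partitionBy($"k")

    // The registered UDAF can be used as a window expression.
    df.withColumn("sum_over_k", sumUdaf($"v").over(w)).show()

    spark.stop()
  }
}
```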

Pre-partition data in Spark such that each partition has non-overlapping values in the column we are partitioning on

99封情书 submitted on 2021-02-11 15:01:23
Question: I'm trying to pre-partition the data before doing an aggregation operation across a certain column of my data. I have 3 worker nodes and I would like each partition to have non-overlapping values in the column I am partitioning on. I don't want situations where two partitions might have the same values in that column. For example, if I have the following data:
ss_item_sk | ss_quantity
1 | 10.0
1 | 4.0
2 | 3.0
3 | 5.0
4 | 8.0
5 | 13.0
5 | 10.0
Then the following partitions are satisfactory:
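The expected output in the question is cut off, but the stated requirement (no key value present in more than one partition) is what hash-partitioning on that column guarantees. A minimal sketch, using the sample data above and an assumed partition count of 3:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, spark_partition_id}

object PrePartitionSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("pre-partition").master("local[*]").getOrCreate()
    import spark.implicits._

    val df = Seq(
      (1, 10.0), (1, 4.0), (2, 3.0), (3, 5.0), (4, 8.0), (5, 13.0), (5, 10.0)
    ).toDF("ss_item_sk", "ss_quantity")

    // Hash-partition on the key column: all rows sharing an ss_item_sk value
    // land in the same partition, so partitions never overlap on that column
    // (a partition may still hold several distinct keys).
    val partitioned = df.repartition(3, col("ss_item_sk"))

    // Inspect which key ended up in which partition.
    partitioned.withColumn("pid", spark_partition_id()).show()

    // The subsequent aggregation then finds each key wholly within one partition.
    partitioned.groupBy("ss_item_sk").sum("ss_quantity").show()

    spark.stop()
  }
}
```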