scala

Scala Kleisli throws an error in IntelliJ

我的未来我决定 submitted on 2021-02-13 12:21:02
Question: I'm trying to implement the Kleisli category for a made-up Partial type in Scala (reading Bartosz Milewski's "Category Theory for Programmers"; this is the exercise for chapter 4): object Kleisli { type Partial[A, B] = A => Option[B] implicit class KleisliOps[A, B](f1: Partial[A, B]) { def >=>[C](f2: Partial[B, C]): Partial[A, C] = (x: A) => for { y <- f1(x) z <- f2(y) } yield z def identity(f: Partial[A, B]): Partial[A, B] = x => f(x) } val safeRecip: Partial[Double, Double] = { case 0d => None case x =>
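For reference, a minimal sketch of the Kleisli composition the excerpt is building. The excerpt is cut off mid-definition, so the completed safeRecip case, the pure identity arrow, and the safeRoot example below are my assumptions about how the exercise continues, not the asker's code:

```scala
object KleisliSketch {
  // A "partial" computation: may fail to produce a result.
  type Partial[A, B] = A => Option[B]

  implicit class KleisliOps[A, B](f1: Partial[A, B]) {
    // Kleisli composition ("fish" operator): run f1, then feed its result to f2.
    def >=>[C](f2: Partial[B, C]): Partial[A, C] =
      (x: A) => f1(x).flatMap(f2)
  }

  // Identity arrow of the Kleisli category: always succeeds with its input.
  def pure[A]: Partial[A, A] = (x: A) => Some(x)

  // Assumed completion of the truncated safeRecip shown in the question.
  val safeRecip: Partial[Double, Double] = {
    case 0d => None
    case x  => Some(1.0 / x)
  }

  val safeRoot: Partial[Double, Double] =
    x => if (x >= 0) Some(math.sqrt(x)) else None

  // Composing two partial functions with the fish operator.
  val safeRootRecip: Partial[Double, Double] = safeRecip >=> safeRoot
}
```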

When calling a Scala function with a compile-time macro, how to fail over smoothly when it causes compilation errors?

一世执手 submitted on 2021-02-13 05:43:30
Question: Assume I intend to use the singleton/literal type feature in a Scala program. This feature is provided by the shapeless library in Scala 2.12 (Scala 2.13 supports native literal types, but let's use shapeless as an example). In shapeless, a literal type is represented as a path-dependent inner type of a Witness object, which can be implicitly converted from a Scala literal/constant: import com.tribbloids.spike.BaseSpec import shapeless.Witness import scala.util.Random val w: Witness.Lt[Int] = 3 val
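As a point of reference, here is a minimal sketch of the Witness behaviour the excerpt describes; the commented-out line illustrates, as an assumption, the kind of non-literal argument that triggers the compile-time error the question asks how to recover from:

```scala
import shapeless.Witness

object WitnessSketch {
  // A compile-time literal is implicitly converted (by a macro) to a Witness
  // whose type member T is the singleton type of that literal.
  val w: Witness.Lt[Int] = 3
  val x: Int = w.value // the runtime value carried by the witness

  // The following would NOT compile, because the argument is not a literal
  // known at compile time -- this is the failure mode the question is about:
  // val r: Witness.Lt[Int] = scala.util.Random.nextInt(3)
}
```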

Big Data Development - Is join a wide or narrow dependency? A look at the cogroup implementation

倾然丶 夕夏残阳落幕 submitted on 2021-02-12 22:27:19
A previous article, 大数据开发-Spark Join原理详解 (Spark join principles explained), covered how Spark joins work; this article looks at the cogroup-based join implementation from the source-code perspective. 1. Analyze the following code: import org.apache.spark.rdd.RDD import org.apache.spark.{SparkConf, SparkContext} object JoinDemo { def main(args: Array[String]): Unit = { val conf = new SparkConf().setAppName(this.getClass.getCanonicalName.init).setMaster("local[*]") val sc = new SparkContext(conf) sc.setLogLevel("WARN") val random = scala.util.Random val col1 = Range(1, 50).map(idx => (random.nextInt(10), s"user$idx")) val col2 = Array((0, "BJ"), (1, "SH"), (2, "GZ"), (3, "SZ"), (4, "TJ"), (5, "CQ"), (6, "HZ"), (7, "NJ"), (8, "WH"), (0, "CD")) val rdd1: RDD[
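The full listing is cut off above; the following is a condensed sketch, using made-up data in the same spirit, of how the dependency type of a cogroup-based join can be inspected to tell a wide (shuffle) dependency from a narrow one:

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

object JoinDepSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("JoinDepSketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    sc.setLogLevel("WARN")

    val random = scala.util.Random
    val rdd1: RDD[(Int, String)] =
      sc.makeRDD(Range(1, 50).map(idx => (random.nextInt(10), s"user$idx")))
    val rdd2: RDD[(Int, String)] =
      sc.makeRDD(Seq((0, "BJ"), (1, "SH"), (2, "GZ"), (3, "SZ"), (4, "TJ")))

    // join is built on cogroup; with no common partitioner, cogroup needs a
    // shuffle, so its dependencies are ShuffleDependency (wide).
    rdd1.cogroup(rdd2).dependencies.foreach(d => println(d.getClass.getSimpleName))

    // If both parents are already partitioned with the partitioner cogroup uses,
    // the dependencies become OneToOneDependency (narrow) and no extra shuffle runs.
    val p = new HashPartitioner(4)
    rdd1.partitionBy(p).cogroup(rdd2.partitionBy(p), p)
      .dependencies.foreach(d => println(d.getClass.getSimpleName))

    sc.stop()
  }
}
```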

FTP timing out on Heroku

爱⌒轻易说出口 submitted on 2021-02-11 18:11:23
Question: Using Apache Commons FTPClient in a Scala application works as expected on my local machine, but it always times out when running on Heroku. Relevant code: val ftp = new FTPClient ftp.connect(hostname) val success = ftp.login(username, pw) if (success) { ftp.changeWorkingDirectory(path) //a logging statement here WILL print val engine = ftp.initiateListParsing(ftp.printWorkingDirectory) //a logging statement here will NOT print while (engine.hasNext) { val files = engine.getNext(5) //do stuff
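The excerpt cuts off inside the listing loop. A common cause of exactly this symptom (control connection works, data connection for the listing hangs) is active-mode FTP being blocked on the dyno's network; the sketch below, an assumption rather than a confirmed fix for this question, switches the Apache Commons FTPClient to passive mode and sets an explicit connect timeout:

```scala
import org.apache.commons.net.ftp.FTPClient

object FtpPassiveSketch {
  def listDirectory(hostname: String, username: String, pw: String, path: String): Unit = {
    val ftp = new FTPClient
    ftp.setConnectTimeout(30000) // fail fast instead of hanging on connect
    ftp.connect(hostname)
    ftp.enterLocalPassiveMode()  // the client opens the data connection itself,
                                 // which restrictive outbound-only networks allow
    if (ftp.login(username, pw)) {
      ftp.changeWorkingDirectory(path)
      val engine = ftp.initiateListParsing(ftp.printWorkingDirectory)
      while (engine.hasNext) {
        val files = engine.getNext(5)
        files.foreach(f => println(f.getName)) // placeholder for "do stuff"
      }
      ftp.logout()
    }
    ftp.disconnect()
  }
}
```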

Spark libraryDependencies error in build.sbt (IntelliJ)

十年热恋 submitted on 2021-02-11 16:53:30
Question: I am trying to learn Scala with Spark. I am following a tutorial, but I get an error when I try to import the Spark library dependency: libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.3" I am getting the following error, and I have 3 unknown artifacts. What could be the problem here? My code is very simple; it is just a Hello World. Answer 1: Probably you need to add to your build.sbt: resolvers += "spark-core" at "https://mvnrepository.com/artifact/org.apache.spark
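spark-core 2.4.3 is available on Maven Central (sbt's default resolver) and is published only for Scala 2.11 and 2.12, so "unknown artifact" warnings in IntelliJ are often a Scala version mismatch rather than a missing resolver. A minimal build.sbt sketch under that assumption:

```scala
// build.sbt -- minimal sketch; names and versions other than spark-core 2.4.3
// are illustrative choices, not taken from the question.
name := "spark-hello-world"
version := "0.1"

// Must be a Scala version that spark-core 2.4.3 was published for (2.11.x or 2.12.x);
// with, say, 2.13.x the %% expansion points at an artifact that does not exist.
scalaVersion := "2.12.10"

libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.3"
```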

Error when connecting Spark Structured Streaming + Kafka

别说谁变了你拦得住时间么 submitted on 2021-02-11 15:45:49
Question: I'm trying to connect my Spark 2.4.5 Structured Streaming application to Kafka, but every time I try, this Data Source Provider error appears. Here are my Scala code and my sbt build: import org.apache.spark.sql._ import org.apache.spark.sql.types._ import org.apache.spark.sql.functions._ import org.apache.spark.sql.streaming.Trigger object streaming_app_demo { def main(args: Array[String]): Unit = { println("Spark Structured Streaming with Kafka Demo Application Started ...") val KAFKA_TOPIC
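The code and build are truncated above, but a "Failed to find data source: kafka" style Data Source Provider error usually means the Kafka connector artifact is not on the classpath. A minimal sketch under that assumption (broker address and topic name are made up):

```scala
// build.sbt (sketch):
// libraryDependencies ++= Seq(
//   "org.apache.spark" %% "spark-sql"            % "2.4.5",
//   "org.apache.spark" %% "spark-sql-kafka-0-10" % "2.4.5"
// )

import org.apache.spark.sql.SparkSession

object StreamingKafkaSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("streaming_app_demo")
      .master("local[*]")
      .getOrCreate()

    val df = spark.readStream
      .format("kafka") // requires spark-sql-kafka-0-10 on the classpath
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "test-topic")
      .load()

    val query = df
      .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
      .writeStream
      .format("console")
      .start()

    query.awaitTermination()
  }
}
```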

Spark 3 Typed User Defined Aggregate Function over Window

*爱你&永不变心* submitted on 2021-02-11 15:12:56
Question: I am trying to use a custom user-defined aggregator over a window. When I use an untyped aggregator, the query works. However, I am unable to use a typed UDAF as a window function; I get an error stating "The query operator `Project` contains one or more unsupported expression types Aggregate, Window or Generate". The following basic program showcases the problem. I think it could work using UserDefinedAggregateFunction rather than Aggregator, but the former is deprecated. import scala
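The program is cut off at the imports. One route commonly suggested for Spark 3, sketched below as an assumption rather than a verified answer for this exact query plan, is to keep the typed Aggregator but register it through functions.udaf, which yields an untyped UserDefinedFunction that can be applied over a window:

```scala
import org.apache.spark.sql.{Encoder, Encoders, SparkSession, functions}
import org.apache.spark.sql.expressions.{Aggregator, Window}

object TypedUdafOverWindowSketch {
  // A trivial typed Aggregator that sums Double values.
  val sumAgg: Aggregator[Double, Double, Double] =
    new Aggregator[Double, Double, Double] {
      def zero: Double = 0.0
      def reduce(b: Double, a: Double): Double = b + a
      def merge(b1: Double, b2: Double): Double = b1 + b2
      def finish(reduction: Double): Double = reduction
      def bufferEncoder: Encoder[Double] = Encoders.scalaDouble
      def outputEncoder: Encoder[Double] = Encoders.scalaDouble
    }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("udaf-window").master("local[*]").getOrCreate()
    import spark.implicits._

    // Register the typed Aggregator as an (untyped) UDAF.
    val sumUdaf = functions.udaf(sumAgg, Encoders.scalaDouble)

    val df = Seq(("a", 1.0), ("a", 2.0), ("b", 3.0)).toDF("k", "v")
    val w = Window.partitionBy($"k")

    // The registered UDAF can be used as a window expression.
    df.withColumn("sum_over_k", sumUdaf($"v").over(w)).show()

    spark.stop()
  }
}
```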

Pre-partition data in Spark such that each partition has non-overlapping values in the column we are partitioning on

99封情书 submitted on 2021-02-11 15:01:23
Question: I'm trying to pre-partition the data before doing an aggregation operation across a certain column of my data. I have 3 worker nodes and I would like each partition to have non-overlapping values in the column I am partitioning on. I don't want situations where two partitions might have the same values in that column. For example, if I have the following data:
ss_item_sk | ss_quantity
1 | 10.0
1 | 4.0
2 | 3.0
3 | 5.0
4 | 8.0
5 | 13.0
5 | 10.0
Then the following partitions are satisfactory:
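The expected output in the question is cut off, but the stated requirement (no key value present in more than one partition) is what hash-partitioning on that column guarantees. A minimal sketch, using the sample data above and an assumed partition count of 3:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, spark_partition_id}

object PrePartitionSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("pre-partition").master("local[*]").getOrCreate()
    import spark.implicits._

    val df = Seq(
      (1, 10.0), (1, 4.0), (2, 3.0), (3, 5.0), (4, 8.0), (5, 13.0), (5, 10.0)
    ).toDF("ss_item_sk", "ss_quantity")

    // Hash-partition on the key column: all rows sharing an ss_item_sk value
    // land in the same partition, so partitions never overlap on that column
    // (a partition may still hold several distinct keys).
    val partitioned = df.repartition(3, col("ss_item_sk"))

    // Inspect which key ended up in which partition.
    partitioned.withColumn("pid", spark_partition_id()).show()

    // The subsequent aggregation then finds each key wholly within one partition.
    partitioned.groupBy("ss_item_sk").sum("ss_quantity").show()

    spark.stop()
  }
}
```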