scala

Spark Scala S3 storage: permission denied

|▌冷眼眸甩不掉的悲伤 submitted on 2021-01-29 08:12:33
Question: I've read a lot of topics on the Internet about how to get Spark working with S3, but nothing works properly. I've downloaded Spark 2.3.2 with Hadoop 2.7 and above. I've copied only some libraries from Hadoop 2.7.7 (which matches the Spark/Hadoop version) to the Spark jars folder: hadoop-aws-2.7.7.jar, hadoop-auth-2.7.7.jar, aws-java-sdk-1.7.4.jar. Still I can't use either S3N or S3A to get my file read by Spark. For S3A I have this exception: sc.hadoopConfiguration.set("fs.s3a.access.key",
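
A minimal sketch of the usual S3A wiring on Spark 2.3.x with Hadoop 2.7.x, assuming the three jars above sit on both driver and executor classpaths (for the Hadoop 2.7 line, hadoop-aws must be paired with aws-java-sdk 1.7.4). The bucket, key, and credentials below are placeholders, not values from the question:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("s3a-check").getOrCreate()
    val sc = spark.sparkContext

    // S3A is the maintained connector in this Hadoop line; S3N is deprecated.
    sc.hadoopConfiguration.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
    sc.hadoopConfiguration.set("fs.s3a.access.key", "YOUR_ACCESS_KEY")  // placeholder
    sc.hadoopConfiguration.set("fs.s3a.secret.key", "YOUR_SECRET_KEY")  // placeholder

    // A quick read to verify the configuration end to end:
    val lines = sc.textFile("s3a://some-bucket/some/key.txt")           // placeholder path
    println(lines.count())

A 403/permission error with correct credentials often points to a jar-version mismatch between hadoop-aws and aws-java-sdk rather than an IAM problem.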

How to compile only the changed (or related) module in a multi-module Gradle project

半城伤御伤魂 submitted on 2021-01-29 08:11:10
Question: I have a multi-module Gradle project with 5 modules. These modules form a dependency chain, i.e. A <- B <- C <- D <- E. Here A depends on B, B depends on C, and so on. My problem is that if I make a change in A, it compiles all the parent modules. Is there any way to compile only A? And if I change B, then compile only A and B. Answer 1: Yes, @JB Nizet you are correct, I was using one of the scalaCompileOptions properties. tasks.withType(ScalaCompile) {

ELKI GDBSCAN in Java/Scala: how to modify the CorePredicate

大城市里の小女人 submitted on 2021-01-29 08:08:06
Question: How is the generalised DBSCAN (GDBSCAN) in ELKI implemented in Java/Scala? I am currently trying to find an efficient way to implement a weighted DBSCAN in ELKI, to offset the inefficiencies of the sklearn implementation of weighted DBSCAN. The reason I am doing this is that sklearn simply cannot cope with running DBSCAN on terabyte-scale datasets (on the cloud, which is my case). For example, I have made the following code with the
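
ELKI's GeneralizedDBSCAN factors the algorithm into a neighbour predicate and a core predicate, and a weighted variant changes only the latter: a point is core when the summed weight of its epsilon-neighbourhood reaches a threshold, instead of the neighbour count reaching minPts. A self-contained Scala sketch of that test (illustrating the idea only, not ELKI's actual CorePredicate interface):

    final case class WPoint(coords: Vector[Double], weight: Double)

    def dist(a: WPoint, b: WPoint): Double =
      math.sqrt(a.coords.zip(b.coords).map { case (x, y) => (x - y) * (x - y) }.sum)

    // Core test of a weighted DBSCAN: sum neighbour weights instead of counting them.
    def isCore(p: WPoint, data: Seq[WPoint], eps: Double, minWeight: Double): Boolean =
      data.iterator
        .filter(q => dist(p, q) <= eps)   // the epsilon-neighbourhood (includes p itself)
        .map(_.weight)
        .sum >= minWeight

Plugging the same test into ELKI means implementing its CorePredicate against a weight relation; the exact class and method names should be checked against the ELKI version in use.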

How to compare attribute values from different classes?

故事扮演 submitted on 2021-01-29 07:43:08
Question: I have two classes with a few attributes that have the same names, for example: case class Rect(x: Int, y: Int) case class Squa(x: Int, y: Int) To compare them I do: val r = new Rect(2, 2) val s = new Squa(2, 2) r.x == s.x && r.y == s.y If I have N attributes I have to compare them one by one; is there a way to compare all attributes at once, since they have the same names? I've tried: r.asInstanceOf[Squa] eq s But this gives me the error: class Rect cannot be cast to class Squa (Rect and Squa are in
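
Since both are case classes whose fields line up positionally, one option is the Product interface: productIterator yields the fields in declaration order, so an element-wise comparison covers all N attributes at once. Note this compares by position, not by name:

    case class Rect(x: Int, y: Int)
    case class Squa(x: Int, y: Int)

    val r = Rect(2, 2)
    val s = Squa(2, 2)

    // true when every field matches pairwise, for any number of fields
    val allEqual = r.productIterator.sameElements(s.productIterator)

For a comparison that is actually name-aware one would reach for shapeless' LabelledGeneric or a hand-written typeclass instead.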

Object inside class

℡╲_俬逩灬. submitted on 2021-01-29 07:41:07
Question: Scala 2.12. What is wrong with my implementation? object MyJob extends DatasetReader(x=x) { val x = "aaa" DatasetReader.read() } class DatasetReader(x: String) { object DatasetReader { def read(): String = { // ... } } } error: super constructor cannot be passed a self reference unless parameter is declared by-name How can I fix it? Answer 1: Another option you have is: object MyJob extends { val x = "aaa" } with DatasetReader(x) { DatasetReader.read() } Code runs at Scastie. There are similar posts in
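
A runnable sketch of the answer's workaround on Scala 2.12: the early-definition block `extends { val x = ... } with` initialises x before the superclass constructor needs it. The read() body is made up here just so the example compiles; note that early definitions are deprecated in 2.13 and removed in Scala 3, where trait parameters take their place:

    class DatasetReader(x: String) {
      object DatasetReader {
        def read(): String = s"reading with x = $x"  // placeholder body
      }
    }

    // The early definition { val x = "aaa" } is evaluated before DatasetReader's
    // constructor runs, so passing x up no longer trips the self-reference check.
    object MyJob extends { val x = "aaa" } with DatasetReader(x) {
      println(DatasetReader.read())
    }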

Shapeless and annotations

99封情书 submitted on 2021-01-29 07:23:18
Question: I would like to apply some function to the fields of a case class that are annotated with MyAnnotation. The idea is to transform type T into its generic representation, extract the annotations, zip, fold right (or left) to reconstruct a generic representation, and finally get back to type T. I followed the answer provided here and this gist. I'm using Scala 2.11.12 and shapeless 2.3.3. Here is my code: import shapeless._ import shapeless.ops.hlist._ case class MyAnnotation(func: String)
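
For the extraction step alone, a minimal sketch with shapeless 2.3.x: the Annotations typeclass materialises, per field of a case class, an Option of the annotation. The Record type and its fields are made up for illustration, and the annotation class typically needs to extend StaticAnnotation:

    import scala.annotation.StaticAnnotation
    import shapeless._

    case class MyAnnotation(func: String) extends StaticAnnotation

    case class Record(@MyAnnotation("upper") name: String, age: Int)

    // One Option per field, in declaration order:
    val annots = Annotations[MyAnnotation, Record].apply()
    // => Some(MyAnnotation(upper)) :: None :: HNil

The zip/fold machinery of the question then pairs this HList with the one from LabelledGeneric[Record] to rewrite only the annotated fields.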

How to reflect concrete return types for methods of classes defined at runtime using the Scala ToolBox?

别等时光非礼了梦想. submitted on 2021-01-29 07:07:43
Question: When reflecting the foo() method of the class Cls, we can easily get the concrete return type using the following: class Cls { def foo() = List("A", "B") } val classType = ru.typeOf[Cls] val classMirror = toolbox.mirror.reflectClass(classType.typeSymbol.asClass) val ctorSymbol = classType.decl(ru.termNames.CONSTRUCTOR).asMethod val methodSymb = classType.decl(ru.TermName("foo")).asMethod val ctor = classMirror.reflectConstructor(ctorSymbol) val instance = ctor() val im = toolbox.mirror.reflect
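
Once the class exists only at runtime (the question's actual scenario), one route is ToolBox.define, which compiles an ImplDef and hands back a symbol whose info carries the inferred signatures. A sketch, assuming scala-compiler is on the classpath:

    import scala.reflect.runtime.{universe => ru}
    import scala.tools.reflect.ToolBox

    val tb = ru.runtimeMirror(getClass.getClassLoader).mkToolBox()

    // Compile the class at runtime and get its symbol back.
    val clsSym = tb.define(
      tb.parse("""class Cls { def foo() = List("A", "B") }""").asInstanceOf[tb.u.ImplDef]
    )

    // The inferred, concrete return type of foo:
    val fooReturn = clsSym.info.decl(ru.TermName("foo")).asMethod.returnType
    println(fooReturn)  // List[String]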

Reading a property file from an external path in Spark Scala throws an error

核能气质少年 submitted on 2021-01-29 07:06:30
Question: I am trying to read a property file from an external path in Spark Scala like this: spark-submit --class com.spark.scala.my.class --deploy-mode cluster --master yarn --files /user/mine/dev.properties /path/to/jar/dev-0.0.1-SNAPSHOT-uber.jar 2020-08-19T06:00:00Z 2020-08-20T07:00:00Z and I am reading it like this: val props = new Properties() val filePath = SparkFiles.get("/user/mine/dev.properties") LOGGER.info("Path to file : " + filePath) val is = Source.fromFile(filePath) props.load(is
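
What usually bites here (a hedged guess matching the symptom): files shipped with --files are staged under their base name, so SparkFiles.get wants "dev.properties", not the original absolute path. A sketch of the read with that fix:

    import java.util.Properties
    import org.apache.spark.SparkFiles
    import scala.io.Source

    val props = new Properties()
    // Base name only: --files /user/mine/dev.properties stages it as "dev.properties".
    val filePath = SparkFiles.get("dev.properties")
    val reader = Source.fromFile(filePath).bufferedReader()
    try props.load(reader) finally reader.close()

Note also that props.load does not accept a Source directly, hence the bufferedReader() bridge.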

Select a few columns from a nested array of structs in a DataFrame in Scala

99封情书 submitted on 2021-01-29 06:50:28
Question: I have a dataframe with an array of structs, and inside that another array of structs. Is there an easy way to select a few of the structs in the main array and also a few in the nested array without disturbing the structure of the entire dataframe?

SIMPLE INPUT:
-MainArray
---StructCol1
---StructCol2
---StructCol3
---SubArray
------SubArrayStruct4
------SubArrayStruct5
------SubArrayStruct6

SIMPLE OUTPUT:
-MainArray
---StructCol1
---StructCol2
---SubArray
------SubArrayStruct4
------SubArrayStruct5

The source
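
One hedged approach, assuming Spark 2.4+ for higher-order functions: rebuild the array with transform, keeping only the wanted fields at each level. The DataFrame df is assumed; the field names follow the outline above:

    import org.apache.spark.sql.functions.expr

    // Rewrites MainArray in place: each element keeps StructCol1/2 and a pruned SubArray.
    val pruned = df.withColumn("MainArray", expr("""
      transform(MainArray, m -> struct(
        m.StructCol1 as StructCol1,
        m.StructCol2 as StructCol2,
        transform(m.SubArray, s -> struct(
          s.SubArrayStruct4 as SubArrayStruct4,
          s.SubArrayStruct5 as SubArrayStruct5
        )) as SubArray
      ))
    """))

On Spark versions before 2.4 the same pruning needs an explode/groupBy round-trip or a typed map over a Dataset.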