scala

Spark Learning Path (3): Spark RDDs

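|▌冷眼眸甩不掉的悲伤 submitted on 2020-12-12 21:25:48
Discussion QQ: 1586558083 Contents: 1. Overview of RDDs 1.1 What is an RDD? 1.2 Properties of an RDD 1.3 A rough WordCount diagram of RDDs 2. Ways to create an RDD 2.1 By reading a file 2.2 By parallelizing a collection 2.3 Other ways 3. The RDD programming API 3.1 Transformation 3.2 Action 3.3 Writing the Spark WordCount code 3.4 WordCount execution diagram 4. Wide and narrow RDD dependencies 4.1 The essence of RDD dependencies 4.2 Data-flow view of the dependencies 1. Overview of RDDs 1.1 What is an RDD? An RDD (Resilient Distributed Dataset) is the most basic data abstraction in Spark. It represents an immutable, partitionable collection whose elements can be computed in parallel. RDDs have the characteristics of a data-flow model: automatic fault tolerance, locality-aware scheduling, and scalability. RDDs let users explicitly cache a working set in memory while running multiple queries, so that later queries can reuse it, which greatly speeds up those queries. 1.2 Properties of an RDD (1) A set of partitions, the basic units that make up the dataset. Each partition of an RDD is processed by one compute task, and the partitions determine the granularity of parallel computation. Users can specify the number of partitions when creating an RDD; if none is specified, a default value is used. The default value is the CPU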

SBT: Evaluating sequence of tasks

我是研究僧i submitted on 2020-12-12 21:23:34
Question: I am trying to get information about all modules in my sbt project. I have the following code: lazy val getModule = taskKey[Module]("get single module info") lazy val allModules = taskKey[Seq[Module]]("get all modules info") getModule := Def.task { Module(name.value, description.value, version.value, organization.value) }.value, allModules := Def.task { val sbtModules = (ThisScope / thisProject).value.aggregate sbtModules.map { m => (ThisScope.in(m) / getModule).value } }.value I'm getting

[Spark Notes] Setting up a standalone Spark development environment locally on Windows 10

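╄→гoц情女王★ submitted on 2020-12-12 21:23:13
0x00 Environment and software 1. System environment OS: Windows10_x64 Professional 2. Required software and tools JDK1.8.0_131 spark-2.3.0-bin-hadoop2.7.tgz hadoop-2.8.3.tar.gz scala-2.11.8.zip hadoop-common-2.2.0-bin-master.zip (mainly for the winutils.exe it contains) IntelliJ IDEA (version: 2017.1.2 Build #IU-171.4249.32, built on April 21, 2017) scala-intellij-bin-2017.1.20.zip (IntelliJ IDEA Scala plugin) apache-maven-3.5.0 0x01 Setup steps 1. Install the JDK Download the JDK installer for the required version from http://www.oracle.com/technetwork/java/javase/downloads/index.html. The installation itself is not covered here; the final installation path is as follows (the JDK had already been installed earlier, so the timestamp shown is from 2017): In the environment variables, configure the JDK: create a new variable JAVA_HOME=C:\SelfFiles\Install\Java\jdk1.8.0_131 and add %JAVA_HOME%\bin to Path, as follows: Then open a command-line window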

Spray-Json: serialize None as null

夙愿已清 submitted on 2020-12-12 10:13:46
Question: I am porting a REST API to Scala, using akka-http with spray-json. The old API had the following response: { "result": { ... }, "error": null } Now I want to maintain exact backwards compatibility, so when there's no error I want an error key with a null value. However, I can't see any support for this in spray-json. When I serialize the following with a None error: case class Response(result: Result, error: Option[Error]) I end up with { "result": { ... } } And it completely drops the error

How Scala App trait and main works internally?

巧了我就是萌 submitted on 2020-12-12 08:32:56
Question: Hi, I'm a newbie in Scala. As far as I know, there are two ways to define an entry point in Scala: one is to define a main method in an object, and the other is to extend the App trait. I wondered how the App trait works, so I checked its source, but it is full of confusing code... The code says that App has initCodes, which are extended from the App trait, and that these are added in the delayedInit method inherited from DelayedInit. The App trait also has a main method, which will be the entry point. But the
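
The mechanism the question describes can be modelled roughly as follows. This is a minimal sketch, not the standard library's code: `MiniApp`, `register`, and `Demo` are hypothetical names, and the body is registered by hand here, whereas for a real `App` in Scala 2 the compiler wraps the object body and passes it to `delayedInit` automatically.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical stand-in for scala.App, for illustration only.
trait MiniApp {
  // Deferred pieces of the object body, in registration order.
  private val initCode = new ListBuffer[() => Unit]

  // Stores the body instead of running it (the role delayedInit plays in App).
  protected def register(body: => Unit): Unit = initCode += (() => body)

  // Entry point: runs every stored body, in order.
  def main(args: Array[String]): Unit = initCode.foreach(f => f())
}

object Demo extends MiniApp {
  val log = new ListBuffer[String]
  // Registered during object initialization, executed only when main is called.
  register { log += "body ran" }
}
```

Reading `Demo.log` before calling `Demo.main` gives an empty buffer; after `Demo.main(Array.empty[String])` it holds the registered entry, which mirrors how the body of an object extending `App` runs only once `main` is invoked.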

How to fix “static methods in interface require -target:jvm-1.8” in Scala application?

限于喜欢 submitted on 2020-12-12 05:44:34
Question: I wrote the following code: import software.amazon.awssdk.services.cloudwatchlogs.CloudWatchLogsClient class Test() extends CloudWatchLogsClient { CloudWatchLogsClient.builder().build() def close():Unit = { println("test") } def serviceName(): String = "serviceName" CloudWatchLogsClient.create() } When compiling, I get the following error: Static methods in interface require -target:jvm-1.8 CloudWatchLogsClient.builder().build() Finally, I used the following dependencies in my
