executor | 易学教程

How jobs are assigned to executors in Spark Streaming?

Let's say I've got 2 or more executors in a Spark Streaming application. I've set the batch time to 10 seconds, so a job is started every 10 seconds reading input from my HDFS. If every job lasts for more than 10 seconds, the new job that is started is assigned to a free executor, right? Even if the previous one didn't finish? I know it seems like an obvious answer, but I haven't found anything about job scheduling on the website or in the paper related to Spark Streaming. If you know of any links where all of this is explained, I would really appreciate seeing them. Thank you.

Netty Socket Programming

Taking my Netty socket programming code as an example:

1. EventLoopGroup

Stepping into EventLoopGroup: this is a special EventExecutorGroup that allows Channels to be registered so that they are processed during selection in the event loop. (A Channel can be thought of as a connection to a client.)

/**
 * Special {@link EventExecutorGroup} which allows registering {@link Channel}s that get
 * processed for later selection during the event loop.
 *
 */
public interface EventLoopGroup extends EventExecutorGroup {
    /**
     * Return the next {@link EventLoop} to use
     */
    @Override
    EventLoop next();

    /**
     * Register a {@link Channel} with this {@link EventLoop}. The returned {@link ChannelFuture}
     * will get notified once the registration was complete.
     */

Using the Java 5.0 Executor in Spring

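Java 5.0 added a concurrency utility package, java.util.concurrent, which was designed by Doug Lea and added to Java 5.0 as JSR-166. It is a very popular concurrency toolkit. It provides powerful, high-level threading constructs, including executors, a thread task framework, thread-safe queues, timers, locks (including atomic-level locks), and other synchronization primitives. Executor is an important interface in this package: it abstracts the execution of Runnable instances, and implementers can supply a concrete strategy, such as simply running each Runnable on a single thread, or giving Runnables shared threads from a thread pool. Because Executor was new in Java 5.0, most of the implementations that Java 5.0 provides have built-in thread pool support. Spring introduces a new abstraction layer for Executor handling, so that thread pools can be brought to Java 1.3 and Java 1.4 environments while hiding the differences between the thread pool implementations in Java 1.3, 1.4, 5.0, and Java EE environments.

1. Understanding the Java 5.0 Executor

The main purpose of the java.util.concurrent.Executor interface is to decouple "task submission" from "task execution". The interface defines the method for submitting tasks, and implementers can provide different task execution mechanisms, with different thread usage rules and scheduling schemes. Executor has only one method: void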
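
The separation of submission from execution described above can be illustrated with a small sketch. The class and method names here (ExecutorDecouplingSketch, DirectExecutor, ThreadPerTaskExecutor, recordThreadName) are made up for illustration and are not taken from the article or from Spring; only java.util.concurrent.Executor and its execute(Runnable) method are standard API. The submitting code stays the same while the execution strategy is swapped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executor;

public class ExecutorDecouplingSketch {

    /** Hypothetical strategy 1: run each task synchronously on the caller's thread. */
    static final class DirectExecutor implements Executor {
        @Override
        public void execute(Runnable task) {
            task.run();
        }
    }

    /** Hypothetical strategy 2: start a new thread for every task. */
    static final class ThreadPerTaskExecutor implements Executor {
        final List<Thread> started = new ArrayList<>();

        @Override
        public void execute(Runnable task) {
            Thread worker = new Thread(task, "sketch-worker-" + started.size());
            started.add(worker);
            worker.start();
        }
    }

    /** Submission side: it only sees the Executor interface, not how the task is run. */
    static void recordThreadName(Executor executor, List<String> sink) {
        executor.execute(() -> {
            synchronized (sink) {
                sink.add(Thread.currentThread().getName());
            }
        });
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> sink = new ArrayList<>();

        // Same submission code, executed on the calling thread.
        recordThreadName(new DirectExecutor(), sink);

        // Same submission code, executed on a freshly started thread.
        ThreadPerTaskExecutor perTask = new ThreadPerTaskExecutor();
        recordThreadName(perTask, sink);
        for (Thread worker : perTask.started) {
            worker.join(); // wait so the recorded name is visible before printing
        }

        System.out.println(sink);
    }
}
```

A thread pool is simply a third strategy behind the same interface, which is why code written against Executor does not need to change when the pooling policy does.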

Method call to Future.get() blocks. Is that really desirable?

Please read the question carefully before marking this as a duplicate. Below is a snippet of pseudocode. My question is: does the code below not defeat the very notion of parallel, asynchronous processing? The reason I ask is that in the code below the main thread submits a task to be executed in a different thread. After submitting the task to the queue, it blocks on the Future.get() method, waiting for the task to return its value. I would rather have the task executed in the main thread than submit it to a different thread and wait for the result. What is that I gained by
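
The excerpt is cut off before the pseudocode, so the following is only a sketch of the general point, not the asker's code. It assumes a hypothetical slow task (slowSquare) and shows the usual way Future.get() is kept from serializing work: submit every task first, and call get() only once the results are actually needed. With a single task followed immediately by get(), the caller does gain nothing, as the question suspects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FutureGetSketch {

    /** Hypothetical slow task: sleeps briefly, then returns the square of n. */
    static Callable<Integer> slowSquare(int n) {
        return () -> {
            Thread.sleep(200);
            return n * n;
        };
    }

    /** Submits all tasks first, then blocks on get() only when collecting results. */
    static int sumOfSquares(int... inputs) {
        // One thread per input so the sleeps overlap; requires at least one input.
        ExecutorService pool = Executors.newFixedThreadPool(inputs.length);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int n : inputs) {
                futures.add(pool.submit(slowSquare(n))); // returns immediately
            }
            // The caller is free to do other work here while the tasks run.
            int sum = 0;
            for (Future<Integer> future : futures) {
                sum += future.get(); // blocks only until this particular task is done
            }
            return sum;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        int sum = sumOfSquares(1, 2, 3, 4);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("sum=" + sum + ", elapsed about " + elapsedMs + " ms");
    }
}
```

Because the four tasks sleep concurrently, the elapsed time should be closer to one sleep than to four, though the exact figure depends on the machine.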

Executor Thread Pools: Principles and Source Code Walkthrough

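Thread pools address both the overhead of the thread life cycle and the problem of resource exhaustion. By reusing threads across multiple tasks, the cost of creating a thread is amortized over those tasks.

Ways to implement a thread: Thread, Runnable, Callable

// A class that implements Runnable is executed by a Thread; it represents a basic task
public interface Runnable {
    // run() is all there is to it: the task that actually gets executed
    public abstract void run();
}

// Callable is also a task; unlike Runnable, the interface is generic and the task returns a value when it finishes
// A Callable is used through a Future that wraps it
public interface Callable<V> {
    // Unlike run(), call() returns a value
    V call() throws Exception;
}

Note: a Thread can only be started with start() (which is backed by a JNI native method). start() notifies the JVM, the JVM maps the call down to the underlying operating system, and the operating system creates the thread that executes the current task's run() method.

The Executor framework

The Executor interface is the most basic part of the thread pool framework; it defines an execute method for running a Runnable. As the figure shows, Executor has an important subinterface, ExecutorService, which defines the concrete behavior of a thread pool: execute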
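
The claim that a pool amortizes thread creation across tasks can be checked with a short sketch. The class and method names (ThreadReuseSketch, workerNames) are illustrative assumptions; the calls used (Executors.newFixedThreadPool, ExecutorService.submit, Future.get) are standard java.util.concurrent API. Each Callable reports the name of the thread that ran it, and the number of distinct names never exceeds the pool size however many tasks are submitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ThreadReuseSketch {

    /** Runs taskCount Callables on poolSize threads and returns the distinct worker names used. */
    static Set<String> workerNames(int poolSize, int taskCount) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (int i = 0; i < taskCount; i++) {
                // A Callable returns a value; the Future is the handle used to fetch it.
                Callable<String> task = () -> Thread.currentThread().getName();
                futures.add(pool.submit(task));
            }
            Set<String> names = new TreeSet<>();
            for (Future<String> future : futures) {
                names.add(future.get());
            }
            return names;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Twenty tasks share at most two worker threads.
        Set<String> names = workerNames(2, 20);
        System.out.println(names.size() + " worker thread(s) ran 20 tasks: " + names);
    }
}
```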

Kylin 2.0 Spark Cubing: Optimizations and Improvements

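Kylin 2.0 introduced a beta version of Spark Cubing. This article mainly describes how I got Spark Cubing to work with a Kerberos-enabled HBase cluster, and then presents Spark Cubing's performance test results and the scenarios it suits.

Introduction to Spark Cubing

Before introducing Spark Cubing, let me briefly introduce MapReduce Batch Cubing. MapReduce Batch Cubing uses the MapReduce engine to compute a Cube in batch; its input is a Hive table and its output is HBase KeyValues. The whole build process consists of the following 6 steps:

1. Create the flat (wide) Hive table (MapReduce)
2. Compute the cardinality of the columns that need dictionary encoding (MapReduce)
3. Build the dictionaries (JobServer or MapReduce)
4. Build the Cuboids layer by layer (MapReduce)
5. Convert the Cuboids into HBase's KeyValue structure (HFile) (MapReduce)
6. Update metadata and run garbage collection

For the detailed Cube generation process, see "Apache Kylin Cube 构建原理" (How Apache Kylin Builds a Cube). Spark Cubing in Kylin 2.0 replaces MapReduce in step 4 of the Cube build. As shown in the figure below, 5 MR jobs are turned into 1 Spark job: (Note: the following two images are taken from the blog on the official Apache Kylin website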

Spark on YARN Run Modes Explained: cluster Mode and client Mode

1. Official documentation
http://spark.apache.org/docs/latest/running-on-yarn.html

2. Configuration and installation
2.1. Install Hadoop: the HDFS and YARN modules are required. HDFS must be installed because Spark stores its jars on HDFS at run time.
2.2. Install Spark: extract the Spark distribution on one server and edit the spark-env.sh configuration file. The Spark program acts as a YARN client to submit jobs.
export JAVA_HOME=/usr/local/jdk1.8
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
2.3. Start HDFS and YARN

3. Run modes (cluster mode and client mode)
3.1. cluster mode: the official example that computes Pi
spark-submit --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode cluster \
  --driver-memory 1g \
  --executor-memory 1g \
  --executor-cores 1 \
  /export/servers/spark/examples/jars/spark-examples_2.11-2.0.2.jar

How to properly catch RuntimeExceptions from Executors?

Question

Say that I have the following code:

ExecutorService executor = Executors.newSingleThreadExecutor();
executor.execute(myRunnable);

Now, if myRunnable throws a RuntimeException, how can I catch it? One way would be to supply my own ThreadFactory implementation to newSingleThreadExecutor() and set custom uncaughtExceptionHandlers for the Threads that come out of it. Another way would be to wrap myRunnable in a local (anonymous) Runnable that contains a try-catch block. Maybe there are other
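
The two approaches named in the question can be sketched as follows. This is an illustration under assumptions, not an answer from the original thread: the class and method names (ExecutorExceptionSketch, viaThreadFactory, viaWrapper) are hypothetical, and the two-second waits are arbitrary. The first method relies on the fact that a task passed to execute() lets its exception escape the worker thread, where the thread's UncaughtExceptionHandler receives it; the second catches the exception inside a wrapping Runnable.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

public class ExecutorExceptionSketch {

    /** Approach 1: a ThreadFactory whose threads report uncaught exceptions to a handler. */
    static Throwable viaThreadFactory(Runnable task) {
        AtomicReference<Throwable> seen = new AtomicReference<>();
        CountDownLatch reported = new CountDownLatch(1);
        ThreadFactory factory = runnable -> {
            Thread thread = new Thread(runnable);
            thread.setUncaughtExceptionHandler((t, error) -> {
                seen.set(error);
                reported.countDown();
            });
            return thread;
        };
        ExecutorService executor = Executors.newSingleThreadExecutor(factory);
        executor.execute(task);
        executor.shutdown();
        try {
            // Wait up to two seconds for the handler; a task that never throws just times out.
            reported.await(2, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen.get();
    }

    /** Approach 2: wrap the task in a Runnable that has its own try-catch block. */
    static Throwable viaWrapper(Runnable task) {
        AtomicReference<Throwable> seen = new AtomicReference<>();
        ExecutorService executor = Executors.newSingleThreadExecutor();
        executor.execute(() -> {
            try {
                task.run();
            } catch (RuntimeException e) {
                seen.set(e); // handle, log, or hand the exception back to the caller here
            }
        });
        executor.shutdown();
        try {
            executor.awaitTermination(2, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen.get();
    }

    public static void main(String[] args) {
        Runnable myRunnable = () -> { throw new IllegalStateException("boom"); };
        System.out.println("handler saw: " + viaThreadFactory(myRunnable));
        System.out.println("wrapper saw: " + viaWrapper(myRunnable));
    }
}
```

Note that the handler-based approach only applies to execute(); with submit(), the exception is captured in the returned Future and surfaces as an ExecutionException from get() instead of reaching the handler.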

Notes on Submitting a Python Script with spark-submit

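I recently started learning Spark and used the spark-submit command to submit a Python script. It kept failing at first, so I decided to write up the process of submitting a Python script with spark-submit properly. Let's first look at the options spark-submit accepts.

1. spark-submit options

--master MASTER_URL: sets the master URL of the cluster, which determines where the job is submitted for execution. Common values:
    local: run on the local machine with a single thread
    local[k]: run on the local machine with k threads
    spark://HOST:PORT: submit to a Spark cluster deployed in standalone mode, giving the master node's IP and port
    mesos://HOST:PORT: submit to a cluster deployed in Mesos mode, giving the master node's IP and port
    yarn: submit to a cluster deployed in YARN mode
--deploy-mode DEPLOY_MODE: sets where the driver is launched. The options are as follows; the default is client
    client: launch the driver on the client, so the driver logic runs on the client and the tasks run on the cluster
    cluster: both the driver logic and the tasks run on the cluster; cluster mode is currently not supported for Mesos clusters or Python applications
--class CLASS_NAME: specifies the application's entry class, i.e. the main class; it applies only to Java and Scala programs, not to Python programs
--name NAME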

[Repost] The Most Detailed Hand-Written Promise Tutorial Ever

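In our work we inevitably use promises to solve the asynchronous callback problem. Many of the libraries and plugins we use every day rely on promises, for example axios and fetch. But do you know how a promise is actually written? Don't worry: here is a copy of the Promises/A+ specification, and I'll let you have it cheap for 10 yuan.

1. Declaring the Promise

First of all, a promise is certainly a class, so we declare it with class.
• Since we write new Promise((resolve, reject)=>{}), one argument (a function) is passed in; the manual calls it the executor, and it runs as soon as it is passed in.
• The executor takes two parameters, one called resolve (success) and one called reject (failure).
• Since resolve and reject can be called, both are functions, and we declare them with let.

class Promise{
  // constructor
  constructor(executor){
    // success
    let resolve = () => { };
    // failure
    let reject = () => { };
    // run immediately
    executor(resolve, reject);
  }
}

Handling the basic states

The manual lays down rules for Promise:
• A Promise has three states (state): pending, fulfilled, rejected
• pending is the initial state, and it can transition to fulfilled and rejected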
