executor | 易学教程

Spark on YARN resource manager: Relation between YARN Containers and Spark Executors

Question: I'm new to Spark on YARN and don't understand the relation between YARN containers and Spark executors. I tried out the following configuration, based on the results of the yarn-utils.py script, which can be used to find an optimal cluster configuration. The Hadoop cluster (HDP 2.4) I'm working on:
1 master node: CPU: 2 CPUs with 6 cores each = 12 cores; RAM: 64 GB; SSD: 2 x 512 GB
5 slave nodes: CPU: 2 CPUs with 6 cores each = 12 cores; RAM: 64 GB; HDD: 4 x 3 TB = 12 TB
HBase is installed
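To make the container/executor relation concrete: on YARN, each Spark executor runs inside one YARN container, and that container must hold the executor heap plus an off-heap overhead. The sketch below is a minimal back-of-the-envelope calculator under stated assumptions: the overhead follows Spark's default rule for spark.executor.memoryOverhead (the larger of 384 MiB and 10% of executor memory), YARN rounds each request up to a multiple of yarn.scheduler.minimum-allocation-mb, and the per-node YARN memory (48 GB), vcores (10), executor memory (8 GB) and executor cores (3) are hypothetical values, not the asker's settings. It also assumes the scheduler enforces vcores; with the CapacityScheduler's default DefaultResourceCalculator, only memory is taken into account.

```java
public class ContainerSizing {
    // Assumed overhead rule: max(384 MiB, 10% of executor memory),
    // matching Spark's default for spark.executor.memoryOverhead
    static long overheadMb(long executorMemoryMb) {
        return Math.max(384L, (long) (0.10 * executorMemoryMb));
    }

    // Container size: heap + overhead, rounded up to a multiple of the
    // YARN minimum allocation (assumed normalization behavior)
    static long containerMb(long executorMemoryMb, long minAllocationMb) {
        long requested = executorMemoryMb + overheadMb(executorMemoryMb);
        return ((requested + minAllocationMb - 1) / minAllocationMb) * minAllocationMb;
    }

    // Executors (containers) per node: limited by whichever runs out first, memory or vcores
    static long executorsPerNode(long nodeMemoryMb, int nodeVcores,
                                 long containerMb, int executorCores) {
        long byMemory = nodeMemoryMb / containerMb;
        long byCores = nodeVcores / executorCores;
        return Math.min(byMemory, byCores);
    }

    public static void main(String[] args) {
        // Hypothetical settings: 8 GB executors, 1 GB minimum allocation
        long container = containerMb(8192, 1024);
        // Hypothetical node: 48 GB and 10 vcores handed to YARN, 3 cores per executor
        long perNode = executorsPerNode(49152, 10, container, 3);
        System.out.println("container MB = " + container);              // 9216
        System.out.println("executors per node = " + perNode);          // 3
        System.out.println("executors on 5 slave nodes = " + perNode * 5); // 15
    }
}
```

With these numbers the 8 GB executor becomes a 9216 MB container (8192 + 819, rounded up), and cores rather than memory are the limiting factor. Keep in mind that the ApplicationMaster also occupies a container of its own.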

Spark Summary Notes (3): Spark Core Performance Optimization, Resource Tuning

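Spark performance optimization falls into four main areas: development tuning, resource tuning, data-skew tuning and shuffle tuning. Once a Spark job has been developed, the next step is to configure appropriate resources for it. If the resource parameters are set poorly, the job may not make full use of the cluster and will run extremely slowly; or the requested resources may be so large that the queue cannot provide them, which leads to various exceptions. This article lists the points to watch for when tuning resources.

1. Introduction
It is recommended to first understand how a Spark job basically runs and how the Spark memory model works. References: https://blog.csdn.net/super_wj0820/article/details/100533335 https://www.cnblogs.com/qingyunzong/p/8955141.html
Note in particular that since Spark 1.6.0, memory is managed by default in a unified way (Unified Memory Manager) rather than statically (Static Memory Manager), so when configuring memory parameters, be sure to check which memory-management strategy is in effect.

2. Resource parameter tuning
2.1 num-executors
Description: this parameter sets the total number of Executor processes the Spark job uses. When the Driver requests resources from the YARN cluster manager, YARN starts, as far as possible, that number of Executor processes across the worker nodes of the cluster. This parameter is very important; if it is not set…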
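Because the queue has to satisfy the whole request at once, a quick sanity check is to multiply num-executors by the per-executor settings (executor-memory and executor-cores) and compare the totals with the queue's capacity. This is a minimal sketch; the queue capacity figures are hypothetical, and it deliberately ignores per-executor memory overhead and the driver's own resources.

```java
public class ResourceRequestCheck {
    // Total executor memory requested by the job, in GB (overhead not included)
    static long totalMemoryGb(int numExecutors, int executorMemoryGb) {
        return (long) numExecutors * executorMemoryGb;
    }

    // Total executor cores requested by the job
    static long totalCores(int numExecutors, int executorCores) {
        return (long) numExecutors * executorCores;
    }

    // True if the whole request fits inside the queue's memory and core capacity
    static boolean fitsQueue(int numExecutors, int executorMemoryGb, int executorCores,
                             long queueMemoryGb, long queueCores) {
        return totalMemoryGb(numExecutors, executorMemoryGb) <= queueMemoryGb
                && totalCores(numExecutors, executorCores) <= queueCores;
    }

    public static void main(String[] args) {
        // Hypothetical queue: 200 GB of memory and 60 cores
        System.out.println(fitsQueue(20, 4, 2, 200, 60)); // 80 GB, 40 cores -> true
        System.out.println(fitsQueue(40, 4, 2, 200, 60)); // 160 GB, 80 cores -> false
    }
}
```

Doubling num-executors doubles both totals, which is why an oversized setting can leave the job waiting on a queue that never has enough free resources.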

How MyBatis Plugins Work (2): The MyBatis Plugin Execution Flow

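Let's take Executor as an example and work through the whole flow in detail. Generating the proxy chain: MyBatis supports intercepting Executor, StatementHandler, ParameterHandler and ResultSetHandler, which means it creates proxies for these four kinds of objects. Looking at the source code of the Configuration class, we can see that a proxy chain is generated for the target object every time one is created. When MyBatis creates an Executor object, it executes the following line:

executor = (Executor) interceptorChain.pluginAll(executor);

InterceptorChain holds all the interceptors and is created when MyBatis is initialized. The line above calls each interceptor in the chain in turn and lets it plug into (wrap) the executor. The code is as follows:

/**
 * Each interceptor proxies the target object once
 * @param target
 * @return the object after being proxied layer by layer
 */
public Object pluginAll(Object target) {
    for (Interceptor interceptor : interceptors) {
        target = interceptor.plugin(target);
    }
    return target;
}

Below, a simple example shows what actually happens inside this plugin method: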
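The layered wrapping performed by pluginAll can be reproduced with JDK dynamic proxies, which is also what MyBatis's Plugin.wrap relies on. In this minimal sketch the Executor interface and the LoggingInterceptor class are hypothetical stand-ins, not MyBatis types; the point is the call order: the interceptor applied last becomes the outermost proxy, so it runs first.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

public class ProxyChainDemo {
    // Hypothetical stand-in for MyBatis's Executor (not the real org.apache.ibatis type)
    public interface Executor {
        String query(String sql);
    }

    // Hypothetical interceptor: plugin() wraps the target in a JDK proxy that logs around each call
    static class LoggingInterceptor {
        private final String name;
        private final List<String> log;

        LoggingInterceptor(String name, List<String> log) {
            this.name = name;
            this.log = log;
        }

        Object plugin(Object target) {
            InvocationHandler handler = (proxy, method, args) -> {
                log.add(name + " before " + method.getName());
                Object result = method.invoke(target, args); // delegate to the wrapped object
                log.add(name + " after " + method.getName());
                return result;
            };
            return Proxy.newProxyInstance(
                    Executor.class.getClassLoader(), new Class<?>[] {Executor.class}, handler);
        }
    }

    // Same loop shape as pluginAll: each interceptor wraps whatever the previous one returned
    static Object pluginAll(Object target, List<LoggingInterceptor> interceptors) {
        for (LoggingInterceptor interceptor : interceptors) {
            target = interceptor.plugin(target);
        }
        return target;
    }

    static List<String> run() {
        List<String> log = new ArrayList<>();
        Executor base = sql -> {
            log.add("target query");
            return "rows for " + sql;
        };
        List<LoggingInterceptor> chain =
                List.of(new LoggingInterceptor("A", log), new LoggingInterceptor("B", log));
        Executor proxied = (Executor) pluginAll(base, chain);
        proxied.query("select 1");
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Running it prints [B before query, A before query, target query, A after query, B after query]: A wrapped the target first, B then wrapped A's proxy, so B's logic surrounds everything.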

Configuring the Thread Pool for @Async in Spring Boot

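1. Add an AsyncConfig configuration class that sets up the thread pool:

@Configuration
@EnableAsync
public class AsyncConfig implements AsyncConfigurer {
    public static final Logger logger = LoggerFactory.getLogger(AsyncConfig.class);

    /**
     * If the number of threads is below corePoolSize, a new (core) thread is created to run the task.
     * If the number of threads has reached corePoolSize, the task is put into the queue to wait.
     * If the queue is full, a new (non-core) thread is created to run the task.
     * If the queue is full and the total number of threads has reached maximumPoolSize, the task is
     * rejected (with the default AbortPolicy, an exception is thrown).
     *
     * @param
     * @Return: java.util.concurrent.Executor
     * @Author: niuqingsong
     * @Date: 2019/9/3
     */
    @Override
    public Executor getAsyncExecutor() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        /* Core pool size: if fewer than corePoolSize threads are running, a new thread is created to run the task */
        executor…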
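ThreadPoolTaskExecutor is backed by a java.util.concurrent.ThreadPoolExecutor, so the four rules in the comment above can be checked with the JDK alone. This minimal sketch uses a hypothetical pool with corePoolSize 1, maximumPoolSize 2 and a queue capacity of 1, plus tasks that block until released so that the pool state is deterministic; it relies on the default AbortPolicy, which throws RejectedExecutionException.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolGrowthDemo {
    static String run() {
        CountDownLatch release = new CountDownLatch(1);
        // Hypothetical sizes: core 1, max 2, queue capacity 1; the default handler is AbortPolicy
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 2, 60L, TimeUnit.SECONDS, new ArrayBlockingQueue<>(1));
        // Each task blocks until released, so the pool state observed below is deterministic
        Runnable blocker = () -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        StringBuilder trace = new StringBuilder();
        try {
            pool.execute(blocker); // 1st: fewer than corePoolSize threads -> new core thread
            pool.execute(blocker); // 2nd: core thread busy -> waits in the queue
            pool.execute(blocker); // 3rd: queue full -> new non-core thread
            trace.append("pool=").append(pool.getPoolSize())
                 .append(" queue=").append(pool.getQueue().size());
            try {
                pool.execute(blocker); // 4th: queue full and maximumPoolSize reached
                trace.append(" rejected=false");
            } catch (RejectedExecutionException e) {
                trace.append(" rejected=true");
            }
        } finally {
            release.countDown(); // let the blocked tasks finish
            pool.shutdown();
        }
        return trace.toString();
    }

    public static void main(String[] args) {
        System.out.println(run()); // pool=2 queue=1 rejected=true
    }
}
```

In Spring the same three sizes are set with setCorePoolSize, setMaxPoolSize and setQueueCapacity on ThreadPoolTaskExecutor.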

How to ensure garbage collection of a FutureTask that is submitted to a ThreadPoolExecutor and then cancelled?

Question: I am submitting Callable objects to a ThreadPoolExecutor and they seem to be sticking around in memory. Looking at the heap dump with the MAT tool for Eclipse, I see that the Callable objects are being referenced by a FutureTask$Sync's callable variable. That FutureTask$Sync is referenced by a FutureTask's sync variable. That FutureTask is referenced by the FutureTask$Sync's this$0 variable. I have read around about this (here, here, and on SO) and it seems like the FutureTask that the…
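A cancelled FutureTask is not taken out of a ThreadPoolExecutor's work queue by cancel() itself; it stays there until a worker dequeues it. One standard-library remedy is ThreadPoolExecutor.purge(), which tries to remove all cancelled Future tasks from the queue (ScheduledThreadPoolExecutor additionally offers setRemoveOnCancelPolicy(true)). This minimal sketch keeps the pool's only worker busy so that submitted tasks pile up in the queue, cancels them, and compares the queue size before and after purge().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PurgeCancelledDemo {
    // Returns {queue size after cancel(), queue size after purge()}
    static int[] run() {
        CountDownLatch release = new CountDownLatch(1);
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        try {
            // Keep the only worker busy so later submissions stay in the work queue
            pool.execute(() -> {
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            List<Future<String>> futures = new ArrayList<>();
            for (int i = 0; i < 3; i++) {
                futures.add(pool.submit(() -> "result")); // each Callable is wrapped in a FutureTask
            }
            for (Future<String> f : futures) {
                f.cancel(false); // marked cancelled, but still referenced by the queue
            }
            int afterCancel = pool.getQueue().size();
            pool.purge(); // removes cancelled Future tasks from the work queue
            int afterPurge = pool.getQueue().size();
            return new int[] {afterCancel, afterPurge};
        } finally {
            release.countDown();
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] sizes = run();
        System.out.println(sizes[0] + " -> " + sizes[1]); // 3 -> 0
    }
}
```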

MyBatis Source Code Analysis (6): The Overall Structure of StatementHandler

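At this point, the main features of MyBatis, such as initialization, mapper interfaces, transactions and caching, have all been covered; only StatementHandler, the component that does the real work, remains to be analyzed. The following posts are therefore closely tied to the database, and the main flow of StatementHandler corresponds essentially one-to-one with the JDBC flow.

1. StatementHandler execution flow
In the first article of the MyBatis series, I included a diagram of the overall MyBatis execution flow. That diagram shows StatementHandler's responsibilities fairly clearly: obtain a Statement -> set parameters -> query the database -> map the query results to JavaBeans. This is the same flow as when using raw JDBC. StatementHandler splits this process into the following parts:
KeyGenerator: sets primary keys
ParameterHandler: sets parameters
ResultSetHandler: handles result sets
First, let's look at the class structure of StatementHandler:
RoutingStatementHandler: a routing handler, essentially a static proxy that creates the matching handler based on MappedStatement.statementType;
SimpleStatementHandler: a simple handler for statements that do not need precompilation;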
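RoutingStatementHandler is described as a static proxy that picks the concrete handler from MappedStatement.statementType. This minimal sketch shows that pattern in isolation. The three enum values mirror MyBatis's StatementType, but the Handler interface, its prepare method and the handler classes are hypothetical simplifications and do not match the real StatementHandler API.

```java
public class RoutingHandlerSketch {
    // Mirrors the three values of MyBatis's StatementType enum
    enum StatementType { STATEMENT, PREPARED, CALLABLE }

    // Hypothetical, heavily simplified handler interface
    interface Handler {
        String prepare(String sql);
    }

    static class SimpleHandler implements Handler {
        public String prepare(String sql) { return "Statement: " + sql; }
    }

    static class PreparedHandler implements Handler {
        public String prepare(String sql) { return "PreparedStatement: " + sql; }
    }

    static class CallableHandler implements Handler {
        public String prepare(String sql) { return "CallableStatement: " + sql; }
    }

    // Static proxy: chooses the delegate once in the constructor, then forwards every call
    static class RoutingHandler implements Handler {
        private final Handler delegate;

        RoutingHandler(StatementType type) {
            switch (type) {
                case STATEMENT: delegate = new SimpleHandler(); break;
                case PREPARED:  delegate = new PreparedHandler(); break;
                case CALLABLE:  delegate = new CallableHandler(); break;
                default: throw new IllegalArgumentException("Unknown statement type: " + type);
            }
        }

        public String prepare(String sql) {
            return delegate.prepare(sql);
        }
    }

    public static void main(String[] args) {
        Handler handler = new RoutingHandler(StatementType.PREPARED);
        System.out.println(handler.prepare("select 1")); // PreparedStatement: select 1
    }
}
```

The caller only ever sees the routing handler; which concrete handler does the work is decided by configuration, not by the calling code.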

Spring Boot 2: Listening for the Completion of Spring Container Initialization

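When developing with the Spring framework, we sometimes need to do some work after the Spring container has finished initializing. We can do this with a custom ApplicationListener.

Custom listener:

import org.springframework.context.ApplicationContext;
import org.springframework.context.ApplicationListener;
import org.springframework.context.event.ContextRefreshedEvent;
import org.springframework.stereotype.Component;

@Component
public class MyApplicationListener implements ApplicationListener<ContextRefreshedEvent> {
    @Override
    public void onApplicationEvent(ContextRefreshedEvent contextRefreshedEvent) {
        // Get the Spring application context
        ApplicationContext applicationContext = contextRefreshedEvent.getApplicationContext();
        // do your work...
    }
}

Source code analysis: Spring Boot starts the application by executing SpringApplication.run(String... args). In the run method, after the context has been initialized, this.refreshContext is executed to refresh it; after a series of method calls, AbstractApplicationContext's publishEvent(Object event,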
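The mechanism described above, refreshing the context and then publishing an event to the listeners registered for that event type, can be modeled without Spring. This is a hypothetical, heavily simplified sketch: MiniContext, Registration and the event classes are invented names. Spring's real AbstractApplicationContext delegates publishing to an ApplicationEventMulticaster and resolves each listener's generic event type, which the sketch only approximates with Class.isInstance.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class MiniEventDemo {
    // Hypothetical event hierarchy
    static class Event {}
    static class ContextRefreshed extends Event {}
    static class ContextClosed extends Event {}

    // Hypothetical listener registration: an event type plus a typed callback
    static class Registration<E extends Event> {
        private final Class<E> type;
        private final Consumer<E> callback;

        Registration(Class<E> type, Consumer<E> callback) {
            this.type = type;
            this.callback = callback;
        }

        // Deliver the event only if it is an instance of the registered type
        void dispatch(Event event) {
            if (type.isInstance(event)) {
                callback.accept(type.cast(event));
            }
        }
    }

    static class MiniContext {
        private final List<Registration<?>> listeners = new ArrayList<>();

        <E extends Event> void addListener(Class<E> type, Consumer<E> callback) {
            listeners.add(new Registration<>(type, callback));
        }

        void publishEvent(Event event) {
            for (Registration<?> registration : listeners) {
                registration.dispatch(event);
            }
        }

        void refresh() {
            // ... bean creation and initialization would happen here ...
            publishEvent(new ContextRefreshed()); // announced only after initialization completes
        }
    }

    static List<String> run() {
        List<String> log = new ArrayList<>();
        MiniContext context = new MiniContext();
        context.addListener(ContextRefreshed.class, e -> log.add("refreshed"));
        context.addListener(ContextClosed.class, e -> log.add("closed"));
        context.refresh();
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run()); // [refreshed]
    }
}
```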

The Relationship Between Resources and Tasks in Spark

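Before introducing tasks and resources in Spark, let's first explain a few terms:
Driver Program: the program that runs the Application's main function (the main function in the jar submitted by the user) and creates the SparkContext instance is called the driver program. SparkContext is usually used to stand for the driver program (the job's driver).
Cluster Manager: an external service that manages cluster resources. Spark currently supports three main cluster managers: Standalone, YARN and Mesos. Spark's built-in Standalone mode meets the resource-management needs of most Spark environments; YARN and Mesos are generally considered only when several computing frameworks run on the same cluster. "Spark on YARN" or "Standalone" usually refers to these different ways of managing cluster resources (that is, to the resource manager in use).
Worker Node: a node in the cluster that can run Application code (a compute resource).
Executor: a worker process launched on a Worker Node for an Application. It runs tasks (Task) and stores data in memory or on disk; inside the Executor, the application's tasks are processed concurrently by multiple threads (a thread pool). In short, it is the worker process running on the compute resources. Each Application has its own independent Executors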
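Because each Executor runs tasks on a thread pool, the number of tasks that can run at the same time is bounded by the total number of executor cores. A minimal sketch of that arithmetic, assuming every task uses spark.task.cpus cores (default 1) and ignoring locality waits, stragglers and dynamic allocation; the executor and task counts below are hypothetical.

```java
public class TaskWaves {
    // Concurrent task slots: each executor runs executorCores / cpusPerTask tasks at once
    static int slots(int numExecutors, int executorCores, int cpusPerTask) {
        return numExecutors * (executorCores / cpusPerTask);
    }

    // Number of "waves" needed when every slot is kept busy (ceiling division)
    static int waves(int numTasks, int slots) {
        return (numTasks + slots - 1) / slots;
    }

    public static void main(String[] args) {
        int s = slots(5, 4, 1);            // hypothetical: 5 executors x 4 cores, spark.task.cpus = 1
        System.out.println(s);             // 20
        System.out.println(waves(200, s)); // 10
        System.out.println(waves(210, s)); // 11
    }
}
```

The last case shows why task counts matter: 10 extra tasks beyond a multiple of the slot count cost a whole additional wave.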

How to properly catch RuntimeExceptions from Executors?

Say that I have the following code:

ExecutorService executor = Executors.newSingleThreadExecutor();
executor.execute(myRunnable);

Now, if myRunnable throws a RuntimeException, how can I catch it? One way would be to supply my own ThreadFactory implementation to newSingleThreadExecutor() and set custom UncaughtExceptionHandlers for the Threads that come out of it. Another way would be to wrap myRunnable in a local (anonymous) Runnable that contains a try-catch block. Maybe there are other similar workarounds too. But somehow this feels dirty; I feel that it shouldn't be this complicated.
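A common alternative to both workarounds is to call submit() instead of execute(): the returned Future captures the RuntimeException, and Future.get() rethrows it wrapped in an ExecutionException whose getCause() is the original exception. (Overriding ThreadPoolExecutor.afterExecute(Runnable, Throwable) is another standard hook, for tasks passed to execute().) A minimal sketch; failureOf is a hypothetical helper name.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CatchFromFuture {
    // Returns the exception thrown inside the task, or null if it completed normally
    static Throwable failureOf(Runnable task) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            // submit(), not execute(): the Future stores the exception instead of the thread dying with it
            Future<?> future = executor.submit(task);
            future.get(); // rethrows a task failure as ExecutionException
            return null;
        } catch (ExecutionException e) {
            return e.getCause(); // the original RuntimeException
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return e;
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        Throwable t = failureOf(() -> { throw new IllegalStateException("boom"); });
        System.out.println(t); // java.lang.IllegalStateException: boom
    }
}
```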

Section 4, Spark Programs: 1 - 9

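V. Introduction to Spark Roles
Spark is a big-data parallel computing framework based on in-memory computing. Because it computes in memory, it offers better real-time performance than Hadoop's MapReduce framework, while still providing efficient fault tolerance and scalability. Since it was born at AMPLab in 2009, it has become a top-level Apache open-source project and is used successfully in commercial clusters. Learning Spark requires understanding its architecture.
[Figure: Spark architecture diagram]
Spark's architecture uses the master-slave model of distributed computing: the master is the cluster node that runs the master process, and a slave is a cluster node that runs a worker process.
Driver Program: the program that runs the main function and creates the SparkContext.
Application: a Spark-based application, consisting of the driver program and the executors on the cluster.
Cluster Manager: an external service for acquiring resources on the cluster. There are currently three types: (1) Standalone: Spark's native resource management, in which the Master allocates resources; (2) Apache Mesos: a resource scheduling framework that works well with Hadoop MR; (3) Hadoop YARN: mainly refers to the ResourceManager in YARN.
Worker Node: any node in the cluster that can run Application code; in Standalone mode, these are the Worker nodes configured via the slaves file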
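In Standalone mode the Master decides how an application's cores are spread over the Workers; Spark exposes this choice through the spark.deploy.spreadOut setting (default true), which either spreads applications across nodes or consolidates them onto as few nodes as possible. The sketch below is a hypothetical, heavily simplified allocator that illustrates only that difference; Spark's real scheduler also accounts for memory, per-executor core sizes and more.

```java
import java.util.Arrays;

public class StandaloneAllocationSketch {
    // Hypothetical allocator: distributes coresWanted over workers with the given free cores.
    // spreadOut = true: hand out one core per worker per round (round-robin).
    // spreadOut = false: fill each worker completely before moving to the next.
    static int[] allocate(int[] freeCores, int coresWanted, boolean spreadOut) {
        int[] assigned = new int[freeCores.length];
        int remaining = coresWanted;
        if (spreadOut) {
            boolean progress = true;
            while (remaining > 0 && progress) {
                progress = false;
                for (int i = 0; i < freeCores.length && remaining > 0; i++) {
                    if (assigned[i] < freeCores[i]) {
                        assigned[i]++;
                        remaining--;
                        progress = true;
                    }
                }
            }
        } else {
            for (int i = 0; i < freeCores.length && remaining > 0; i++) {
                int take = Math.min(freeCores[i], remaining);
                assigned[i] = take;
                remaining -= take;
            }
        }
        return assigned; // may total less than coresWanted if the cluster is too small
    }

    public static void main(String[] args) {
        int[] workers = {4, 4, 4}; // hypothetical: three Workers with 4 free cores each
        System.out.println(Arrays.toString(allocate(workers, 6, true)));  // [2, 2, 2]
        System.out.println(Arrays.toString(allocate(workers, 6, false))); // [4, 2, 0]
    }
}
```

Spreading out tends to balance disk and network I/O across nodes, while consolidating leaves whole Workers free for other applications.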