executor | 易学教程

Storm: Introduction to Its Principles and a Single-Node Installation Guide

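This article is translated from https://github.com/nathanmarz/storm/wiki/Tutorial. Storm is a distributed, highly fault-tolerant real-time computation system. Storm is to real-time computation what Hadoop is to batch processing. Hadoop gives us the Map and Reduce primitives, which make batch processing of data simple and elegant. Likewise, Storm provides the simple Spout and Bolt primitives for real-time computation. Scenarios where Storm fits: 1. Stream processing: Storm can process a continuous stream of messages and save the processed results to persistent storage. 2. Distributed RPC: because Storm's processing components are all distributed and have very low latency, Storm can serve as a general-purpose distributed RPC framework. In this tutorial we will learn how to create topologies and deploy them to a Storm cluster. Java is the main example language; a few examples use Python to demonstrate Storm's multi-language support. 1. Preparation: this tutorial uses the examples in the storm-starter project. I recommend downloading the project's code and following along. First read the two articles on setting up a Storm development environment and creating a new Storm project to get your machine ready. 2. Basic components of a Storm cluster: on the surface, a Storm cluster looks very much like a Hadoop cluster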

Executor Thread Pools Explained in Detail

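Thread pools. The purpose of a thread pool is to reduce the overhead of creating threads, cut resource consumption, and make the system more stable. In web development, the server assigns a thread to handle each request. If a thread is created for every request and destroyed when the request ends, then under high concurrency a large number of threads are created and destroyed, which lowers the system's efficiency. Thread pools exist so that threads can be reused: this removes most of the creation and destruction overhead, which in turn improves response time. It also makes threads easier to manage, since they can be allocated, monitored, and tuned in one place. Why do creating and destroying threads carry overhead? Because the threads that Java runs depend on the operating system's kernel threads. A thread created in Java is a user-level thread that relies on thread scheduling to use a kernel-level thread for execution, and creating or destroying one involves switching between user mode and kernel mode via the TSS. That switch is a significant cost. The original article illustrates this structure with a figure. Thread implementation. A thread's task is mainly defined by implementing the Runnable or Callable interface. The difference between them is that Callable returns a value and Runnable does not.

public interface Runnable {
    public abstract void run();
}

public interface Callable<V> {
    V call() throws Exception;
}

Let's look at the test code: package com.test.excutor; import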
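The Runnable/Callable distinction and the idea of reusing pooled threads can be sketched as follows. This is a minimal illustration, not the article's own test code (which is cut off above); the class name RunnableVsCallableDemo and the method runBoth are hypothetical names chosen for this sketch.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; not taken from the original article.
public class RunnableVsCallableDemo {

    // Submits one Runnable and one Callable to the same small pool and
    // returns the Callable's result plus the Runnable's side effect.
    static int runBoth() throws Exception {
        // Two pooled threads are reused for every task submitted below.
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            AtomicInteger sideEffect = new AtomicInteger();

            // Runnable: no return value, so it communicates through a side effect.
            Runnable task = () -> sideEffect.set(1);
            Future<?> done = pool.submit(task);
            done.get(); // blocks until the Runnable has finished; the value is null

            // Callable: returns a value and may throw a checked exception.
            Callable<Integer> computation = () -> 40 + 1;
            Future<Integer> result = pool.submit(computation);

            return result.get() + sideEffect.get();
        } finally {
            pool.shutdown(); // stop accepting tasks and let the workers exit
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runBoth()); // prints 42
    }
}
```

Both tasks go through the same submit call; only the Callable's Future carries a meaningful result.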

Using the Distributed Task Scheduling Platform xxl-job

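I. Download the source: https://github.com/xuxueli/xxl-job/releases II. Prepare the environment. On MySQL 5.6+, run the script \doc\db\tables_xxl_job.sql from the source directory (the script initializes one account and one executor). Use IDEA to import the Maven project \xxl-job-admin from the source directory. In the \xxl-job-admin project (hereafter "admin"), edit the database settings in application.properties to point to your local database. admin is based on Spring Boot and can be started the usual Spring Boot way. With the settings in application.properties, the default address is 127.0.0.1:8080/xxl-job-admin and the default username/password is admin/123456. III. Connect scheduled tasks. xxl-job separates scheduling from tasks and has two parts: the scheduling center (admin) and the task side (executor). What was started above is admin; admin schedules tasks and controls when they are triggered and which routing strategy is used. The executor runs the actual business logic, i.e. the task code is written in the executor. Because business logic at work usually lives in the service layer, the executor can be integrated into the service layer and the task code written there. The example below demonstrates integrating an executor without a framework. 1. Create a simple Maven project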

ExecutorService slow multi thread performance

I am trying to execute a simple calculation (it calls Math.random() 10000000 times). Surprisingly, running it in a simple method performs much faster than using ExecutorService. I have read another thread at ExecutorService's surprising performance break-even point --- rules of thumb? and tried to follow the answer by executing the Callable in batches, but the performance is still bad. How do I improve the performance based on my current code? import java.util.*; import java.util.concurrent.*; public class MainTest { public static void main(String[]args) throws Exception { new MainTest().start(
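The batching idea mentioned in the question can be sketched as below. This is not the asker's code (which is cut off above) and not the accepted answer; it is an illustration under two assumptions. First, the work is split into one large batch per worker rather than many tiny tasks. Second, Math.random() is swapped for ThreadLocalRandom, because Math.random() shares a single generator across threads and its Javadoc notes that per-thread generators may reduce contention. The names BatchedRandomDemo and drawInBatches are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical demo class; a sketch, not the code from the question.
public class BatchedRandomDemo {

    // Splits totalCalls random draws into one batch per worker thread and
    // returns how many draws were performed in total.
    static long drawInBatches(long totalCalls, int workers) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Callable<Long>> batches = new ArrayList<>();
            long perBatch = totalCalls / workers;
            long remainder = totalCalls % workers;
            for (int w = 0; w < workers; w++) {
                // The first `remainder` batches take one extra draw each.
                final long size = perBatch + (w < remainder ? 1 : 0);
                batches.add(() -> {
                    // Per-thread generator: no shared state between workers.
                    ThreadLocalRandom rnd = ThreadLocalRandom.current();
                    double sink = 0;
                    for (long i = 0; i < size; i++) {
                        sink += rnd.nextDouble();
                    }
                    // Use sink so the loop has an observable result.
                    return sink >= 0 ? size : -1L;
                });
            }
            long done = 0;
            // invokeAll blocks until every batch has completed.
            for (Future<Long> f : pool.invokeAll(batches)) {
                done += f.get();
            }
            return done;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int workers = Runtime.getRuntime().availableProcessors();
        System.out.println(drawInBatches(10_000_000L, workers));
    }
}
```

Whether this beats a plain single-threaded loop depends on the machine and the JIT, so it is worth measuring rather than assuming.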

Apache Airflow 1.10.3: Executor reports task instance ??? finished (failed) although the task says its queued. Was the task killed externally?

Question: An Airflow ETL DAG hits this error every day. Our Airflow installation is using CeleryExecutor. The concurrency configs were:

# The amount of parallelism as a setting to the executor. This defines
# the max number of task instances that should run simultaneously
# on this airflow installation
parallelism = 32

# The number of task instances allowed to run concurrently by the scheduler
dag_concurrency = 16

# Are DAGs paused by default at creation
dags_are_paused_at_creation = True

# When not using

[JUC] 5. Thread Pools: Executor

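There are three ways to create a thread pool: 1. Call a ThreadPoolExecutor constructor to create a ThreadPoolExecutor object, i.e. a thread pool. The full constructor takes 7 parameters: 5 are required and 2 have default values. Details are covered later. Link: https://www.cnblogs.com/mussessein/p/11654022.html 2. Use a pool returned by Executors. Four commonly used pools are created this way, and a ForkJoinPool can be created too. These are prepackaged: the four common static methods of Executors return four preconfigured ThreadPoolExecutor pools. Link: https://www.cnblogs.com/mussessein/p/11654120.html 3. The ForkJoinPool concurrency framework splits a large task into many small ones; fork hands the small tasks to other threads to process concurrently, and join combines the results from those threads. This is essentially divide and conquer. The Executors class (avoid this approach where possible and use the ThreadPoolExecutor class instead). The drawbacks of the pools returned by Executors are as follows: FixedThreadPool and SingleThreadPool: the request queue is allowed to grow to Integer.MAX_VALUE, so a large number of requests may pile up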
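As a sketch of the first approach, the pool below spells out all seven constructor arguments and uses a bounded queue, so waiting requests cannot pile up to Integer.MAX_VALUE. The sizes (2 core threads, 4 maximum, a queue of 100) are arbitrary values assumed for illustration, and the names BoundedPoolDemo and newBoundedPool are hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; the sizes are illustrative, not recommendations.
public class BoundedPoolDemo {

    // Builds a pool with all seven constructor arguments written out.
    static ThreadPoolExecutor newBoundedPool() {
        return new ThreadPoolExecutor(
                2,                                    // corePoolSize
                4,                                    // maximumPoolSize
                30L,                                  // keepAliveTime for idle extra threads
                TimeUnit.SECONDS,                     // unit of keepAliveTime
                new ArrayBlockingQueue<>(100),        // bounded work queue, capacity 100
                Executors.defaultThreadFactory(),     // optional: thread factory
                new ThreadPoolExecutor.AbortPolicy()  // optional: rejection handler
        );
    }

    public static void main(String[] args) throws Exception {
        ThreadPoolExecutor pool = newBoundedPool();
        try {
            Callable<Integer> job = () -> 1 + 1;
            System.out.println(pool.submit(job).get());
        } finally {
            pool.shutdown();
        }
    }
}
```

With AbortPolicy, a submission that finds the queue full and all 4 threads busy is rejected with a RejectedExecutionException instead of being queued indefinitely.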

Java CompletionService and ExecutorCompletionService

Consider this problem: if you submit a batch of computation tasks to an Executor and want the results once they finish, you can keep the Future associated with each task and repeatedly call get on it, polling for the results. That works, but it is tedious. Enough talk, here is the code.

package com.citi.test.mutiplethread.demo0511;

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class TestCompletionService {
    public static void main(String[] args) {
        ExecutorService service = Executors.newFixedThreadPool(10);
        List<Future
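For contrast with the Future-polling approach described above, here is a minimal sketch using ExecutorCompletionService, which hands back each result as soon as its task finishes. It is not the continuation of the truncated listing; the names CompletionServiceDemo and squares, and the pool size of 4, are assumptions for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical demo class; not the article's TestCompletionService.
public class CompletionServiceDemo {

    // Submits taskCount tasks and collects their results in completion order.
    static List<Integer> squares(int taskCount) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        CompletionService<Integer> completion = new ExecutorCompletionService<>(pool);
        try {
            for (int i = 1; i <= taskCount; i++) {
                final int n = i;
                completion.submit(() -> n * n);
            }
            List<Integer> results = new ArrayList<>();
            for (int i = 0; i < taskCount; i++) {
                // take() blocks until some task has finished, whichever it is.
                results.add(completion.take().get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Order may vary from run to run, since results arrive as tasks complete.
        System.out.println(squares(5));
    }
}
```

The caller no longer tracks individual Future objects; it only counts how many results it still expects.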

Spark Configuration Parameters Explained

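Below is a collection of Spark configuration parameters; for the official documentation, see Spark Configuration. Spark provides three places to configure the system. Spark properties control most application parameters and can be set with a SparkConf object or through Java system properties. Environment variables can be set through the conf/spark-env.sh script on each node, for example IP addresses and ports. Logging can be configured through log4j.properties. Spark properties: Spark properties control most application settings and are configured separately for each application. They can be set directly on a SparkConf, which is then passed to SparkContext. SparkConf lets you configure common properties (such as the master URL and the application name) as well as arbitrary key-value pairs through the set() method. For example, we can create an application with two threads as follows.

val conf = new SparkConf()
  .setMaster("local[2]")
  .setAppName("CountingSheep")
  .set("spark.executor.memory", "1g")
val sc = new SparkContext(conf)

Dynamically loading Spark properties: in some cases you may want to avoid hard-coding certain configurations in a SparkConf. For example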

A good way to bulk download images over http with Java

We have a web application that needs to import 10-20 images from a partner site via http. If I have a list of strings that represent the urls I want to download, does anybody have a suggestion for how to download them as fast as possible? I could just put them in a for loop, but if there is a simple way to parallelize this it would probably be good for the end user. I would like to avoid using straight Java threads, although the executor framework might be a good idea. Any ideas?

The Executor framework is EXACTLY what you want. Specifically the ExecutorCompletionService. Using this, you'll be
