Executors

How to avoid Spark executor from getting lost and yarn container killing it due to memory limit?

大城市里の小女人 posted on 2019-11-28 19:34:09
I have the following code, which calls hiveContext.sql() most of the time. My task is to create a few tables and insert values into them after processing all Hive table partitions. So I first run show partitions and, using its output in a for-loop, call a few methods that create the table (if it doesn't exist) and insert into it using hiveContext.sql. We can't execute hiveContext in an executor, so I have to run this in a for-loop in the driver program, serially, one by one. When I submit this Spark job to a YARN cluster, almost all the time my executor gets lost

Java Executor with throttling/throughput control

你说的曾经没有我的故事 posted on 2019-11-28 17:53:38
I'm looking for a Java Executor that allows me to specify throttling/throughput/pacing limitations, for example, no more than say 100 tasks can be processed in a second -- if more tasks get submitted they should get queued and executed later. The main purpose of this is to avoid running into limits when hitting foreign APIs or servers. I'm wondering whether either base Java (which I doubt, because I checked) or somewhere else reliable (e.g. Apache Commons) provides this, or if I have to write my own. Preferably something lightweight. I don't mind writing it myself, but if there's a "standard"
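One lightweight way to get the pacing described above is to queue submitted tasks and release a fixed number of them per time window. The following is only a minimal sketch under that assumption, not a standard library class: the name ThrottledExecutor, its constructor parameters, and the choice of a single ticker thread are all hypothetical.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: hands at most permitsPerPeriod queued tasks to the worker per period. */
final class ThrottledExecutor implements Executor, AutoCloseable {
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
    private final ScheduledExecutorService ticker =
            Executors.newSingleThreadScheduledExecutor();

    ThrottledExecutor(Executor worker, int permitsPerPeriod, long period, TimeUnit unit) {
        ticker.scheduleAtFixedRate(() -> {
            // Each tick releases up to permitsPerPeriod tasks; the rest stay queued.
            for (int i = 0; i < permitsPerPeriod; i++) {
                Runnable task = queue.poll();
                if (task == null) {
                    break;
                }
                try {
                    worker.execute(task);
                } catch (RuntimeException e) {
                    // An uncaught exception here would cancel the periodic tick.
                    System.err.println("Task not handed to worker: " + e);
                }
            }
        }, 0, period, unit);
    }

    @Override
    public void execute(Runnable task) {
        queue.add(task); // unbounded queue: excess tasks simply wait for a later tick
    }

    @Override
    public void close() {
        ticker.shutdownNow(); // tasks still in the queue are dropped
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        CountDownLatch done = new CountDownLatch(10);
        // Assumed limit for the demo: 5 tasks per 100 ms.
        try (ThrottledExecutor throttled =
                     new ThrottledExecutor(pool, 5, 100, TimeUnit.MILLISECONDS)) {
            for (int i = 0; i < 10; i++) {
                int id = i;
                throttled.execute(() -> {
                    System.out.println("task " + id + " at " + System.currentTimeMillis());
                    done.countDown();
                });
            }
            done.await(5, TimeUnit.SECONDS);
        }
        pool.shutdown();
    }
}
```

The limit here applies to how many tasks are started per window, not to how many finish, and the queue is unbounded, so a producer that is consistently faster than the limit will grow the backlog without bound.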

How to get thread id from a thread pool?

放肆的年华 posted on 2019-11-28 14:22:44
Question: I have a fixed thread pool that I submit tasks to (limited to 5 threads). How can I find out which one of those 5 threads executes my task (something like "thread #3 of 5 is doing this task")?

    ExecutorService taskExecutor = Executors.newFixedThreadPool(5);
    // in infinite loop:
    taskExecutor.execute(new MyTask());
    ....
    private class MyTask implements Runnable {
        public void run() {
            logger.debug("Thread # XXX is doing this task"); // how to get thread id?
        }
    }

Answer 1: Using Thread.currentThread() :
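The excerpt cuts off at Thread.currentThread(), so the following is only a sketch of how that call could be used, not the original answer's code. It assumes a custom ThreadFactory that names each worker; the class name PoolThreadNames and the "worker-N-of-M" naming scheme are hypothetical.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class PoolThreadNames {
    /** Runs `tasks` tasks on `poolSize` named workers and returns the worker names observed. */
    static Set<String> run(int poolSize, int tasks) throws InterruptedException {
        AtomicInteger counter = new AtomicInteger();
        // The factory numbers each worker as it is created (hypothetical naming scheme).
        ExecutorService pool = Executors.newFixedThreadPool(poolSize, r ->
                new Thread(r, "worker-" + counter.incrementAndGet() + "-of-" + poolSize));
        Set<String> seen = ConcurrentHashMap.newKeySet();
        for (int i = 0; i < tasks; i++) {
            // Inside the task, Thread.currentThread() is the pool thread running it.
            pool.execute(() -> seen.add(Thread.currentThread().getName()));
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return seen;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(5, 20));
    }
}
```

Without a custom factory, Thread.currentThread().getName() still identifies the worker, using whatever name the default thread factory assigned to it.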

ScheduledExecutorService, how to stop action without stopping executor?

帅比萌擦擦* posted on 2019-11-28 12:09:43
I have this code:

    ScheduledExecutorService scheduledExecutor;
    .....
    ScheduledFuture<?> result = scheduledExecutor.scheduleWithFixedDelay(
            new SomethingDoer(), 0, measurmentPeriodMillis, TimeUnit.MILLISECONDS);

After some event I need to stop the action that is declared in the run() method of SomethingDoer, which implements Runnable. How can I do this? I can't shut down the executor; I should only revoke my periodic task. Can I use result.get() for this? And if I can, please tell me how it will work. Use result.cancel() . The ScheduledFuture is the handle for your task. You need to cancel this task and
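The result.cancel() suggestion above can be sketched as follows. This is a minimal illustration, not the poster's code: the class name CancelPeriodicTask, the 10 ms delay, and the counter standing in for SomethingDoer are assumptions made for the example.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class CancelPeriodicTask {
    /** Returns { task was cancelled, executor still accepted new work afterwards }. */
    static boolean[] demo() throws Exception {
        ScheduledExecutorService scheduledExecutor =
                Executors.newSingleThreadScheduledExecutor();
        AtomicInteger runs = new AtomicInteger(); // stand-in for SomethingDoer's work
        try {
            ScheduledFuture<?> result = scheduledExecutor.scheduleWithFixedDelay(
                    runs::incrementAndGet, 0, 10, TimeUnit.MILLISECONDS);
            Thread.sleep(50); // let the task run a few times

            // Cancel only this task; false means "do not interrupt a run in progress".
            result.cancel(false);
            boolean cancelled = result.isCancelled();

            // The executor itself is untouched and still accepts other work.
            String reply = scheduledExecutor.submit(() -> "still alive")
                    .get(1, TimeUnit.SECONDS);
            return new boolean[] { cancelled, "still alive".equals(reply) };
        } finally {
            scheduledExecutor.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        boolean[] r = demo();
        System.out.println("cancelled=" + r[0] + ", executorUsable=" + r[1]);
    }
}
```

As for result.get(): on a periodic task it does not stop anything. It blocks until the task is cancelled or fails, and after cancellation it throws CancellationException.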

Spark - How many Executors and Cores are allocated to my spark job

送分小仙女□ posted on 2019-11-28 01:30:43
Question: Spark architecture revolves entirely around the concept of executors and cores. I would like to see, in practice, how many executors and cores are running for my Spark application on a cluster. I tried the snippet below in my application, but with no luck.

    val conf = new SparkConf().setAppName("ExecutorTestJob")
    val sc = new SparkContext(conf)
    conf.get("spark.executor.instances")
    conf.get("spark.executor.cores")

Is there any way to get those values using SparkContext Object or

How to avoid Spark executor from getting lost and yarn container killing it due to memory limit?

百般思念 posted on 2019-11-27 12:35:32
Question: I have the following code, which calls hiveContext.sql() most of the time. My task is to create a few tables and insert values into them after processing all Hive table partitions. So I first run show partitions and, using its output in a for-loop, call a few methods that create the table (if it doesn't exist) and insert into it using hiveContext.sql. We can't execute hiveContext in an executor, so I have to execute this in a for-loop in the driver program, and it should run

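Thread pools should not be created with Executors but through ThreadPoolExecutor; this makes the pool's operating rules explicit to whoever writes the code and avoids the risk of resource exhaustion.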

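送分小仙女□ posted on 2019-11-27 05:41:51
Preface: an introduction to the java.util.concurrent.Executor thread pool framework in JDK 1.7

java.util.concurrent.Executor: the root interface responsible for using and scheduling threads
 |– ExecutorService: subinterface of Executor, the main thread pool interface
     |– ThreadPoolExecutor: implementation class of ExecutorService
     |– ScheduledExecutorService: subinterface of ExecutorService, responsible for scheduling
         |– ScheduledThreadPoolExecutor: extends ThreadPoolExecutor and implements ScheduledExecutorService

1. The four thread pools of Executors

newCachedThreadPool creates a cacheable thread pool. If the pool holds more threads than the workload needs, idle threads are reclaimed; if no idle thread is available for reuse, a new thread is created. The pool size is effectively unbounded. If the first task has already finished when the second task is submitted, the thread that ran the first task is reused instead of a new thread being created each time. Creation: Executors.newCachedThreadPool();

newFixedThreadPool creates a fixed-size thread pool that caps the maximum number of concurrent threads; tasks beyond that limit wait in a queue. The size of a fixed pool is best chosen according to system resources, e.g. Runtime.getRuntime()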
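The recommendation in the title, constructing ThreadPoolExecutor directly, can be sketched as below. This is an illustrative example rather than code from the article: the class name BoundedPool, the queue capacity, and the choice of CallerRunsPolicy as the rejection handler are assumptions.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class BoundedPool {
    /** Builds a fixed-size pool whose queue and rejection behaviour are spelled out. */
    static ThreadPoolExecutor create(int threads, int queueCapacity) {
        return new ThreadPoolExecutor(
                threads,                                    // corePoolSize
                threads,                                    // maximumPoolSize
                0L, TimeUnit.MILLISECONDS,                  // keep-alive for extra threads
                new ArrayBlockingQueue<>(queueCapacity),    // bounded work queue
                new ThreadPoolExecutor.CallerRunsPolicy()); // when full, the submitter runs the task
    }

    public static void main(String[] args) throws InterruptedException {
        // Sizing by available processors is an assumed heuristic for the demo.
        int threads = Runtime.getRuntime().availableProcessors();
        ThreadPoolExecutor pool = create(threads, 100);
        AtomicInteger completed = new AtomicInteger();
        for (int i = 0; i < 1_000; i++) {
            pool.execute(completed::incrementAndGet);
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println("completed=" + completed.get());
    }
}
```

The point of spelling out the constructor is that the bounds become visible: Executors.newFixedThreadPool uses an unbounded LinkedBlockingQueue, and Executors.newCachedThreadPool allows up to Integer.MAX_VALUE threads, so either can exhaust memory under sustained overload.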