executor

Spark Streaming: Parallel Execution of Micro-batches

Anonymous (unverified), submitted 2019-12-03 08:54:24
Question: We are receiving data in Spark Streaming from Kafka. Once execution has started in Spark Streaming, it executes only one batch and the remaining batches start queuing up in Kafka. Our data is independent and can be processed in parallel. We tried multiple configurations with multiple executors, cores, back pressure, and other settings, but nothing has worked so far. There are a lot of messages queued, only one micro-batch is processed at a time, and the rest remain in the queue. We want to achieve maximum parallelism, so that
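A configuration that is often suggested for this symptom is letting the Spark scheduler run several batch jobs concurrently, sketched below. Two caveats: spark.streaming.concurrentJobs is a real but undocumented, experimental property, and with the direct Kafka stream the parallelism inside a single batch is bounded by the topic's partition count, so the value 4 is only illustrative.

    spark-submit --conf spark.streaming.concurrentJobs=4 ... (rest of the submit command unchanged)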

AttributeError: 'module' object has no attribute 'merge_all_summaries'

Anonymous (unverified), submitted 2019-12-03 08:50:26
Question: Ubuntu 14.04, Python 2.7.13 :: Anaconda custom (64-bit). I installed TensorFlow following the tutorial: https://www.tensorflow.org/install/ . When I enter ~/anaconda2/lib/python2.7/site-packages/tensorflow/examples/tutorials/mnist and attempt to run the existing Python file fully_connected_feed.py, I get the AttributeError below:

    :~/anaconda2/lib/python2.7/site-packages/tensorflow/examples/tutorials/mnist$ python fully_connected_feed.py
    I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0

Spark: executor memory exceeds physical limit

Anonymous (unverified), submitted 2019-12-03 08:30:34
Question: My input dataset is about 150G. I am setting --conf spark.cores.max=100 --conf spark.executor.instances=20 --conf spark.executor.memory=8G --conf spark.executor.cores=5 --conf spark.driver.memory=4G, but since the data is not evenly distributed across executors, I kept getting Container killed by YARN for exceeding memory limits. 9.0 GB of 9 GB physical memory used. Here are my questions: 1. Did I not set up enough memory in the first place? I think 20 * 8G > 150G, but it's hard to achieve a perfect distribution, so some executors will suffer. 2. I
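For context on the 9 GB limit in the error: YARN kills the container when heap plus off-heap overhead exceeds the container size, and the overhead defaults to max(384 MB, 10% of executor memory), i.e. roughly 8G + 0.8G ≈ 9G here. A commonly suggested adjustment (a sketch; the property is named spark.yarn.executor.memoryOverhead through Spark 2.2 and spark.executor.memoryOverhead from 2.3 onward) is to raise the overhead explicitly rather than the heap:

    spark-submit ... --conf spark.executor.memory=8G --conf spark.yarn.executor.memoryOverhead=2048 ...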

spark executor memory cut to 1/2

Anonymous (unverified), submitted 2019-12-03 08:28:06
Question: I am doing a spark-submit like this: spark-submit --class com.mine.myclass --master yarn-cluster --num-executors 3 --executor-memory 4G spark-examples_2.10-1.0.jar. In the web UI I can indeed see 3 executor nodes, but each has 2G of memory. When I set --executor-memory 2G, the UI shows 1G per node. Why is my setting reduced by half? Answer 1: The executor page of the web UI is showing the amount of storage memory, which is equal to 54% of the Java heap by default (spark.storage.safetyFraction 0.9 * spark.storage.memoryFraction 0
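To make the answer's 54% concrete, here is the arithmetic for both settings the asker tried (using the defaults the answer names):

    4G heap × 0.6 (spark.storage.memoryFraction) × 0.9 (spark.storage.safetyFraction) ≈ 2.16G → displayed as ~2G
    2G heap × 0.6 × 0.9 ≈ 1.08G → displayed as ~1G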

Correct way to stop custom logback async appender

Anonymous (unverified), submitted 2019-12-03 07:36:14
Question: I've created Amazon SQS and SNS logback appenders using Amazon's Java SDK. The basic appenders use the synchronous Java APIs, but I've also created asynchronous versions of both by extending the ch.qos.logback.classic.AsyncAppender class. Stopping the logback logger context with the async appenders does not work as expected, though. When the context is stopped, all async appenders try to flush remaining events before exiting. The problem originates from the ch.qos.logback.core.AsyncAppenderBase#stop method, which interrupts the worker
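The excerpt cuts off before the asker's workaround; below is a minimal sketch of one common pattern, assuming you want the queue drained before stop() interrupts the worker thread. The subclass name is hypothetical; getNumberOfElementsInQueue() comes from AsyncAppenderBase, and the 5-second drain window is an arbitrary choice here.

    import ch.qos.logback.classic.AsyncAppender;

    // Hypothetical async SQS appender that drains its queue before stopping.
    public class SqsAsyncAppender extends AsyncAppender {

        @Override
        public void stop() {
            // Give the worker thread a bounded window to flush queued events
            // before super.stop() interrupts it.
            long deadline = System.currentTimeMillis() + 5000;
            while (getNumberOfElementsInQueue() > 0
                    && System.currentTimeMillis() < deadline) {
                try {
                    Thread.sleep(50);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
            super.stop(); // interrupts the worker and detaches child appenders
        }
    }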

Unhandled exceptions with Java scheduled executors

亡梦爱人, submitted 2019-12-03 04:46:29
Question: I have the following issue and I would like to know what exactly happens. I am using Java's ScheduledExecutorService to run a task every five minutes. It works very well. Executors completely changed the way I do thread programming in Java. Now, I browsed the Javadoc for information about the behavior when the scheduled task fails with an unhandled exception, but couldn't find anything. Is the next scheduled task still going to run? If there is an unhandled exception, the
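For reference, the scheduleAtFixedRate Javadoc does answer this: if any execution of the task encounters an exception, subsequent executions are suppressed, and the exception surfaces only through the returned ScheduledFuture. A common defensive pattern (a sketch; the task body here is a placeholder) is to wrap the task in a catch-all so one failure cannot silently cancel the schedule:

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public class SafeScheduling {
        public static void main(String[] args) {
            ScheduledExecutorService scheduler =
                    Executors.newSingleThreadScheduledExecutor();

            Runnable job = () -> {
                // ... the real five-minute task ...
            };

            // Catch everything so a single failure cannot suppress future runs.
            scheduler.scheduleAtFixedRate(() -> {
                try {
                    job.run();
                } catch (RuntimeException e) {
                    e.printStackTrace(); // log it; the next run still fires
                }
            }, 0, 5, TimeUnit.MINUTES);
        }
    }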

JUC - Monitoring a ThreadPoolExecutor

微笑、不失礼, submitted 2019-12-03 04:30:28
JUC - Monitoring a ThreadPoolExecutor: a custom Monitor that tracks the execution state of a ThreadPoolExecutor.

TASK: WorkerTask

    import java.util.concurrent.TimeUnit;

    class WorkerTask implements Runnable {
        private String command;

        public WorkerTask(String command) {
            this.command = command;
        }

        @Override
        public void run() {
            System.out.println(Thread.currentThread().getName() + " Start. Command = " + command);
            processCommand();
            System.out.println(Thread.currentThread().getName() + " End.");
        }

        // Simulate a unit of work by sleeping for five seconds.
        private void processCommand() {
            try {
                TimeUnit.SECONDS.sleep(5);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }

        @Override
        public String toString() {
            return "WorkerTask{" + "command
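The excerpt cuts off before the Monitor itself; below is a minimal sketch of such a monitor thread (the class and field names are illustrative, but the counters polled are the standard java.util.concurrent.ThreadPoolExecutor getters). It would be started alongside the pool, e.g. new Thread(new MonitorTask(pool)).start().

    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;

    // Illustrative monitor: polls the pool's built-in counters every few seconds.
    class MonitorTask implements Runnable {
        private final ThreadPoolExecutor executor;
        private volatile boolean running = true;

        public MonitorTask(ThreadPoolExecutor executor) {
            this.executor = executor;
        }

        public void shutdown() {
            running = false;
        }

        @Override
        public void run() {
            while (running) {
                System.out.printf(
                    "[monitor] pool: %d, core: %d, active: %d, completed: %d, total: %d, queued: %d, isShutdown: %s, isTerminated: %s%n",
                    executor.getPoolSize(),
                    executor.getCorePoolSize(),
                    executor.getActiveCount(),
                    executor.getCompletedTaskCount(),
                    executor.getTaskCount(),
                    executor.getQueue().size(),
                    executor.isShutdown(),
                    executor.isTerminated());
                try {
                    TimeUnit.SECONDS.sleep(3); // poll interval
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }
    }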

A summary of using CarbonData

隐身守侯, submitted 2019-12-03 04:19:13
Introduction to CarbonData: CarbonData is a new Hadoop-native file format for Apache Hadoop. It uses advanced columnar storage, indexing, compression, and encoding techniques to improve computational efficiency, helping to accelerate queries over petabyte-scale data and enabling faster interactive queries. CarbonData is also a high-performance analytics engine that integrates data sources with Spark.

Figure 1: CarbonData basic architecture

The purpose of CarbonData is to provide ultra-fast responses to ad-hoc queries on big data. Fundamentally, CarbonData is an OLAP engine that stores data in tables, much like an RDBMS. Users can import large volumes of data (10 TB and above) into tables created in the CarbonData format, and CarbonData automatically organizes and stores the data in a compressed, multi-dimensional, indexed columnar format. Once data has been loaded into CarbonData, ad-hoc queries can be run against it with second-level response times. CarbonData integrates data sources into the Spark ecosystem, so users can run queries and analyses with Spark SQL; they can also connect to Spark SQL through JDBCServer, a third-party tool provided with Spark.

CarbonData structure: CarbonData runs as a Spark internal data source and requires no additional processes on the cluster nodes; the CarbonData engine runs inside the Spark Executor process.

Figure 2: CarbonData structure

Data stored in CarbonData
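As a sketch of the Spark SQL integration described above (not from the article: the table, columns, and values are illustrative, and the exact DDL flavor, such as STORED AS carbondata, varies across CarbonData versions):

    import org.apache.spark.sql.SparkSession;

    public class CarbonDataQuickstart {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("CarbonDataQuickstart")
                    .getOrCreate();

            // Create a table backed by the CarbonData format.
            spark.sql("CREATE TABLE IF NOT EXISTS sales (id INT, amount DOUBLE) STORED AS carbondata");

            // Load some rows, then run an ad-hoc query served by CarbonData's indexes.
            spark.sql("INSERT INTO sales VALUES (1, 9.5), (2, 3.2)");
            spark.sql("SELECT id, SUM(amount) FROM sales GROUP BY id").show();
        }
    }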

Spark application kills executor

Anonymous (unverified), submitted 2019-12-03 03:06:01
Question: I'm running a Spark cluster in standalone mode and submitting the application with spark-submit. In the Spark UI stages section I found a stage with a very long execution time (> 10 h, when the usual time is ~30 sec). The stage has many failed tasks with the error Resubmitted (resubmitted due to lost executor). There is an executor with the address CANNOT FIND ADDRESS in the Aggregated Metrics by Executor section of the stage page. Spark tries to resubmit this task indefinitely. If I kill this stage (my application reruns uncompleted Spark jobs automatically), everything continues to work fine.

Await Future from Executor: Future can't be used in 'await' expression

Anonymous (unverified), submitted 2019-12-03 03:06:01
Question: I wanted to use a ThreadPoolExecutor from a Python coroutine, to delegate some blocking network calls to a separate thread. However, running the following code:

    from concurrent.futures import ThreadPoolExecutor
    import asyncio

    def work():
        # do some blocking io
        pass

    async def main():
        executor = ThreadPoolExecutor()
        await executor.submit(work)

    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
    loop.close()

causes the error: TypeError: object Future can't be used in 'await' expression. Aren't Future objects awaitable? Why does it say