mesos

Spark on Mesos (DC/OS) loses tasks before doing anything

坚强是说给别人听的谎言 submitted on 2019-12-08 04:31:27
I'm experiencing a serious problem with DC/OS Apache Spark running in Mesos cluster mode. I'm using GCE instances for DC/OS for testing. Everything was working fine with Spark, then Spark suddenly stopped working properly. Before it starts executors it reports that some executors are lost and it blacklists nodes, leaving very few resources, so it runs only ~2-3 executors. Then it proceeds to remove non-existent executors, but I don't know if it succeeds. I wanted to disable blacklisting, but it's hardcoded in the Spark Mesos files and impossible to disable or change via any configuration. This happens mainly

Docker : /var/run/docker.sock: no such file or directory

回眸只為那壹抹淺笑 submitted on 2019-12-08 04:18:28
Question (Infra): I am trying to set up a mesos <-> marathon cluster distributing containers. On my main server I have: zookeeper, a docker registry v2 (port 5000) w/o credentials, and a container with supervisord + mesos + marathon. Furthermore I have a slave (on the same server).
$ docker ps
192.168.0.38:5000/mesos-slave:prod mesos-slave-1
192.168.0.38:5000/mesos-master:generic mesos-master
jplock/zookeeper 0.0.0.0:2181->2181/tcp, 0.0.0.0:2888->2888/tcp, 0.0.0.0:3888->3888/tcp nostalgic_visvesvaraya
registry:2 0.0
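
(Not part of the excerpt above.) The error in the title usually means the docker client inside the mesos-slave container cannot reach the host's Docker daemon. A minimal sketch, assuming the slave image from the docker ps listing and bind-mounting the host Docker socket and binary in; the exact flags are illustrative, not taken from the question:

    # Sketch only: run the slave container with the host Docker socket mounted so
    # the docker containerizer talks to the host daemon instead of expecting one
    # inside the container. Mounting the docker binary only works if its shared
    # library dependencies are also present in the image.
    docker run -d --name mesos-slave-1 --privileged \
      -v /var/run/docker.sock:/var/run/docker.sock \
      -v /usr/bin/docker:/usr/bin/docker \
      -v /sys:/sys \
      192.168.0.38:5000/mesos-slave:prod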

Consul deregister 'failing' services

∥☆過路亽.° submitted on 2019-12-08 02:23:40
Question: I have Consul v0.5.2 running, with services running in Mesos. Services keep moving from one server to another. Is there a way to deregister services in Consul that are in the 'failing' state? I am able to get the list of services in the failing state using this curl: curl http://localhost:8500/v1/health/state/critical The issue we are seeing is that over time the Consul UI fills up with stale data, making the whole UI unusable. Answer 1: Consul by default does not deregister unhealthy services
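
(Added for illustration, not from the thread.) A rough sketch of one way to clean these up from a shell, assuming jq is available and that the deregister call is made against the agent that owns each service; the agent endpoint /v1/agent/service/deregister/<service_id> is what actually removes a service:

    # Sketch: deregister every service that currently has a critical health check.
    # Note: /v1/health/state/critical is cluster-wide; in a multi-node cluster the
    # deregister call must go to the agent on the node that registered the service
    # (the returned check objects carry a Node field). Older Consul versions also
    # accepted GET instead of PUT for agent endpoints.
    CONSUL=http://localhost:8500
    for svc in $(curl -s "$CONSUL/v1/health/state/critical" | jq -r '.[].ServiceID' | sort -u); do
      [ -n "$svc" ] && curl -s -X PUT "$CONSUL/v1/agent/service/deregister/$svc"
    done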

Differences between Spark's three cluster deployment modes

强颜欢笑 submitted on 2019-12-07 19:48:52
Ranked by usage, Spark's main resource management options are Hadoop YARN, Apache Standalone, and Mesos. On a single machine, Spark can also run in the basic local mode. Apache Spark currently supports three distributed deployment modes: standalone, Spark on Mesos, and Spark on YARN. The first is similar to the model used by MapReduce 1.0, implementing fault tolerance and resource management internally; the latter two are the direction of future development, handing part of the fault tolerance and resource management over to a unified resource management system. Running Spark on top of a general-purpose resource manager lets it share a cluster with other computing frameworks such as MapReduce; the biggest benefits are lower operations cost and higher resource utilization (resources are allocated on demand). This article introduces the three deployment modes and compares their pros and cons.
1. Standalone mode. Standalone mode ships with a complete set of services and can be deployed to a cluster on its own, without depending on any other resource management system. To some extent, this mode is the foundation of the other two. Spark's development model suggests a general approach for building a new computing framework: first design its standalone mode and, for fast development, initially ignore fault tolerance of the services themselves (e.g. master/slave); then develop wrappers that deploy the standalone-mode services, unchanged, onto a resource management system such as YARN or Mesos, which takes over fault tolerance of the services themselves
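
For illustration (not part of the original article), the choice of cluster manager mostly shows up in the --master URL passed to spark-submit; the hosts, ports, class, and jar below are placeholders:

    # Standalone mode: point --master at the standalone master.
    spark-submit --master spark://master-host:7077 --class com.example.App app.jar

    # Spark on Mesos: point --master at a Mesos master, or a zk:// URL for HA masters.
    spark-submit --master mesos://mesos-master:5050 --class com.example.App app.jar

    # Spark on YARN: the master is simply "yarn"; --deploy-mode picks client vs cluster.
    spark-submit --master yarn --deploy-mode cluster --class com.example.App app.jar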

Mesos DCOS doesn't install Kafka

拟墨画扇 submitted on 2019-12-07 17:11:54
Question: I'm trying to install Kafka on Mesos. The installation seems to have succeeded.
vagrant@DevNode:/dcos$ dcos package install kafka
This will install Apache Kafka DCOS Service.
Continue installing? [yes/no] yes
Installing Marathon app for package [kafka] version [0.9.4.0]
Installing CLI subcommand for package [kafka] version [0.9.4.0]
New command available: dcos kafka
The Apache Kafka DCOS Service is installed:
docs - https://github.com/mesos/kafka
issues - https://github.com/mesos/kafka/issues
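
(Illustrative only.) A quick way to check whether the package actually came up is to look at what the DC/OS CLI and Marathon report; these are standard DC/OS CLI commands, though output formats vary by version:

    dcos package list kafka     # the installed kafka package should be listed here
    dcos marathon app list      # the kafka scheduler app should show up and be running
    dcos task                   # broker tasks appear once the scheduler accepts offers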

Spark Mesos Dispatcher

折月煮酒 submitted on 2019-12-07 11:54:27
My team is deploying a new Big Data architecture on Amazon Cloud. We have Mesos up and are running Spark jobs on it. We are submitting Spark jobs (i.e. jars) from a bastion host inside the same cluster. Doing so, however, the bastion host is the driver program, which is called client mode (if I understood correctly). We would like to try cluster mode, but we don't understand where to start the dispatcher process. The documentation says to start it in the cluster, but I'm confused since our masters don't have Spark installed and we use Zookeeper for master election. Starting it on a slave
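
(Sketch, not from the thread.) Spark ships a start script for the Mesos cluster dispatcher; it can run on any node that has a Spark distribution and can reach the Mesos masters, and jobs are then submitted to the dispatcher instead of the Mesos master. Hostnames, the zk URL, the class, and the jar URL below are placeholders:

    # 1) Start the dispatcher anywhere a Spark distribution is unpacked.
    ./sbin/start-mesos-dispatcher.sh --master mesos://zk://zk1:2181,zk2:2181,zk3:2181/mesos

    # 2) Submit in cluster mode against the dispatcher (default port 7077);
    #    the application jar must be reachable from the slaves (http://, hdfs://, ...).
    spark-submit --master mesos://dispatcher-host:7077 --deploy-mode cluster \
      --class com.example.App http://some-host/path/app.jar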

How to auto-launch a new task instance when a mesos-slave is stopped?

生来就可爱ヽ(ⅴ<●) submitted on 2019-12-07 06:07:25
Version info and command line args:
mesos-master & mesos-slave version 1.1.0
marathon version 1.4.3
docker server version 1.28
mesos-master's command line args:
--zk=zk://ip1:2181,ip2:2181,ip3:2181/mesos \
--port=5050 \
--log_dir=/var/log/mesos \
--hostname=ip1 \
--quorum=2 \
--work_dir=/var/lib/mesosmaster
mesos-slave's command line args:
--master=zk://ip1:2181,ip2:2181,ip3:2181/mesos \
--log_dir=/var/log/mesos \
--containerizers=docker,mesos \
--executor_registration_timeout=10mins \
--hostname=ip1 \
--recovery_timeout=1mins \
--resources=ports:[25000-65000] \
--work_dir=/var/lib/mesos
operation
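
(Background sketch, not from the question.) When an agent stops, Mesos eventually marks its tasks as lost and Marathon is the component that relaunches replacements to meet the app's instance count. A rough way to watch that from the Marathon REST API, assuming Marathon on ip1:8080 and a hypothetical app id my-app:

    MARATHON=http://ip1:8080
    APP=my-app        # placeholder app id
    # Tasks Marathon currently knows about for this app; replacements for lost
    # tasks appear here once Mesos re-offers resources from healthy agents.
    curl -s "$MARATHON/v2/apps/$APP/tasks" | jq '.tasks[] | {id, host, startedAt}'
    # Force a rolling restart if replacements are not launched automatically.
    curl -s -X POST "$MARATHON/v2/apps/$APP/restart"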

How to use volumes-from in marathon

天大地大妈咪最大 submitted on 2019-12-06 21:53:43
Question: I've been working with mesos + marathon + docker for quite a while, but I got stuck at one point. At the moment I'm trying to deal with persistent containers and I tried to play around with the "volumes-from" parameter, but I can't make it work because I have no clue how to figure out the name of the data box to put as a key in the JSON. I tried it with the example from here { "id": "privileged-job", "container": { "docker": { "image": "mesosphere/inky", "privileged": true, "parameters": [ { "key":
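
(Illustrative sketch, not from the thread.) Marathon's docker "parameters" array is passed through as extra docker run flags, so volumes-from takes the literal Docker container name of the data container. The names below (data-box, the Marathon URL) are placeholders; containers launched by Mesos itself get generated names like mesos-<uuid>, which is exactly why the name is hard to pin down for Mesos-launched data containers:

    # A data container started by hand with a fixed, known name (placeholder).
    docker run -d --name data-box -v /data busybox true

    # Marathon app whose task container mounts the data container's volumes
    # via "--volumes-from=data-box".
    curl -s -X POST http://marathon:8080/v2/apps -H 'Content-Type: application/json' -d '{
      "id": "privileged-job",
      "cpus": 0.5, "mem": 128, "instances": 1,
      "container": {
        "type": "DOCKER",
        "docker": {
          "image": "mesosphere/inky",
          "privileged": true,
          "parameters": [
            { "key": "volumes-from", "value": "data-box" }
          ]
        }
      }
    }'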

Mesos cannot deploy container from private Docker registry

Deadly submitted on 2019-12-06 15:24:21
I have a private Docker registry that is accessible at https://docker.somedomain.com (over standard port 443, not 5000). My infrastructure includes a Mesosphere setup with the docker containerizer enabled. I am trying to deploy a specific container to a Mesos slave via Marathon; however, this always fails, with Mesos failing the task almost immediately and no data in the stderr and stdout of that sandbox. I tried deploying an image from the standard Docker Registry and that appears to work fine. I'm having trouble figuring out what is wrong. My private Docker registry does not require
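
(Debugging sketch, not from the question.) With empty stdout/stderr, a common culprit is that the first pull from the private registry is slow and the executor never registers before the agent's timeout, so reproducing the pull directly on the slave is a useful first check. The image name and log path below are placeholders:

    # On the affected Mesos slave: can docker itself pull the image?
    docker pull docker.somedomain.com/myorg/myimage:latest   # placeholder image name
    # If the pull works but is slow, compare its duration with the agent's
    # --executor_registration_timeout; a pull longer than the timeout makes the
    # task fail almost immediately from Marathon's point of view.
    grep -iE 'docker|pull|registration' /var/log/mesos/mesos-slave.INFO | tail -n 50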