cluster-computing

Failing K8s rabbitmq-peer-discovery-k8s clustering

Submitted by 断了今生、忘了曾经 on 2019-12-24 00:58:40
Question: I'm trying to bring up a RabbitMQ cluster on Kubernetes using the rabbitmq-peer-discovery-k8s plugin, and I always end up with only one pod running and ready; the next one always fails. I tried multiple changes to my configuration, and this is what got at least one pod running:

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: rabbitmq
  namespace: namespace-dev
---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: endpoint-reader
  namespace: namespace-dev
rules:
- apiGroups: ["
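The manifest above is cut off at the Role's rules. For reference, a minimal endpoint-reader Role and RoleBinding along the lines of the upstream rabbitmq-peer-discovery-k8s examples would look roughly like this; it is a sketch reusing the question's rabbitmq ServiceAccount and namespace-dev namespace, not the poster's actual manifest:

---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: endpoint-reader
  namespace: namespace-dev
rules:
# The peer discovery plugin only needs to read the endpoints of its service
- apiGroups: [""]
  resources: ["endpoints"]
  verbs: ["get"]
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: endpoint-reader
  namespace: namespace-dev
subjects:
- kind: ServiceAccount
  name: rabbitmq
  namespace: namespace-dev
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: endpoint-reader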

Fortran recursion segmentation faults

Submitted by 江枫思渺然 on 2019-12-23 19:59:21
Question: I have to design and implement a Fortran routine to determine the size of clusters on a square lattice, and it seemed extremely convenient to code the subroutine recursively. However, whenever my lattice size grows beyond a certain value (around 200/side), the subroutine consistently segfaults. Here's my cluster-detection routine:

RECURSIVE SUBROUTINE growCluster(lattice, adj, idx, area)
    INTEGER, INTENT(INOUT) :: lattice(:), area
    INTEGER, INTENT(IN)    :: adj(:,:), idx
    lattice(idx) = -1
    area =
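A segfault that appears only once the lattice (and hence the recursion depth) grows is usually the process stack being exhausted. A common first check, assuming a Linux shell and gfortran (the commands and file name below are generic placeholders, not from the post), is to raise the stack limit before running:

ulimit -s unlimited                       # or a large explicit value, e.g. ulimit -s 65532
gfortran -O2 clusters.f90 -o clusters     # file name is a placeholder
./clusters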

Tomcat 6 cluster with shared objects

Submitted by 南笙酒味 on 2019-12-23 19:19:08
Question: We have a large cluster of Tomcat servers and I'm trying to find an efficient way to share a count among all of them. This count is the number of "widgets" purchased and needs to be checked on every page view. Any server can complete a sale and increment that count, at which point the new value should be made available to all the cluster members. We don't want to use the count from the database because there will be many page views between updates across the cluster and a get operation to
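One pattern often used for this kind of cluster-wide counter (not mentioned in the truncated post) is a distributed atomic long from an in-memory data grid embedded in each Tomcat node. A minimal sketch, assuming Hazelcast 3.x is on the classpath:

import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IAtomicLong;

public class WidgetCounter {
    // One embedded Hazelcast member per Tomcat instance; members discover
    // each other and keep the counter consistent across the cluster.
    private static final HazelcastInstance HZ = Hazelcast.newHazelcastInstance();
    private static final IAtomicLong COUNT = HZ.getAtomicLong("widgetsPurchased");

    public static long increment() { return COUNT.incrementAndGet(); } // on a sale
    public static long current()   { return COUNT.get(); }             // on a page view
}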

How to access to GPUs on different nodes in a cluster with Slurm?

Submitted by 徘徊边缘 on 2019-12-23 18:12:20
Question: I have access to a cluster that's run by Slurm, in which each node has 4 GPUs. I have a code that needs 8 GPUs. So the question is: how can I request 8 GPUs on a cluster where each node has only 4 GPUs? This is the job that I tried to submit via sbatch:

#!/bin/bash
#SBATCH --gres=gpu:8
#SBATCH --nodes=2
#SBATCH --mem=16000M
#SBATCH --time=0-01:00

But then I get the following error:

sbatch: error: Batch job submission failed: Requested node configuration is not available

Then I changed my
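The submission fails because --gres=gpu:N is interpreted per node, so gpu:8 asks for 8 GPUs on each of the two nodes. A sketch of a script that requests 8 GPUs as 4 per node across 2 nodes (the srun line and executable name are placeholders, not from the post):

#!/bin/bash
#SBATCH --nodes=2
#SBATCH --gres=gpu:4          # per-node request: 2 nodes x 4 GPUs = 8 GPUs total
#SBATCH --ntasks-per-node=1
#SBATCH --mem=16000M
#SBATCH --time=0-01:00

srun ./my_gpu_program         # placeholder executable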

Tomcat Clustering in Microsoft Azure

Submitted by 强颜欢笑 on 2019-12-23 12:03:27
Question: Is there any chance to cluster Tomcat in Microsoft Azure? I know that it is possible to run Tomcat with the use of the Tomcat Solution Accelerator. Since normal Tomcat clustering is based on multicast, it cannot be used in Microsoft Azure. Is there another option? Thanks in advance for reading and answering my question. Every comment/idea is highly appreciated.

Answer 1: One option would be to use memcached-session-manager: http://code.google.com/p/memcached-session-manager/ It's a tomcat session
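For context, memcached-session-manager is configured as a session Manager in Tomcat's context.xml rather than via the multicast-based Cluster element. A rough sketch (host names, node ids, and the non-sticky setting are placeholders, not taken from the answer):

<Context>
  <!-- Replicates sessions to external memcached nodes instead of relying on multicast. -->
  <Manager className="de.javakaffee.web.msm.MemcachedBackupSessionManager"
           memcachedNodes="n1:memcached1.example.com:11211,n2:memcached2.example.com:11211"
           sticky="false"
           requestUriIgnorePattern=".*\.(ico|png|gif|jpg|css|js)$" />
</Context>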

Why are “sc.addFile” and “spark-submit --files” not distributing a local file to all workers?

Submitted by 最后都变了- on 2019-12-23 10:09:08
Question: I have a CSV file "test.csv" that I'm trying to have copied to all nodes on the cluster. I have a 4-node Apache Spark 1.5.2 standalone cluster. There are 4 workers, where one node also acts as master/driver as well as a worker. If I run:

$SPARK_HOME/bin/pyspark --files=./test.csv

or, from within the REPL interface, execute:

sc.addFile('file://' + '/local/path/to/test.csv')

I see Spark log the following:

16/05/05 15:26:08 INFO Utils: Copying /local/path/to/test.csv to /tmp/spark-5dd7fc83-a3ef
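A point that often explains the confusion here (stated as general Spark behavior, not taken from the truncated post): files shipped with --files or sc.addFile land in a per-executor working directory, and tasks should resolve them with SparkFiles.get() rather than the driver-local path. A minimal sketch, assuming the pyspark shell where sc is already defined:

from pyspark import SparkFiles

sc.addFile('file:///local/path/to/test.csv')

def first_line(_):
    # Each executor resolves its own local copy of the distributed file.
    path = SparkFiles.get('test.csv')
    with open(path) as f:
        return [f.readline().strip()]

print(sc.parallelize(range(4), 4).mapPartitions(first_line).collect())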

Setup Cassandra Cluster AWS

Submitted by 一个人想着一个人 on 2019-12-23 05:33:07
Question: I want to set up a Cassandra 2.* cluster composed of 3 nodes (multi-node) in AWS. What are the official steps for doing this? Base image to use, ports to open, config files, etc. PS: Pretty much everything I found points me to the DataStax site, but I don't think it is free if we later decide to set this up in production. Thanks!

Answer 1: So, I found these links, which were enough to get it working. Install Cassandra on a Single Node Amazon Linux http://www.jonathanhui.com/install-cassandra-single
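As general orientation (not part of the linked guides above), a multi-node Cassandra setup on EC2 mostly comes down to a handful of cassandra.yaml fields plus security-group rules for the gossip (7000), CQL (9042), and JMX (7199) ports. A sketch with placeholder addresses:

# cassandra.yaml fields that typically change per node (IPs are placeholders):
cluster_name: 'my_cluster'
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "10.0.0.11,10.0.0.12"   # private IPs of the seed nodes
listen_address: 10.0.0.11              # this node's private IP
rpc_address: 10.0.0.11
endpoint_snitch: Ec2Snitch             # AWS-aware snitch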

MPI cluster based parallel calculation in R on WestGrid (pbs file)

Submitted by 狂风中的少年 on 2019-12-23 05:32:26
Question: I am now dealing with a large dataset and I want to use parallel computation to accelerate the process. WestGrid is a Canadian computing system which has clusters with interconnect. I use two packages, doSNOW and parallel, to do parallel jobs. My question is how I should write the PBS file. When I submit the job using qsub, an error occurs:

mpirun noticed that the job aborted, but has no info as to the process that caused that situation.

Here is the R script code:

install.packages("fume_1.0
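For reference, a PBS script for an MPI-launched R job usually starts a single MPI rank and lets R create the worker cluster itself; a rough sketch (module names, resource numbers, and the script name are placeholders for whatever the site actually provides, not taken from the post):

#!/bin/bash
#PBS -N r_parallel_job
#PBS -l nodes=2:ppn=4
#PBS -l walltime=01:00:00

cd $PBS_O_WORKDIR
module load openmpi r            # placeholder module names

# One MPI rank; the R script builds the worker cluster itself
# (e.g. snow/doSNOW with an MPI or socket cluster).
mpirun -np 1 R --vanilla -f my_parallel_script.R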

Hadoop multinode cluster too slow. How do I increase speed of data processing?

Submitted by 旧城冷巷雨未停 on 2019-12-23 04:53:40
Question: I have a 6-node cluster - 5 DN and 1 NN. All have 32 GB RAM. All slaves have 8.7 TB HDD. DN has 1.1 TB HDD. Here is the link to my core-site.xml, hdfs-site.xml, yarn-site.xml. After running an MR job, I checked my RAM usage, which is mentioned below:

Namenode: free -g
          total  used  free  shared  buff/cache  available
Mem:         31     7    15       0           8         22
Swap:        31     0    31

Datanode, Slave1: free -g
          total  used  free  shared  buff/cache  available
Mem:         31     6     6       0          18         24
Swap:        31     3    28

Slave2:
          total  used  free  shared  buff/cache
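Since the question links to yarn-site.xml, the usual first things to check on 32 GB workers are the YARN container memory limits; an illustrative sketch (the values are assumptions for 32 GB nodes, not taken from the poster's actual configs):

<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>24576</value>   <!-- leave headroom for the OS, DataNode, and NodeManager -->
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>8192</value>
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>8</value>
</property>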

Hadoop: How can I prevent failed tasks from making the whole job fail?

Submitted by 余生长醉 on 2019-12-23 04:24:51
Question: I'm running a Hadoop job with, say, 1000 tasks. I need the job to attempt to run every task, but many of the tasks will not complete and will instead throw an exception. I cannot change this behavior, but I still need the data obtained from the tasks that did not fail. How can I make sure Hadoop goes through with all 1000 tasks despite encountering a large number of failed tasks?

Answer 1: In your case, you could set the maximum percentage of tasks that are allowed to fail without triggering
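The setting the answer starts to describe corresponds to Hadoop's task-failure tolerance knobs; a sketch using the old mapred API (the 30% threshold is an arbitrary example, and the driver class name is a placeholder; the newer property names are mapreduce.map.failures.maxpercent / mapreduce.reduce.failures.maxpercent):

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

JobConf conf = new JobConf(MyJob.class);   // MyJob is a placeholder driver class
conf.setMaxMapTaskFailuresPercent(30);     // tolerate up to 30% failed map tasks
conf.setMaxReduceTaskFailuresPercent(30);  // same for reduce tasks
JobClient.runJob(conf);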