cluster-computing

Application not working on a Clustered Environment?

Submitted by 丶灬走出姿态 on 2020-01-03 00:54:14
Question: I am working on a Java application that is responsible for getting data from a request sent by a service. The request is in the form of an XML document. My Java class takes the values from the XML and stores them in the database. The class is also responsible for logging the data into plain text (.txt) files, which I have achieved using a StringBuilder. One more detail: the class receives the requests as messages, for which I use JMS. That being said, my class is working
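For orientation, the message-driven part of a setup like this is usually a javax.jms.MessageListener. The sketch below is only a minimal illustration of that pattern; the class name, the persistence helper, and the requests.txt log path are hypothetical and not taken from the question.

```java
import javax.jms.JMSException;
import javax.jms.Message;
import javax.jms.MessageListener;
import javax.jms.TextMessage;
import java.io.FileWriter;
import java.io.IOException;

// Hypothetical sketch of the setup described above: consume an XML request
// from a JMS destination, persist its values, and append a line to a .txt log.
public class XmlRequestListener implements MessageListener {

    @Override
    public void onMessage(Message message) {
        try {
            if (message instanceof TextMessage) {
                String xml = ((TextMessage) message).getText();
                saveToDatabase(xml);               // placeholder for the DB insert
                appendToLog(buildLogLine(xml));
            }
        } catch (JMSException | IOException e) {
            e.printStackTrace();                   // real code should log and decide on redelivery
        }
    }

    private String buildLogLine(String xml) {
        // StringBuilder used for the flat-file log line, as in the question
        StringBuilder sb = new StringBuilder();
        sb.append(System.currentTimeMillis()).append(" | ").append(xml);
        return sb.toString();
    }

    private void appendToLog(String line) throws IOException {
        try (FileWriter fw = new FileWriter("requests.txt", true)) { // append mode
            fw.write(line + System.lineSeparator());
        }
    }

    private void saveToDatabase(String xml) {
        // parse the XML and insert the values via JDBC/JPA (omitted)
    }
}
```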

Aerospike Timeout Exception

Submitted by ⅰ亾dé卋堺 on 2020-01-02 10:44:06
Question: I have an application that uses a lot of batch reads in Aerospike. After the application has been running for some time, I get many of these errors: play.api.Application$$anon$1: Execution exception[[AerospikeException: Error Code 9: Timeout]] at play.api.Application$class.handleError(Application.scala:296) ~[com.typesafe.play.play_2.11-2.3.7.jar:2.3.7] at play.api.DefaultApplication.handleError(Application.scala:402) [com.typesafe.play.play_2.11-2.3.7.jar:2.3.7] at play.core.server.netty
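A common first response to "Error Code 9: Timeout" on batch reads is to loosen the batch policy's timeout and limit how many nodes are queried in parallel. Below is a minimal sketch against the Aerospike Java client; the host, namespace ("test"), set ("demo"), and keys are placeholders, and the timeout field is spelled policy.timeout in older 3.x clients versus policy.totalTimeout in 4.x and later.

```java
import com.aerospike.client.AerospikeClient;
import com.aerospike.client.Key;
import com.aerospike.client.Record;
import com.aerospike.client.policy.BatchPolicy;

public class BatchReadSketch {
    public static void main(String[] args) {
        // Hypothetical host, namespace, and set; adjust to your cluster.
        AerospikeClient client = new AerospikeClient("127.0.0.1", 3000);
        try {
            BatchPolicy policy = new BatchPolicy();
            policy.totalTimeout = 5000;          // ms; named policy.timeout in older 3.x clients
            policy.maxConcurrentThreads = 8;     // throttle parallel per-node requests

            Key[] keys = new Key[100];
            for (int i = 0; i < keys.length; i++) {
                keys[i] = new Key("test", "demo", "user-" + i);
            }

            Record[] records = client.get(policy, keys); // null entries mean key not found
            System.out.println("fetched " + records.length + " slots");
        } finally {
            client.close();
        }
    }
}
```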

File can't be found in a small fraction of submitted jobs

Submitted by Deadly on 2020-01-02 10:17:33
Question: I'm trying to run a very large set of batch jobs on a RHEL5 cluster that uses a Lustre file system. I was getting a strange error with roughly 1% of the jobs: they couldn't find a text file they all use for steering. A script that reproduces the error looks like this:
#!/usr/bin/env bash
#PBS -t 1-18792
#PBS -l mem=4gb,walltime=30:00
#PBS -l nodes=1:ppn=1
#PBS -q hep
#PBS -o output/fit/out.txt
#PBS -e output/fit/error.txt
cd $PBS_O_WORKDIR
mkdir -p output/fit
echo 'submitted from: '

Is Erlang the C of the clustered computing world?

Submitted by 試著忘記壹切 on 2020-01-02 06:29:09
Question: Erlang seems to be very low-level and performant on networks, but it does not have a very rich type system or many of the things that other functional languages offer. So it seems to me that it will become the lowest-level development language for clustered programming, until something else comes along that offers a decent clustered VM AND high-level constructs. Any thoughts on this? Answer 1: C is the C of clustered computing. At least, every HPC cluster I've seen had lots of C and Fortran running

Running multiple worker daemons SLURM

Submitted by ∥☆過路亽.° on 2020-01-01 18:18:26
Question: I want to run multiple worker daemons on a single machine. As per damienfrancois's answer to "what is the minimum number of computers for a slurm cluster", it can be done. The problem is that I am currently able to run only one worker daemon per machine. For example, when I run
sudo slurmd -N linux1 -cDvv
sudo slurmd -N linux2 -cDvv
linux1 goes down as soon as I start linux2. Is it possible to run multiple worker daemons on one machine? Here is my slurm.conf file. Answer 1: as your intention seems to be just testing

Spark 2.2 Join fails with huge dataset

Submitted by 强颜欢笑 on 2020-01-01 18:17:22
Question: I am currently facing issues when trying to inner-join a huge dataset (654 GB) with a smaller one (535 MB) using the Spark DataFrame API. I am broadcasting the smaller dataset to the worker nodes using the broadcast() function, but I am unable to get the join between the two datasets to complete. Here is a sample of the errors I got: 19/04/26 19:39:07 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 1315 19/04/26 19:39:07 INFO executor.Executor: Running task 25.1 in stage 13.0 (TID 1315) 19/04
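For reference, the broadcast-style join described in the question looks roughly like the following in the Dataset API; the input paths and the join column "id" are made up for the sketch. Whether a 535 MB table is actually safe to broadcast depends on driver and executor memory, so if the broadcast itself is what fails, dropping the hint and letting Spark run a regular shuffle join is the usual fallback.

```java
import static org.apache.spark.sql.functions.broadcast;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class BroadcastJoinSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("broadcast-join-sketch")
                .getOrCreate();

        // Hypothetical paths and join key; in the question the two sides are 654 GB and 535 MB.
        Dataset<Row> large = spark.read().parquet("hdfs:///data/large");
        Dataset<Row> small = spark.read().parquet("hdfs:///data/small");

        // broadcast() marks the smaller side to be shipped to every executor,
        // so the 654 GB side is never shuffled for the join.
        Dataset<Row> joined = large.join(broadcast(small),
                large.col("id").equalTo(small.col("id")), "inner");

        joined.write().parquet("hdfs:///data/joined");
        spark.stop();
    }
}
```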

Running R in Batch Mode on Linux: Output Issues

Submitted by 谁说胖子不能爱 on 2020-01-01 05:54:23
Question: I'm running an R program on a Linux cluster because it is very demanding on my processor. The program is designed to output multiple (around 15) plots as PDFs into the folder from which it gathers its input. I want the program to run in the background and to keep running after I log out of the cluster. First, I tried this:
cd /Users/The/Folder/With/My/RScript   # changed working directory
nohup ./BatchProgram.R &
However, this didn't work because it appended the output to a file

How fast can one submit consecutive and independent jobs with qsub?

Submitted by 那年仲夏 on 2020-01-01 03:36:05
Question: This question is related to "pbs job no output when busy", i.e. some of the jobs I submit produce no output when PBS/Torque is 'busy'. I imagine it is busier when many jobs are submitted one after another, and as it happens, among jobs submitted in this fashion I often get some that produce no output. Here is some code. Suppose I have a Python script called "x_analyse.py" that takes as input a file containing some data and analyses the data stored in that file: ./x
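One straightforward way to avoid flooding the scheduler is to pause between consecutive qsub calls instead of firing them as fast as possible. The sketch below is a hypothetical throttled submitter; the .pbs script names and the two-second delay are made up, and a shell loop with sleep would achieve the same effect.

```java
import java.io.IOException;
import java.util.List;

public class ThrottledSubmit {
    public static void main(String[] args) throws IOException, InterruptedException {
        // Hypothetical job scripts; in the question each would wrap one x_analyse.py run.
        List<String> jobScripts = List.of("job_001.pbs", "job_002.pbs", "job_003.pbs");

        for (String script : jobScripts) {
            Process qsub = new ProcessBuilder("qsub", script)
                    .inheritIO()              // show the job id that qsub prints
                    .start();
            int exit = qsub.waitFor();
            if (exit != 0) {
                System.err.println("qsub failed for " + script + " (exit " + exit + ")");
            }
            Thread.sleep(2000);               // throttle: 2 s between submissions
        }
    }
}
```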

What are the limitations of implementing MySQL NDB Cluster?

Submitted by 老子叫甜甜 on 2019-12-31 22:20:33
Question: I want to implement NDB Cluster for MySQL Cluster 6, for a very large data structure with a minimum of 2 million records. I want to know whether there are any limitations to implementing NDB Cluster, for example RAM size, number of databases, or database size. Answer 1: 2 million databases? I assume you meant "rows". Anyway, concerning limitations: one of the most important things to keep in mind is that NDB/MySQL Cluster is not a general-purpose database. Most notably,

Spread vs MPI vs zeromq?

Submitted by 风流意气都作罢 on 2019-12-29 10:18:09
Question: In one of the answers to "Broadcast like UDP with the Reliability of TCP", a user mentions the Spread messaging API. I've also run across one called ØMQ, and I have some familiarity with MPI. So my main question is: why would I choose one over the other? More specifically, why would I choose to use Spread or ØMQ when there are mature implementations of MPI to be had? Answer 1: MPI was designed for tightly-coupled compute clusters with fast, reliable networks. Spread and ØMQ are designed for large
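To make that design contrast concrete, here is a tiny ØMQ publish/subscribe sketch using the JeroMQ Java binding; the endpoint and message are made up. The point it illustrates is that ØMQ peers are loosely coupled sockets that can join and leave at will, whereas an MPI job is a fixed set of ranks launched together on a trusted, fast network.

```java
import org.zeromq.SocketType;
import org.zeromq.ZContext;
import org.zeromq.ZMQ;

public class ZmqPubSubSketch {
    public static void main(String[] args) throws InterruptedException {
        try (ZContext ctx = new ZContext()) {
            ZMQ.Socket pub = ctx.createSocket(SocketType.PUB);
            pub.bind("tcp://*:5556");

            ZMQ.Socket sub = ctx.createSocket(SocketType.SUB);
            sub.connect("tcp://localhost:5556");
            sub.subscribe("".getBytes());   // subscribe to everything

            Thread.sleep(200);              // let the subscription propagate (slow-joiner effect)
            pub.send("node-7 load=0.42");   // the publisher does not know or care who is listening
            System.out.println(sub.recvStr());
        }
    }
}
```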