cluster-computing

Set hadoop system user for client embedded in Java webapp

Posted by 你说的曾经没有我的故事 on 2019-11-26 22:29:05
I would like to submit MapReduce jobs from a Java web application to a remote Hadoop cluster, but am unable to specify which user the job should be submitted as. I would like to configure and use a system user that is used for all MapReduce jobs. Currently I am unable to specify any user, and no matter what, the Hadoop job runs under the username of the currently logged-in user of the client system. This causes an error with the message Permission denied: user=alice, access=WRITE, inode="staging":hduser:supergroup:rwxr-xr-x ... where "alice" is the local, logged-in user on the client…
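
A common route in Java is to wrap the submission in UserGroupInformation.createRemoteUser("hduser").doAs(...), so the job runs as a fixed system user rather than the webapp's OS account. The underlying idea can be sketched language-agnostically: with Hadoop's default simple authentication, the HADOOP_USER_NAME environment variable overrides the client-side user. A minimal Python sketch, with hypothetical jar, class, and path names (Kerberos-secured clusters ignore this variable):

    import os
    import subprocess

    # With Hadoop's default "simple" authentication, the client-side user is
    # taken from HADOOP_USER_NAME if it is set, regardless of the OS account
    # running this process.
    env = dict(os.environ, HADOOP_USER_NAME="hduser")

    # Hypothetical jar, main class, and HDFS paths, purely for illustration.
    subprocess.run(
        ["hadoop", "jar", "/opt/myapp/wordcount.jar", "com.example.WordCount",
         "/user/hduser/input", "/user/hduser/output"],
        env=env,
        check=True,
    )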

MPI: blocking vs non-blocking

Posted by こ雲淡風輕ζ on 2019-11-26 21:51:03
I am having trouble understanding the concept of blocking communication and non-blocking communication in MPI. What are the differences between the two? What are the advantages and disadvantages? Answer (user1202136): Blocking communication is done using MPI_Send() and MPI_Recv(). These functions do not return (i.e., they block) until the communication is finished. Simplifying somewhat, this means that the buffer passed to MPI_Send() can be reused, either because MPI saved it somewhere or because it has been received by the destination. Similarly, MPI_Recv() returns when the receive buffer has been…
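
As an illustration of the semantics described above, here is a minimal sketch using the mpi4py bindings, contrasting the blocking send/recv pair with non-blocking isend/irecv plus wait (the Python counterparts of MPI_Send/MPI_Recv and MPI_Isend/MPI_Irecv/MPI_Wait). Run it with something like mpiexec -n 2 python demo.py:

    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    # Blocking: send()/recv() do not return until the buffer is safe to reuse
    # (for recv, until the message has actually arrived).
    if rank == 0:
        comm.send({"payload": list(range(10))}, dest=1, tag=0)
    elif rank == 1:
        data = comm.recv(source=0, tag=0)

    # Non-blocking: isend()/irecv() return immediately with a Request handle;
    # the transfer completes in the background and we synchronize later.
    if rank == 0:
        req = comm.isend("hello", dest=1, tag=1)
        # ... overlap computation with communication here ...
        req.wait()              # the send buffer must not be reused before this
    elif rank == 1:
        req = comm.irecv(source=0, tag=1)
        msg = req.wait()        # for irecv, wait() returns the received object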

Spark spark-submit --jars argument wants a comma-separated list; how to declare a directory of jars?

Posted by 和自甴很熟 on 2019-11-26 20:58:55
In Submitting Applications in the Spark docs, as of 1.6.0 and earlier, it's not clear how to specify the --jars argument, as it's apparently neither a colon-separated classpath nor a directory expansion. The docs say "Path to a bundled jar including your application and all dependencies. The URL must be globally visible inside of your cluster, for instance, an hdfs:// path or a file:// path that is present on all nodes." Question: What are all the options for submitting a classpath with --jars in the spark-submit script in $SPARK_HOME/bin? Anything undocumented that could be submitted as an…
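
--jars expects a single comma-separated list of paths or URLs rather than a classpath or a directory, so one common workaround is to expand the directory into that form yourself before invoking spark-submit (in a shell, for example, by piping a glob through tr to replace spaces with commas). A Python sketch of the same idea, with hypothetical paths and application class:

    import glob
    import subprocess

    # Expand a local directory of jars into the comma-separated form --jars expects.
    jars = ",".join(sorted(glob.glob("/opt/myapp/lib/*.jar")))   # hypothetical lib dir

    subprocess.run(
        ["spark-submit",
         "--master", "yarn",
         "--class", "com.example.Main",     # hypothetical application class
         "--jars", jars,
         "/opt/myapp/myapp.jar"],            # hypothetical application jar
        check=True,
    )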

Easy way to use parallel options of scikit-learn functions on HPC

Posted by 余生长醉 on 2019-11-26 18:55:24
Question: Many functions in scikit-learn implement user-friendly parallelization. For example, in sklearn.cross_validation.cross_val_score you just pass the desired number of computational jobs in the n_jobs argument, and on a PC with a multi-core processor it works very nicely. But what if I want to use such an option on a high-performance cluster (with the OpenMPI package installed and SLURM for resource management)? As far as I know, sklearn uses joblib for parallelization, which uses multiprocessing. And, as far as I know…
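
n_jobs parallelism in scikit-learn is joblib/multiprocessing based, so it only scales across the cores of a single node; on a SLURM cluster the usual pattern is therefore to request one multi-core allocation and size n_jobs from SLURM's environment rather than trying to span nodes. A minimal sketch (shown with the newer sklearn.model_selection import; the sklearn.cross_validation module named in the question is its older home):

    import os

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    # SLURM exports the per-task CPU count; fall back to 1 outside the scheduler.
    n_jobs = int(os.environ.get("SLURM_CPUS_PER_TASK", "1"))

    data = load_iris()
    clf = RandomForestClassifier(n_estimators=200)

    # joblib spawns n_jobs worker processes on this node, one per CV fit.
    scores = cross_val_score(clf, data.data, data.target, cv=5, n_jobs=n_jobs)
    print(scores.mean())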

What's the difference between Apache's Mesos and Google's Kubernetes

Posted by 坚强是说给别人听的谎言 on 2019-11-26 17:53:03
Question: What exactly is the difference between Apache's Mesos and Google's Kubernetes? I understand both are server cluster management software. Can anyone elaborate on where the main differences are, and when would which framework be preferred? Why would you want to use Kubernetes on top of Mesosphere? Answer 1: Kubernetes is an open source project that brings 'Google style' cluster management capabilities to the world of virtual machines, or 'on the metal' scenarios. It works very well with modern operating…

Scaling solutions for MySQL (Replication, Clustering)

Posted by 余生颓废 on 2019-11-26 16:52:19
At the startup I'm working at we are now considering scaling solutions for our database. Things get somewhat confusing (for me at least) with MySQL, which has MySQL Cluster, replication, and MySQL Cluster replication (from ver. 5.1.6), which is an asynchronous version of MySQL Cluster. The MySQL manual explains some of the differences in its cluster FAQ, but it is hard to ascertain from it when to use one or the other. I would appreciate any advice from people who are familiar with the differences between those solutions, what the pros and cons are, and when you would recommend using…

How to change memory per node for Apache Spark worker

Posted by 我只是一个虾纸丫 on 2019-11-26 15:53:03
Question: I am configuring an Apache Spark cluster. When I run the cluster with 1 master and 3 slaves, I see this on the master monitor page: Memory 2.0 GB (512.0 MB Used), 2.0 GB (512.0 MB Used), 6.0 GB (512.0 MB Used). I want to increase the memory used by the workers but I could not find the right config for this. I have changed spark-env.sh as below: export SPARK_WORKER_MEMORY=6g export SPARK_MEM=6g export SPARK_DAEMON_MEMORY=6g export SPARK_JAVA_OPTS="-Dspark.executor.memory=6g" export JAVA_OPTS="…
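
The 512.0 MB "Used" figures typically come from spark.executor.memory (which defaulted to 512 MB in older releases), not from SPARK_WORKER_MEMORY, which only caps how much memory each worker may offer to executors. A sketch of raising it from application code, assuming a hypothetical standalone master URL; the same property can also be passed as --executor-memory to spark-submit or set in spark-defaults.conf:

    from pyspark import SparkConf, SparkContext

    conf = (
        SparkConf()
        .setMaster("spark://master-host:7077")   # hypothetical standalone master URL
        .setAppName("memory-demo")
        # Memory each executor requests from a worker; it must not exceed
        # SPARK_WORKER_MEMORY, which is the total a worker can hand out.
        .set("spark.executor.memory", "6g")
    )

    sc = SparkContext(conf=conf)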

PHP sessions in a load balancing cluster - how?

Posted by 蹲街弑〆低调 on 2019-11-26 10:14:41
Question: OK, so I've got this totally rare and unique scenario of a load-balanced PHP website. The bummer is: it didn't use to be load balanced. Now we're starting to get issues... Currently the only issue is with PHP sessions. Naturally nobody thought of this issue at first, so the PHP session configuration was left at its defaults. Thus both servers have their own little stash of session files, and woe is the user who gets the next request thrown to the other server, because that doesn't have…
