cluster-computing

Hadoop cluster is using only master node or all nodes

Posted by 时光怂恿深爱的人放手 on 2019-12-13 06:53:38
Question: I have created a 4-node Hadoop cluster and start all the datanodes, the namenode, the resource manager, etc. To find out whether all of my nodes are actually working, I tried the following procedure:

Step 1: run my program while all nodes are active.
Step 2: run my program while only the master is active.

The completion time in both cases was almost the same. So I would like to know whether there is any other means by which I can tell how many nodes are actually used while running the program.

Answer 1: Discussed in the chat.
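A sketch of one way to check this (my suggestion, not from the original thread): the YARN ResourceManager serves cluster metrics over a REST endpoint, /ws/v1/cluster/metrics, whose activeNodes field counts live NodeManagers. The host name "master" and port 8088 below are assumptions about this cluster.

```python
import json

def active_nodes(metrics):
    """Extract the live NodeManager count from the JSON document
    served by http://<resourcemanager>:8088/ws/v1/cluster/metrics."""
    return metrics["clusterMetrics"]["activeNodes"]

if __name__ == "__main__":
    from urllib.request import urlopen
    # "master" and 8088 are assumed values for this cluster's ResourceManager.
    with urlopen("http://master:8088/ws/v1/cluster/metrics") as resp:
        print("active nodes:", active_nodes(json.load(resp)))
```

The per-application pages in the same web UI also list which nodes ran each container, which answers "how many nodes were actually used" more directly than comparing run times.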

How to access new version of R in Linux

Posted by 被刻印的时光 ゝ on 2019-12-13 06:37:45
Question: I got a cluster account from my college, and R 2.13.0 is installed on the Linux cluster (Red Hat, kernel 2.6.18-128.el5). I followed the steps below to get the latest version of R:

Step 1: download the latest version of R from https://cran.r-project.org/sources.html (I downloaded R-3.2.2).
Step 2: upload it to your cluster (I'm using WinSCP on Windows 8.1).
Step 3: unpack it with tar -xf R-x.y.z.tar.gz, in my case tar -xf R-3.2.2.tar.gz.
Step 4: go to that directory
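The post is cut off after "go to that directory". For reference, the usual remaining steps for an unprivileged from-source R build are ./configure --prefix, make, and make install into a user-writable prefix; the sketch below assumes that standard flow and hypothetical home-directory paths, and is not taken from the original post.

```python
import os
import subprocess

def build_commands(prefix):
    """Standard no-root build sequence for R from an unpacked source tree."""
    return [["./configure", "--prefix=" + prefix],
            ["make", "-j4"],
            ["make", "install"]]

def build_r(src_dir, prefix):
    """Run the build inside src_dir and return the path of the new R binary."""
    for cmd in build_commands(prefix):
        subprocess.check_call(cmd, cwd=src_dir)
    return os.path.join(prefix, "bin", "R")

if __name__ == "__main__":
    # Hypothetical locations on the cluster account:
    print(build_r(os.path.expanduser("~/R-3.2.2"),
                  os.path.expanduser("~/opt/R-3.2.2")))
```

Afterwards, putting $prefix/bin ahead of the system path (e.g. in ~/.bashrc) makes `R` resolve to 3.2.2 instead of the cluster's 2.13.0.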

Hadoop: Output file has double output

Posted by 六月ゝ 毕业季﹏ on 2019-12-13 06:09:33
Question: I am running a Hadoop program with the following input file, input.txt:

1 2

mapper.py:

import sys
for line in sys.stdin:
    print line,
    print "Test"

reducer.py:

import sys
for line in sys.stdin:
    print line,

When I run it without Hadoop ($ cat ./input.txt | ./mapper.py | ./reducer.py), the output is as expected:

1 2
Test

However, running it through Hadoop via the streaming API (as described here), the latter part of the output seems somewhat "doubled":

1 2
Test
Test

Additionally, when
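For reference, the local pipeline above can be modeled as plain functions (Python 3 syntax rather than the original Python 2 scripts), which makes it easy to confirm what the non-Hadoop run should produce:

```python
def mapper(lines):
    """Model of mapper.py: echo each input line, then emit "Test"."""
    out = []
    for line in lines:
        out.append(line)
        out.append("Test")
    return out

def reducer(lines):
    """Model of reducer.py: an identity pass-through."""
    return list(lines)

if __name__ == "__main__":
    # Equivalent of: cat ./input.txt | ./mapper.py | ./reducer.py
    for line in reducer(mapper(["1 2"])):
        print(line)
```

Anything beyond this single "Test" in the final output therefore comes from how the streaming job feeds and splits input (or runs extra mapper instances), not from the scripts' own logic.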

MarkLogic Cluster & Forest replica - XDMP-BAD: No label found

Posted by 你。 on 2019-12-13 05:47:24
Question: We are setting up a MarkLogic cluster in Azure, using an Azure Blob container as the data directory. The following steps were followed:

1. Set up the 1st MarkLogic server.
2. Added a 2nd server to the cluster by providing the host name of the 1st server.
3. Added a 3rd server to the cluster by providing the host name of the 1st server.
4. Added 3 forests; the data directory is "azure://" as mentioned on page number 32 here.
5. In the 1st forest, added 2 as replicas.

As soon as I add the replica in Forest01, the label of Forest01 becomes empty with size 0; before it is

Flink- error on running WordCount example on remote cluster

Posted by 怎甘沉沦 on 2019-12-13 03:58:12
Question: I have a Flink cluster on VirtualBox comprising three nodes, 1 master and 2 slaves. I customized the WordCount example and created a fat JAR file so I could run it on the remote VirtualBox Flink cluster, but I ran into an error. Note: I imported the dependencies into the project manually (using IntelliJ IDEA); I did not use Maven as the dependency provider. I tested my code on the local machine and it was OK! More details follow. Here is my Java code:

import org.apache.flink.api.common.functions.FlatMapFunction;
import

Output log file to cluster option

Posted by ぐ巨炮叔叔 on 2019-12-13 03:00:53
Question: I'm submitting jobs to Slurm/sbatch via Snakemake. I'm trying to send the log from sbatch to a file in the same directory tree as the rule's output. For example, this works:

rm -rf foo
snakemake -s test.smk --jobs 1 --cluster "sbatch --output log.txt"

but it fails (i.e. the Slurm job status is FAILED) if I try:

rm -rf foo
snakemake -s test.smk --jobs 1 --cluster "sbatch --output {output}.log"

presumably because {output} points to foo/bar/, which does not exist. But snakemake should have created
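A common workaround (a sketch, not from the original post): sbatch does not create missing directories for --output, so have the cluster command create the log's directory before submitting. The helper below just composes such a command string; the sample path is hypothetical.

```python
import os
import shlex

def cluster_command(log_path):
    """Build a --cluster string that creates the log's directory before
    submitting, since sbatch won't create it for --output itself."""
    log_dir = os.path.dirname(log_path) or "."
    return "mkdir -p {} && sbatch --output {}".format(
        shlex.quote(log_dir), shlex.quote(log_path))

if __name__ == "__main__":
    # For a rule whose output is foo/bar/out.txt:
    print(cluster_command("foo/bar/out.txt.log"))
```

With Snakemake this pattern would be passed as --cluster 'mkdir -p $(dirname {output}.log) && sbatch --output {output}.log', letting Snakemake substitute {output} before the shell runs.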

Nutch in Hadoop 2.x

Posted by 假如想象 on 2019-12-13 00:29:09
Question: I have a three-node cluster running Hadoop 2.2.0 and HBase 0.98.1, and I need to use the Nutch 2.2.1 crawler on top of that. But Nutch only supports Hadoop versions from the 1.x branch. By now I am able to submit a Nutch job to my cluster, but it fails with a java.lang.NumberFormatException. So my question is pretty simple: how do I make Nutch work in my environment?

Answer 1: At the moment it's impossible to integrate Nutch 2.2.1 (Gora 0.3) with HBase 0.98.x. See: https://issues.apache.org/jira/browse/GORA

GridGain - programmatically opening nodes using SSH through Grid.startNodes API

Posted by ≯℡__Kan透↙ on 2019-12-12 21:07:41
Question: I am using Grid.startNodes(java.util.Collection, java.util.Map, boolean, int, int) as defined here: http://gridgain.com/api/javadoc/org/gridgain/grid/Grid.html#startNodes(java.util.Collection, java.util.Map, boolean, int, int)

Code I am using:

GridConfiguration cfg = GridCfgGenerator.GetConfigurations(true);
Grid grid = GridGain.start(cfg);
Collection<Map<String, Object>> coll = new ArrayList<>();
Map<String, Object> host = new HashMap<String, Object>();
//host.put("host", "23.101.201.136");

Cluster hangs/shows error while executing simple MPI program in C

Posted by 时间秒杀一切 on 2019-12-12 18:43:24
Question: I am trying to run a simple MPI program (multiple array addition). It runs perfectly on my PC but simply hangs or shows the following error on the cluster. I am using Open MPI and the following command to execute.

Network config of the cluster (master & node1):

MASTER
eth0  Link encap:Ethernet  HWaddr 00:22:19:A4:52:74
      inet addr:10.1.1.1  Bcast:10.1.255.255  Mask:255.255.0.0
      inet6 addr: fe80::222:19ff:fea4:5274/64 Scope:Link
      UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
      RX packets:16914 errors:0
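The C source is not shown in the excerpt; as a sketch of the same pattern (element-wise array addition split across ranks), here in Python with mpi4py rather than C, with the chunking kept in a pure helper. The array size and contents are assumptions.

```python
def chunk_bounds(n, size, rank):
    """[start, stop) range of an n-element array owned by this rank,
    spreading any remainder over the first ranks."""
    base, extra = divmod(n, size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop

def add_chunk(a, b, start, stop):
    """Element-wise sum of one rank's slice."""
    return [a[i] + b[i] for i in range(start, stop)]

if __name__ == "__main__":
    from mpi4py import MPI  # requires an MPI runtime, e.g. Open MPI
    comm = MPI.COMM_WORLD
    a, b = list(range(100)), list(range(100))
    start, stop = chunk_bounds(len(a), comm.size, comm.rank)
    chunks = comm.gather(add_chunk(a, b, start, stop), root=0)
    if comm.rank == 0:
        print([x for c in chunks for x in c][:5])
```

A hang like the one described is usually rank-to-rank connectivity rather than a bug in code this simple; with multiple interfaces (the 10.1.x.x eth0 above), restricting Open MPI to the cluster network, e.g. mpirun --mca btl_tcp_if_include eth0, is a common first check.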

Is there a multi-master database with customizable replication level?

Posted by 别来无恙 on 2019-12-12 17:51:28
Question: I need a multi-master database to let users talk from different continents. Each user will write his data to his local master database, and it will replicate the data to the other master databases in other countries/continents. The problem is that I can't store copies of all users' data in all data centers. I need something like a database/solution that lets me set a replication level. I need the ability to read any data from any node but store the data on only several nodes (on 3 nodes but not on
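One system matching this description (my suggestion, not from the truncated thread) is Apache Cassandra: every node can accept writes, and the NetworkTopologyStrategy replication class sets how many replicas each datacenter keeps, so data can live on, say, 3 nodes total rather than in every datacenter. The keyspace name and datacenter names below are hypothetical.

```python
def replication_cql(keyspace, dc_factors):
    """Build the CQL that pins each row to a fixed number of replicas
    per datacenter instead of copying it to every datacenter."""
    opts = ", ".join("'%s': %d" % (dc, n) for dc, n in sorted(dc_factors.items()))
    return ("CREATE KEYSPACE %s WITH replication = "
            "{'class': 'NetworkTopologyStrategy', %s}" % (keyspace, opts))

if __name__ == "__main__":
    stmt = replication_cql("chat", {"eu": 2, "us": 1})
    print(stmt)
    # To execute against a live cluster (pip install cassandra-driver):
    # from cassandra.cluster import Cluster
    # Cluster(["127.0.0.1"]).connect().execute(stmt)
```

Reads can then be sent to any node, which acts as a coordinator and fetches from the actual replicas, matching the "read anywhere, store on a few nodes" requirement.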