cluster-computing

KD/Qtree Implementation

孤街浪徒 posted on 2019-12-14 04:19:48
Question: I have the following path data:

       id1     p1     p2
    0    1  7.935  5.103
    1    1  7.934  5.112
    2    1  7.936  5.102
    3    1  7.938  5.145
    4    2  7.930  5.191
    5    2  7.945  5.161
    6    2  7.954  5.127

In the above data frame, (p1, p2) forms the coordinate data, and all the points belonging to the same "id1" form one separate path; in the df above, rows 0-3 belonging to id1 = 1 are one path, and so on. I am trying to implement a quadtree for the analysis of these trajectories. To implement quadtrees I am trying to use "pyqtree" https://github
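A common first step before indexing trajectories with pyqtree is to group the rows by id1 and compute each path's bounding box, since pyqtree's Index.insert(item, bbox) takes an (xmin, ymin, xmax, ymax) tuple. A minimal sketch in plain Python using the question's own data (the pyqtree calls in the comment reflect its documented API; the overall extent passed to Index is an assumption):

```python
# Group the question's rows into per-id1 paths and compute each path's
# bounding box -- the (xmin, ymin, xmax, ymax) form pyqtree expects.
from collections import defaultdict

rows = [  # (id1, p1, p2) from the question's data frame
    (1, 7.935, 5.103), (1, 7.934, 5.112), (1, 7.936, 5.102), (1, 7.938, 5.145),
    (2, 7.930, 5.191), (2, 7.945, 5.161), (2, 7.954, 5.127),
]

paths = defaultdict(list)
for id1, p1, p2 in rows:
    paths[id1].append((p1, p2))

bboxes = {}
for id1, pts in paths.items():
    xs, ys = zip(*pts)
    bboxes[id1] = (min(xs), min(ys), max(xs), max(ys))

# With pyqtree, the boxes would then be indexed roughly like this:
#   from pyqtree import Index
#   spindex = Index(bbox=(7.9, 5.0, 8.0, 5.2))   # overall extent (assumed)
#   for id1, box in bboxes.items():
#       spindex.insert(id1, box)
#   spindex.intersect((7.93, 5.10, 7.94, 5.15))  # -> overlapping path ids
print(bboxes[1])  # -> (7.934, 5.102, 7.938, 5.145)
```

Querying the index with intersect then returns the ids of all paths whose bounding boxes overlap a region, which is usually the pruning step before an exact per-point trajectory comparison.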

Is Terracotta Cluster still open source?

点点圈 posted on 2019-12-14 01:35:01
Question: And if yes, where can it be found? According to this InfoQ entry, they have open-sourced it. But right now, no such product is listed on their website.

Answer 1: This page, http://terracotta.org/dl/oss-download-catalog , has the links to the open-source version of Terracotta and related products. You may be presented with a page requiring you to register prior to the download. I tried downloading from there yesterday, and I was able to successfully download Terracotta and get it running locally

Erlang: starting slave node

徘徊边缘 posted on 2019-12-14 01:00:29
Question: I'm trying to start an Erlang slave node on a cluster and I receive a "bash: erl: command not found" message, though I have an alias for erl. Here is what I actually do:

    [user@n001 ~]$ erl -rsh ssh -sname n001
    Eshell V5.7.5 (abort with ^G)
    (n001@n001)1> slave:start_link("user@n002", n002, "-rsh ssh").
    bash: erl: command not found
    {error,timeout}
    (n001@n001)2>

Maybe there is something wrong? Thanks.

UPDATE: I've added the Erlang bin dir to my $PATH variable; I've set the $ERLANG_ROOT_DIR variable; created
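The alias is the likely culprit: slave:start_link launches erl on the remote host through ssh, which runs a non-interactive shell, and non-interactive shells do not expand aliases. One possible fix, assuming Erlang is installed under /usr/local/otp (adjust to your layout), is to put the real bin directory on PATH in ~/.bashrc on every node, above any interactive-only guard:

```shell
# ~/.bashrc on every node -- must run for non-interactive shells too,
# so keep it ABOVE any 'case $- in *i*) ... esac' or early-return guard.
# /usr/local/otp/bin is an assumed install location.
export PATH="$PATH:/usr/local/otp/bin"
```

After that, `ssh user@n002 'erl -version'` should succeed without an interactive login, which is essentially what the slave module needs.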

Hazelcast: notify when a cluster node dies

梦想与她 posted on 2019-12-13 22:16:33
Question: I am quite a newbie to Hazelcast. I'm building a cluster where different nodes are in charge of different activities. When a node dies, I'd like the other nodes to notice, so they can reassign the dead node's activities among themselves. Is this possible? I have already done some research on this, but I couldn't find anything useful. Any help would be appreciated :)

Answer 1: There are a number of ways here; probably the simplest for what you describe is http://docs.hazelcast.org/docs/3.8.5/javadoc/com
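The truncated javadoc link points into Hazelcast's listener API; a MembershipListener's memberRemoved callback is what fires when a node dies. The reassignment step itself is independent of Hazelcast. A minimal sketch in plain Python of one way to do it (the node names, activity names, and choice of rendezvous hashing are illustrative assumptions, not anything the answer prescribes):

```python
import hashlib

def owner(activity, members):
    # Rendezvous (highest-random-weight) hashing: every surviving node can
    # compute the same assignment locally from the membership list alone,
    # so no extra coordination is needed after a member is removed.
    def score(member):
        h = hashlib.sha256(f"{member}:{activity}".encode()).hexdigest()
        return int(h, 16)
    return max(members, key=score)

members = ["node-a", "node-b", "node-c"]      # illustrative node names
activities = ["ingest", "reindex", "report"]  # illustrative activities

before = {a: owner(a, members) for a in activities}
# node-b dies -> everyone recomputes; only node-b's activities move
survivors = [m for m in members if m != "node-b"]
after = {a: owner(a, survivors) for a in activities}

for a in activities:
    if before[a] != after[a]:
        print(f"{a}: {before[a]} -> {after[a]}")
```

The useful property here is minimal disruption: activities that were not owned by the dead node keep their owner, so only the orphaned work is redistributed.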

The benefits of deploying multiple instances for serving/data/cache

给你一囗甜甜゛ posted on 2019-12-13 16:21:51
Question: Although I have a lot of experience writing code, I don't really have much experience deploying things. I am writing a project that uses MongoDB for persistence, Redis for meta-caching, and Play for serving pages. I am deciding whether to buy a dedicated server or to buy multiple small/medium instances from Amazon/Linode (one each for Mongo, Redis, and Play). I have thought of the trade-offs as below; I wonder if anyone can add to the list or provide further insights. I am leaning toward (b), buying two

Ehcache Replicated Cache not synchronizing at startup

我是研究僧i posted on 2019-12-13 13:31:21
Question: I have an Ehcache cache replicated across two machines. The peers correctly find each other and replicate once both peers are started. However, if the 1st peer starts first and receives several elements, and the 2nd peer starts later, the 2nd peer never sees the elements that were added while it was not yet alive. Here is exactly the order:

1. Cache A is started
2. Add "1234" to Cache A
3. Cache B is started
4. Get "1234" from Cache B -> NOT FOUND

My expectation: if 2 caches are replicated, then
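This behavior is by design: Ehcache replication only propagates events that occur after a peer has joined, so catching up on pre-existing entries at startup requires a bootstrap cache loader in addition to the replicator. A sketch of the relevant ehcache.xml fragment, assuming RMI-based replication (the factory class names are Ehcache's standard RMI ones; the cache name and sizing are placeholders):

```xml
<cache name="replicatedCache" maxEntriesLocalHeap="10000" eternal="false">
  <cacheEventListenerFactory
      class="net.sf.ehcache.distribution.RMICacheReplicatorFactory"/>
  <!-- Without this, a late-starting peer never pulls entries that
       were added before it came up -->
  <bootstrapCacheLoaderFactory
      class="net.sf.ehcache.distribution.RMIBootstrapCacheLoaderFactory"
      properties="bootstrapAsynchronously=true"/>
</cache>
```

With bootstrapAsynchronously=true the cache becomes available immediately and fills in the background; setting it to false blocks startup until the sync completes, which matches the "get right after start" expectation in the question.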

nodejs cluster module - Address in use error

旧街凉风 posted on 2019-12-13 13:14:42
Question: I have an express.js application and it has to run a sub-process every time there is a particular request (here it is: /compute/real-time). There will be user-created scripts to compute the data, so I am using the Node cluster module to create a pool of workers and pick one that is free to execute the scripts. But I have hit a wall during the creation of the cluster itself. Here is the code, clusterPool.js:

    var cluster = require('cluster');
    exports.setupCluster = function(){
        console.log (

update variables using map function on spark

一个人想着一个人 posted on 2019-12-13 10:42:09
Question: Here is my code:

    val dataRDD = sc.textFile(args(0)).map(line => line.split(" "))
                    .map(x => Array(x(0).toInt, x(1).toInt, x(2).toInt))
    var arr = new Array[Int](3)
    printArr(arr)
    dataRDD.map(x => {arr = x})
    printArr(arr)

This code is not working properly. How can I make it work successfully?

Answer 1: Okay, so operations on RDDs are performed in parallel by different workers (usually on different machines in the cluster), and therefore you cannot pass in this type of "global" object arr to be updated.

SocketTimeoutException when running hadoop distcp -update between clusters

浪子不回头ぞ posted on 2019-12-13 07:37:08
Question: I'm using hadoop distcp -update to copy a directory from one HDFS cluster to a different one. Sometimes (pretty often) I get this kind of exception:

    13/07/03 00:20:03 INFO tools.DistCp: srcPaths=[hdfs://HDFS1:51175/directory_X]
    13/07/03 00:20:03 INFO tools.DistCp: destPath=hdfs://HDFS2:51175/directory_X
    13/07/03 00:25:27 WARN hdfs.DFSClient: src=directory_X, datanodes[0].getName()=***.***.***.***:8550
    java.net.SocketTimeoutException: 69000 millis timeout while waiting for channel to be ready for
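The 69000 ms figure matches HDFS's default client read timeout (60 s plus per-datanode slack), which suggests slow or congested datanodes rather than a distcp bug. One commonly suggested mitigation, sketched here with illustrative values, is to raise the client socket timeouts for the distcp job via -D properties (these are the Hadoop 1.x property names, matching the 13/07/03 logs; newer releases use dfs.client.socket-timeout):

```shell
# Raise read/write socket timeouts for this job (values in ms are
# illustrative -- tune to your network)
hadoop distcp \
  -D dfs.socket.timeout=180000 \
  -D dfs.datanode.socket.write.timeout=180000 \
  -update hdfs://HDFS1:51175/directory_X hdfs://HDFS2:51175/directory_X
```

If the timeouts persist even with generous values, the underlying cause is usually network throughput or overloaded datanodes on one of the clusters, which the timeout only papers over.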

Cassandra Cluster Set up - Unable to gossip with any seeds

隐身守侯 posted on 2019-12-13 07:24:56
Question: I am trying to set up a 3-node Cassandra VM cluster. I installed Cassandra from the DataStax package on the individual VMs and then modified the following:

- Seed - vm1 (set that IP address in all the VMs' configs)
- Updated the config with listen_address as the host IP, and added the rpc_broadcast_address
- Added the Cassandra ports to the firewall rules to allow inter-VM communication
- Also tried connecting to the VMs using SSH

After trying all of this, I started the Cassandra seed node; it comes up fine and
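"Unable to gossip with any seeds" on a non-seed node usually comes down to one of two things: the seed entry in cassandra.yaml does not exactly match the seed's listen_address, or the gossip port (storage_port, 7000 by default, 7001 with SSL) is blocked between the VMs despite the firewall rules. A sketch of the relevant cassandra.yaml lines, with placeholder IPs standing in for the real ones:

```yaml
# cassandra.yaml on a non-seed node (IPs are placeholders)
cluster_name: 'MyCluster'            # must be identical on all three VMs
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "192.168.0.101"       # vm1's listen_address, exactly
listen_address: 192.168.0.102        # this VM's own address
```

A quick sanity check is to test TCP reachability of port 7000 from each non-seed VM to the seed (for example with `nc -zv 192.168.0.101 7000`); if that fails, gossip cannot succeed regardless of the yaml settings.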