cluster-computing | 易学教程

In Hadoop where does the framework save the output of the Map task in a normal Map-Reduce Application?

Question: I am trying to find out where the output of a Map task is saved to disk before it can be used by a Reduce task. Note: the version used is Hadoop 0.20.204 with the new API. For example, when overriding the map method in the Map class: public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException { String line = value.toString(); StringTokenizer tokenizer = new StringTokenizer(line); while (tokenizer.hasMoreTokens()) { word.set(tokenizer.nextToken())

Run multiple cassandra nodes (a cluster) from the same machine?

Question: How can I run 3 Cassandra nodes (actually a cluster) on my Ubuntu machine? I don't want to create 3 VMware/VirtualBox instances; instead, I want to configure each Cassandra node to listen on a different port. Is that possible with one Cassandra installation? One solution that came to mind is to have 3 local Cassandra installations and configure each cassandra.yaml independently, but I would prefer to achieve this with the configuration files of my installed Cassandra. I need such configuration only

Running slurm script with multiple nodes, launch job steps with 1 task

Question: I am trying to launch a large number of job steps using a batch script. The steps can be completely different programs, and each needs exactly one CPU. First I tried doing this using the --multi-prog argument to srun. Unfortunately, when using all CPUs assigned to my job in this manner, performance degrades massively: the run time increases to almost its serialized value. By undersubscribing I could ameliorate this a little. I couldn't find anything online regarding this problem,

Hadoop: binding multiple IP addresses to a cluster NameNode

I have a four-node Hadoop cluster on SoftLayer. The master (NameNode) has a public IP address for external access and a private IP address for cluster access. The slave nodes (DataNodes) have private IP addresses, and I'm trying to connect them to the master without assigning a public IP address to each slave node. I've realised that setting fs.defaultFS to the NameNode's public address allows external access, except that the NameNode then listens only on that address for incoming connections, not on the private address. So I get ConnectionRefused exceptions in the datanode logs as they're

What to use instead of the “lock” statement when the code is running on multiple machines?

The lock statement ensures that one thread does not enter a critical section of code while another thread is in the critical section. However, it won't work if the workload is spread across a farm of servers (e.g. a few IIS servers + a load balancer). Does .NET support such a scenario? Is there any class that can be used to control the execution of a critical code section by threads running on multiple machines? If not, is there any standard method of handling such problems? This question was inspired by a discussion that started here but is not limited to SharePoint or ASP.NET. Kit If you

Socket.io websocket authorization failing when clustering node application

Question: Is it possible to cluster an application that uses Socket.io for WebSocket support? If so, what would be the best method of implementation? I've built an application which uses Express and Socket.io, built on Node.js. I'd like to incorporate clustering to increase the number of requests that my application can process. The following causes my application to produce a socket handshake error... var cluster = require('cluster'); var numCPUs = require('os').cpus().length; if

What are the different approaches for Java EE session replication?

Question: I am working on a project that requires really high availability, and my team is currently upgrading some infrastructure and software for a future release. One of the features we would like to enable is session replication not only across different servers but ideally across different, geographically spread sites. Is that possible? What are the approaches? From what I have seen so far, the usual vendor approaches to enabling session replication are one of these:

ElasticSearch setup for a large cluster with heavy aggregations

Question: Context and current state: We are migrating our cluster from Cassandra to a full ElasticSearch cluster. We are indexing documents at an average of ~250-300 docs per second. In ElasticSearch 1.2.0 this represents ~8 GB per day. { "generic": { "id": "twi471943355505459200", "type": "twitter", "title": "RT @YukBerhijabb: The Life is Choice - https://m.facebook.com/story.php?story_fbid=637864496306297&id=100002482564531&refid=17", "content": "RT @YukBerhijabb: The Life is Choice - https://m.facebook

WebLogic load balancing

I'm currently developing a project that runs in a WebLogic clustered environment. I've successfully set up the cluster, but now I want a load-balancing solution (currently, for testing purposes only, I'm using WebLogic's HttpClusterServlet with round-robin load balancing). Is there any documentation that gives a clear comparison (with pros and cons) of the various ways of providing load balancing for WebLogic? These are the main topics I want to cover: Performance (normal and on failover); What failures can be detected and how fast is the failover recovery; Transparency to failure (e.g.,

Spring Singleton in Clustered Environment

As discussed in this post, it is not suitable to use a singleton in a clustered environment (because there are multiple singleton objects in different JVMs), and this must also be true for singletons created by the Spring framework. If that's correct, then we have to be much more careful when using the Spring framework to create singleton classes. Can you please tell me whether this understanding is correct? This is not necessarily the case. It is a problem to use singletons across separate JVMs if they share meaningful state. For instance, a singleton that stored and issued incrementing IDs would be very dangerous if two separate