distributed

How to create a simple distributed Map for Java application?

Submitted by 帅比萌擦擦* on 2019-12-10 10:16:35
Question: I am looking for a Map to share information between two instances of a Java web application running on separate machines. Reads and writes to this map need to be very fast and don't have to be transactional, i.e. it's OK if one instance has stale data for a while. Any recommendations? I need to keep track of the last time a user did something in the application, so it's not terribly bad if this information is out of date. Speed and ease of use are important. I don't want writes to the Map to…
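The last-activity use case above can be sketched with a plain ConcurrentHashMap behind a small wrapper. For actual cross-machine sharing you would back the same interface with a replicated structure such as Hazelcast's ReplicatedMap or a Redis/memcached client; the class and method names below are illustrative, not from the question.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Tracks the last time each user did something. ConcurrentHashMap gives
// fast, mostly lock-free reads and atomic per-key writes. Under async
// replication a reader may see a slightly stale timestamp, which the
// question explicitly tolerates.
public class LastActivityTracker {
    private final Map<String, Long> lastSeen = new ConcurrentHashMap<>();

    public void touch(String userId) {
        lastSeen.put(userId, System.currentTimeMillis());
    }

    // Returns null if the user has never been seen.
    public Long lastSeen(String userId) {
        return lastSeen.get(userId);
    }
}
```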

Distributed tensorflow parameter server and workers

Submitted by *爱你&永不变心* on 2019-12-10 10:08:19
Question: I was closely following the distributed ImageNet TF training example. I am not able to understand how distribution of data takes place when this example is run on two different workers. In theory, different workers should see different parts of the data. Also, what part of the code tells the parameters to be placed on the parameter server? In the multi-GPU example, for instance, there is an explicit section for 'cpu:0'. Answer 1: The different workers see different parts of the data by virtue of dequeuing…
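In the TF example, workers see disjoint data because each dequeues examples from a shared queue, and `tf.train.replica_device_setter` pins the variables onto the parameter server tasks. Purely as a language-neutral illustration of the disjoint-partition idea (not the actual TF mechanism), a modulo shard by task index looks like this:

```java
import java.util.ArrayList;
import java.util.List;

// Each worker processes only the record indices congruent to its task
// index modulo the worker count, so the shards are disjoint and together
// cover every record exactly once.
public class ShardDemo {
    public static List<Integer> shardFor(int taskIndex, int numWorkers, int numRecords) {
        List<Integer> shard = new ArrayList<>();
        for (int i = taskIndex; i < numRecords; i += numWorkers) {
            shard.add(i);
        }
        return shard;
    }
}
```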

What are some good distributed queue managers in php?

Submitted by 这一生的挚爱 on 2019-12-10 02:02:23
Question: I'm working on an image-processing website; instead of having lengthy jobs hold up the user's browser, I want all commands to return quickly with a job id and have a background task do the actual work. The id could then be used to check for status and results (i.e. a URL of the processed image). I've found a lot of distributed queue managers for Ruby, Java and Python, but I don't know nearly enough of any of those languages to be able to use them. My own tests have been with a shared MySQL database to…
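The return-an-id pattern the asker describes can be sketched in-process (in Java here, since no PHP answer survives in this excerpt): submit hands back an id immediately and a background pool does the work, while a status call looks the job up by id. A production version would persist the job row in the shared MySQL table so any frontend can poll it. All names below are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

// Minimal job queue: submit() returns immediately with an id; a fixed
// thread pool runs the job in the background; status() reports
// "unknown", "pending", the result, or "failed".
public class JobQueue {
    private final ExecutorService pool = Executors.newFixedThreadPool(4);
    private final Map<Long, Future<String>> jobs = new ConcurrentHashMap<>();
    private final AtomicLong ids = new AtomicLong();

    public long submit(Callable<String> work) {
        long id = ids.incrementAndGet();
        jobs.put(id, pool.submit(work));
        return id;
    }

    public String status(long id) {
        Future<String> f = jobs.get(id);
        if (f == null) return "unknown";
        if (!f.isDone()) return "pending";
        try { return f.get(); } catch (Exception e) { return "failed"; }
    }

    public void shutdown() { pool.shutdown(); }
}
```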

Find the average of numbers using MapReduce

Submitted by 早过忘川 on 2019-12-09 16:50:34
Question: I have been trying to write some code to find the average of numbers using MapReduce. I am trying to use global counters to reach my goal, but I am not able to set the counter value in the map method of my Mapper, and I am also not able to retrieve the counter value in the reduce method of my Reducer. Do I have to use a global counter in map anyway (e.g. by using incrCounter(key, amount) of the provided Reporter)? Or would you suggest any different logic to get the average of some numbers? Answer 1: …
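The usual answer to the averaging question avoids global counters entirely: have the mapper (and combiner) emit (sum, count) pairs, add them up in the reducer, and divide once at the end. Averaging per-split averages would be wrong when splits have unequal sizes. A minimal sketch of the reduce-side arithmetic, in plain Java rather than the Hadoop API:

```java
// Each row of `partials` is a {sum, count} pair emitted by one
// mapper/combiner; the true mean is totalSum / totalCount.
public class AverageReduce {
    public static double average(long[][] partials) {
        long sum = 0, count = 0;
        for (long[] p : partials) {
            sum += p[0];
            count += p[1];
        }
        return (double) sum / count;
    }
}
```

For example, two splits contributing {3, 2} and {7, 3} yield 10 / 5 = 2.0, whereas averaging the split means (1.5 and 2.33) would give the wrong value.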

Task priority in celery with redis

Submitted by 落爺英雄遲暮 on 2019-12-09 11:54:27
Question: I would like to implement a distributed job execution system with Celery. Given that RabbitMQ doesn't support priorities and I badly need this feature, I turned to celery+redis. In my situation the tasks are closely tied to hardware; for example, task A can only run on Worker 1, since only Worker 1's PC has the necessary hardware. I set the CONCURRENCY of each worker to 1 so that a worker will only run one task at a time. Each task takes about 2 minutes. To implement…
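A common workaround for the routing half of this problem is one dedicated queue per hardware-bound worker (Celery lets a worker consume a named queue), with priority ordering applied inside each queue. The in-queue ordering can be sketched with Java's PriorityBlockingQueue; the Task class and the lower-number-first convention below are assumptions for illustration.

```java
import java.util.Comparator;
import java.util.concurrent.PriorityBlockingQueue;

// One such queue would exist per hardware-bound worker; tasks with a
// lower priority number are dequeued first regardless of arrival order.
public class WorkerQueue {
    public static class Task {
        public final int priority;
        public final String name;
        public Task(int priority, String name) {
            this.priority = priority;
            this.name = name;
        }
    }

    private final PriorityBlockingQueue<Task> queue =
        new PriorityBlockingQueue<>(11, Comparator.comparingInt((Task t) -> t.priority));

    public void enqueue(Task t) { queue.put(t); }

    // Non-blocking; returns null when the queue is empty.
    public Task next() { return queue.poll(); }
}
```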

What is the difference between data-centric and object-oriented application models?

Submitted by 回眸只為那壹抹淺笑 on 2019-12-09 09:25:33
Question: What is a data-centric application, and is there any difference from an object-oriented application model? Answer 1: The two concepts are somewhat orthogonal. A data-centric application is one where the database plays a key role, where properties in the database may influence the code paths running in your application, and where the code is more generic, with all or most business logic defined through database relations and constraints. OOP can be used to create a data-centric application. Some of…

Distributed Java Compiler

Submitted by 混江龙づ霸主 on 2019-12-09 08:21:42
Question: Is there a distributed compiler for Java, analogous to distcc for C/C++? Answer 1: The direct answer to your question is "no". However, it probably would not help you anyway: compiling Java is very fast. On a small project, compilation is fast enough that you shouldn't really care. On a large project you would need to ship the file to compile over the network, and potentially also ship many megabytes of dependencies along with it. One thing that you can do to…

Reading CSV file in Spark in a distributed manner

Submitted by 断了今生、忘了曾经 on 2019-12-09 06:52:01
Question: I am developing a Spark processing framework which reads large CSV files, loads them into RDDs, performs some transformations, and at the end saves some statistics. The CSV files in question are around 50 GB on average. I'm using Spark 2.0. My question is: when I load the files using the sparkContext.textFile() function, does the file need to be stored in the memory of the driver first and then distributed to the workers (thus requiring a rather large amount of memory on the driver)? Or…
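The short answer is that textFile() never materializes the file on the driver: the driver only computes split boundaries (offset, length), and each executor reads its own byte range directly from storage. A stdlib sketch of that split-reading idea (not Spark code; the demo helper and 6/5 offsets are made up for illustration):

```java
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Each "executor" seeks to its assigned offset and reads only its own
// slice, so the whole file never passes through one process's memory.
public class SplitReader {
    public static String readSplit(String path, long offset, int length) {
        try (RandomAccessFile f = new RandomAccessFile(path, "r")) {
            f.seek(offset);
            byte[] buf = new byte[length];
            f.readFully(buf);
            return new String(buf, StandardCharsets.UTF_8);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Writes a temp file and reads the split covering its second word.
    public static String demo() {
        try {
            Path p = Files.createTempFile("split", ".txt");
            Files.write(p, "hello world".getBytes(StandardCharsets.UTF_8));
            return readSplit(p.toString(), 6, 5);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```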

Experience with Hadoop?

Submitted by 六眼飞鱼酱① on 2019-12-09 05:29:58
Question: Have any of you tried Hadoop? Can it be used without the distributed filesystem that goes with it, in a shared-nothing architecture? Would that make sense? I'm also interested in any performance results you have... Answer 1: Yes, you can use Hadoop on a local filesystem by using file URIs instead of hdfs URIs in various places. I think a lot of the examples that come with Hadoop do this. This is probably fine if you just want to learn how Hadoop works and the basic map-reduce paradigm, but you…
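Concretely, pointing Hadoop at the local filesystem is a one-property change in core-site.xml (property name as used by Hadoop 2.x; 1.x called it fs.default.name):

```xml
<!-- core-site.xml: use the local filesystem instead of HDFS -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>file:///</value>
  </property>
</configuration>
```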

Distributed System

Submitted by 旧街凉风 on 2019-12-09 01:45:39
Question: I am looking to create a distributed framework in Java and need some help sorting out the implementation of a client/manager/worker situation as described in my pseudocode below.

    Manager
    BEGIN
        WHILE (true)
            RECEIVE message FROM client
            IF (worker_connections > 0) THEN
                FOR (i = 0; i < worker_connections; i++)
                    SEND message TO worker[i]
                FOR (i = 0; i < worker_connections; i++)
                    RECEIVE result[i] FROM worker[i]
                SEND merge(result[]) TO client
            ELSE
                SEND "No workers available" TO client
            END IF
        END WHILE
    END

    Client…
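The manager's scatter/gather loop above can be sketched in-process with one ExecutorService standing in for each worker; a networked version would replace the executors with sockets, RMI stubs, or a message queue. The trivial worker task and the concatenating merge are assumptions chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Scatter the message to every worker, gather one result per worker,
// then merge; with no workers connected, report that to the client.
public class ManagerDemo {
    public static String handle(String message, List<ExecutorService> workers) {
        if (workers.isEmpty()) return "No workers available";
        List<Future<String>> results = new ArrayList<>();
        for (ExecutorService w : workers) {                  // scatter
            results.add(w.submit(() -> "done:" + message));
        }
        StringBuilder merged = new StringBuilder();          // gather + merge
        for (Future<String> r : results) {
            try { merged.append(r.get()).append(';'); }
            catch (Exception e) { merged.append("error;"); }
        }
        return merged.toString();
    }
}
```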