distributed

How to create a simple distributed Map for Java application?

Submitted by 帅比萌擦擦* on 2019-12-10 10:16:35
Question: I am looking for a Map to share information between two instances of a Java web application running on separate machines. Reads and writes to this map need to be very fast and don't have to be transactional, i.e. it's OK if one instance has stale data for a while. Any recommendations? I need to keep track of the last time a user did something in the application, so it's not terribly bad if this information is out of date. Speed and ease of use are important. I don't want writes to the Map to…
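The last-activity use case above can be sketched with a plain ConcurrentHashMap behind a small wrapper. For actual cross-machine sharing you would back the same interface with a replicated structure such as Hazelcast's ReplicatedMap or a Redis/memcached client; the class and method names below are illustrative, not from the question.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Tracks the last time each user did something. ConcurrentHashMap gives
// fast, mostly lock-free reads and atomic per-key writes. Under async
// replication a reader may see a slightly stale timestamp, which the
// question explicitly tolerates.
public class LastActivityTracker {
    private final Map<String, Long> lastSeen = new ConcurrentHashMap<>();

    public void touch(String userId) {
        lastSeen.put(userId, System.currentTimeMillis());
    }

    // Returns null if the user has never been seen.
    public Long lastSeen(String userId) {
        return lastSeen.get(userId);
    }
}
```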

Distributed tensorflow parameter server and workers

Submitted by *爱你&永不变心* on 2019-12-10 10:08:19
Question: I was closely following the distributed ImageNet TF training example. I am not able to understand how distribution of data takes place when this example is run on two different workers. In theory, different workers should see different parts of the data. Also, what part of the code tells the parameters to be placed on the parameter server? In the multi-GPU example, for instance, there is an explicit section for 'cpu:0'. Answer 1: The different workers see different parts of the data by virtue of dequeuing…
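In the TF example, workers see disjoint data because each dequeues examples from a shared queue, and `tf.train.replica_device_setter` pins the variables onto the parameter server tasks. Purely as a language-neutral illustration of the disjoint-partition idea (not the actual TF mechanism), a modulo shard by task index looks like this:

```java
import java.util.ArrayList;
import java.util.List;

// Each worker processes only the record indices congruent to its task
// index modulo the worker count, so the shards are disjoint and together
// cover every record exactly once.
public class ShardDemo {
    public static List<Integer> shardFor(int taskIndex, int numWorkers, int numRecords) {
        List<Integer> shard = new ArrayList<>();
        for (int i = taskIndex; i < numRecords; i += numWorkers) {
            shard.add(i);
        }
        return shard;
    }
}
```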

What are some good distributed queue managers in php?

Submitted by 这一生的挚爱 on 2019-12-10 02:02:23
Question: I'm working on an image-processing website; instead of having lengthy jobs hold up the user's browser, I want all commands to return quickly with a job id and have a background task do the actual work. The id could then be used to check for status and results (i.e. a URL of the processed image). I've found a lot of distributed queue managers for Ruby, Java and Python, but I don't know nearly enough of any of those languages to be able to use them. My own tests have been with a shared MySQL database to…
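The return-an-id pattern the asker describes can be sketched in-process (in Java here, since no PHP answer survives in this excerpt): submit hands back an id immediately and a background pool does the work, while a status call looks the job up by id. A production version would persist the job row in the shared MySQL table so any frontend can poll it. All names below are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

// Minimal job queue: submit() returns immediately with an id; a fixed
// thread pool runs the job in the background; status() reports
// "unknown", "pending", the result, or "failed".
public class JobQueue {
    private final ExecutorService pool = Executors.newFixedThreadPool(4);
    private final Map<Long, Future<String>> jobs = new ConcurrentHashMap<>();
    private final AtomicLong ids = new AtomicLong();

    public long submit(Callable<String> work) {
        long id = ids.incrementAndGet();
        jobs.put(id, pool.submit(work));
        return id;
    }

    public String status(long id) {
        Future<String> f = jobs.get(id);
        if (f == null) return "unknown";
        if (!f.isDone()) return "pending";
        try { return f.get(); } catch (Exception e) { return "failed"; }
    }

    public void shutdown() { pool.shutdown(); }
}
```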

Find the average of numbers using MapReduce

Submitted by 早过忘川 on 2019-12-09 16:50:34
Question: I have been trying to write some code to find the average of numbers using MapReduce. I am trying to use global counters to reach my goal, but I am not able to set the counter value in the map method of my Mapper, and I am also not able to retrieve the counter value in the reduce method of my Reducer. Do I have to use a global counter in map anyway (e.g. by using incrCounter(key, amount) of the provided Reporter)? Or would you suggest any different logic to get the average of some numbers? Answer 1: …
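The usual answer to the averaging question avoids global counters entirely: have the mapper (and combiner) emit (sum, count) pairs, add them up in the reducer, and divide once at the end. Averaging per-split averages would be wrong when splits have unequal sizes. A minimal sketch of the reduce-side arithmetic, in plain Java rather than the Hadoop API:

```java
// Each row of `partials` is a {sum, count} pair emitted by one
// mapper/combiner; the true mean is totalSum / totalCount.
public class AverageReduce {
    public static double average(long[][] partials) {
        long sum = 0, count = 0;
        for (long[] p : partials) {
            sum += p[0];
            count += p[1];
        }
        return (double) sum / count;
    }
}
```

For example, two splits contributing {3, 2} and {7, 3} yield 10 / 5 = 2.0, whereas averaging the split means (1.5 and 2.33) would give the wrong value.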

Task priority in celery with redis

Submitted by 落爺英雄遲暮 on 2019-12-09 11:54:27
Question: I would like to implement a distributed job execution system with Celery. Given that RabbitMQ doesn't support priorities and I badly need this feature, I turned to celery+redis. In my situation the tasks are closely tied to hardware; for example, task A can only run on Worker 1, since only Worker 1's PC has the necessary hardware. I set the CONCURRENCY of each worker to 1 so that a worker will only run one task at a time. Each task takes about 2 minutes. To implement…
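A common workaround for the routing half of this problem is one dedicated queue per hardware-bound worker (Celery lets a worker consume a named queue), with priority ordering applied inside each queue. The in-queue ordering can be sketched with Java's PriorityBlockingQueue; the Task class and the lower-number-first convention below are assumptions for illustration.

```java
import java.util.Comparator;
import java.util.concurrent.PriorityBlockingQueue;

// One such queue would exist per hardware-bound worker; tasks with a
// lower priority number are dequeued first regardless of arrival order.
public class WorkerQueue {
    public static class Task {
        public final int priority;
        public final String name;
        public Task(int priority, String name) {
            this.priority = priority;
            this.name = name;
        }
    }

    private final PriorityBlockingQueue<Task> queue =
        new PriorityBlockingQueue<>(11, Comparator.comparingInt((Task t) -> t.priority));

    public void enqueue(Task t) { queue.put(t); }

    // Non-blocking; returns null when the queue is empty.
    public Task next() { return queue.poll(); }
}
```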

What is the difference between data-centric and object-oriented application models?

Submitted by 回眸只為那壹抹淺笑 on 2019-12-09 09:25:33
Question: What is a data-centric application, and is there any difference from an object-oriented application model? Answer 1: The two concepts are somewhat orthogonal. A data-centric application is one where the database plays a key role, where properties in the database may influence the code paths running in your application, and where the code is more generic, with all or most business logic defined through database relations and constraints. OOP can be used to create a data-centric application. Some of…

Distributed Java Compiler

Submitted by 混江龙づ霸主 on 2019-12-09 08:21:42
Question: Is there a distributed compiler for Java, analogous to distcc for C/C++? Answer 1: The direct answer to your question is "no". However, it probably would not help you anyway: compiling Java is very fast. On a small project, compilation is fast enough that you shouldn't really care. On a large project you would need to ship the file to compile over the network, and potentially also ship many megabytes of dependencies along with it. One thing that you can do to…

Reading CSV file in Spark in a distributed manner

Submitted by 断了今生、忘了曾经 on 2019-12-09 06:52:01
Question: I am developing a Spark processing framework which reads large CSV files, loads them into RDDs, performs some transformations, and at the end saves some statistics. The CSV files in question are around 50 GB on average. I'm using Spark 2.0. My question is: when I load the files using the sparkContext.textFile() function, does the file need to be stored in the memory of the driver first and then distributed to the workers (thus requiring a rather large amount of memory on the driver)? Or…
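The short answer is that textFile() never materializes the file on the driver: the driver only computes split boundaries (offset, length), and each executor reads its own byte range directly from storage. A stdlib sketch of that split-reading idea (not Spark code; the demo helper and 6/5 offsets are made up for illustration):

```java
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Each "executor" seeks to its assigned offset and reads only its own
// slice, so the whole file never passes through one process's memory.
public class SplitReader {
    public static String readSplit(String path, long offset, int length) {
        try (RandomAccessFile f = new RandomAccessFile(path, "r")) {
            f.seek(offset);
            byte[] buf = new byte[length];
            f.readFully(buf);
            return new String(buf, StandardCharsets.UTF_8);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Writes a temp file and reads the split covering its second word.
    public static String demo() {
        try {
            Path p = Files.createTempFile("split", ".txt");
            Files.write(p, "hello world".getBytes(StandardCharsets.UTF_8));
            return readSplit(p.toString(), 6, 5);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```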

Experience with Hadoop?

Submitted by 六眼飞鱼酱① on 2019-12-09 05:29:58
Question: Have any of you tried Hadoop? Can it be used without the distributed filesystem that goes with it, in a shared-nothing architecture? Would that make sense? I'm also interested in any performance results you have... Answer 1: Yes, you can use Hadoop on a local filesystem by using file URIs instead of hdfs URIs in various places. I think a lot of the examples that come with Hadoop do this. This is probably fine if you just want to learn how Hadoop works and the basic map-reduce paradigm, but you…
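Concretely, pointing Hadoop at the local filesystem is a one-property change in core-site.xml (property name as used by Hadoop 2.x; 1.x called it fs.default.name):

```xml
<!-- core-site.xml: use the local filesystem instead of HDFS -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>file:///</value>
  </property>
</configuration>
```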

Distributed System

Submitted by 旧街凉风 on 2019-12-09 01:45:39
Question: I am looking to create a distributed framework in Java and need some help sorting out the implementation of a client/manager/worker situation as described in my pseudocode below.

    Manager
    BEGIN
        WHILE (true)
            RECEIVE message FROM client
            IF (worker_connections > 0) THEN
                FOR (i = 0; i < worker_connections; i++)
                    SEND message TO worker[i]
                FOR (i = 0; i < worker_connections; i++)
                    RECEIVE result[i] FROM worker[i]
                SEND merge(result[]) TO client
            ELSE
                SEND "No workers available" TO client
            END IF
        END WHILE
    END

    Client…
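The manager's scatter/gather loop above can be sketched in-process with one ExecutorService standing in for each worker; a networked version would replace the executors with sockets, RMI stubs, or a message queue. The trivial worker task and the concatenating merge are assumptions chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Scatter the message to every worker, gather one result per worker,
// then merge; with no workers connected, report that to the client.
public class ManagerDemo {
    public static String handle(String message, List<ExecutorService> workers) {
        if (workers.isEmpty()) return "No workers available";
        List<Future<String>> results = new ArrayList<>();
        for (ExecutorService w : workers) {                  // scatter
            results.add(w.submit(() -> "done:" + message));
        }
        StringBuilder merged = new StringBuilder();          // gather + merge
        for (Future<String> r : results) {
            try { merged.append(r.get()).append(';'); }
            catch (Exception e) { merged.append("error;"); }
        }
        return merged.toString();
    }
}
```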