distributed

Parallel processes in distributed tensorflow

Submitted by 时光怂恿深爱的人放手 on 2019-12-04 10:59:25
I have a neural network with trained parameters in TensorFlow; it is the "policy" for an agent. The network is updated in a training loop in the main TensorFlow session of the core program. At the end of each training cycle I need to pass this network to a few parallel processes ("workers"), which will use it to collect samples from the interactions of the agent's policy with the environment. This has to happen in parallel because simulating the environment takes most of the time and runs on a single core, so several parallel sampling processes are needed. I am struggling with how to structure this in distributed TensorFlow. What I…
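One common pattern for this (independent of distributed TensorFlow itself) is to broadcast the trained weights to worker processes via queues and collect samples back. A minimal sketch with `multiprocessing`, assuming the policy can be flattened to plain arrays; `rollout` and the reward logic are hypothetical stand-ins, not TensorFlow API:

```python
# Sketch: trainer broadcasts policy weights, workers return rollouts.
import multiprocessing as mp

def rollout(weights, n_steps):
    # Stand-in for running the policy against the environment: here each
    # step's "sample" is just the sum of the weights.
    return [sum(weights)] * n_steps

def worker(weight_q, sample_q):
    weights = weight_q.get()            # receive current policy weights
    sample_q.put(rollout(weights, 3))   # send collected samples back

if __name__ == "__main__":
    weight_q, sample_q = mp.Queue(), mp.Queue()
    procs = [mp.Process(target=worker, args=(weight_q, sample_q))
             for _ in range(2)]
    for p in procs:
        p.start()
    for _ in procs:
        weight_q.put([0.5, 1.5])        # broadcast weights to every worker
    samples = [sample_q.get() for _ in procs]
    for p in procs:
        p.join()
```

In a real setup each worker would rebuild the TensorFlow graph locally and load the received weights, so the expensive environment simulation runs truly in parallel while only small weight arrays cross process boundaries.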

How to visualize the behavior of many concurrent multi-stage processes?

Submitted by 有些话、适合烂在心里 on 2019-12-04 09:36:06
Question: Suppose I've got a ton (a continuous stream) of requests to process, and each request has several stages, for example: "connecting to data source", "reading data from data source", "validating data", "processing data", "connecting to data sink", "writing result to data sink". Which visualization methods, or even tools, are well suited to visualizing the behavior of such a system? I'd like to be able to see which stages are taking a long time, and how the stages of different requests are aligned with…
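One concrete option (not the only answer) is the Chrome trace-event JSON format, which Perfetto and `chrome://tracing` can load: each request becomes a row and each stage a bar, which makes slow stages and cross-request alignment visible. A minimal sketch with hypothetical timings:

```python
# Sketch: emit Chrome trace-viewer "complete" events (ph="X"), one per
# stage; tid groups bars by request so stages line up on a shared timeline.
import json

def stage_event(request_id, stage, start_us, dur_us):
    return {"name": stage, "ph": "X", "ts": start_us, "dur": dur_us,
            "pid": "requests", "tid": request_id}

events = [
    stage_event(1, "connect", 0, 120),
    stage_event(1, "read", 120, 300),
    stage_event(2, "connect", 50, 90),
]
trace = json.dumps({"traceEvents": events})
# write `trace` to a .json file and open it in Perfetto / chrome://tracing
```

The same data also feeds Gantt-style plots in plotting libraries if a browser-based viewer is not an option.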

Convert Matrix to RowMatrix in Apache Spark using Scala

Submitted by 前提是你 on 2019-12-04 08:21:56
I'd really like to convert my org.apache.spark.mllib.linalg.Matrix to org.apache.spark.mllib.linalg.distributed.RowMatrix. I can do it like this:
val xx = X.computeGramianMatrix() // xx is of type org.apache.spark.mllib.linalg.Matrix
val xxs = xx.toString()
val xxr = xxs.split("\n").map(row => row.trim.split("\\s+"))
val xxp = sc.parallelize(xxr)
val xxd = xxp.map(ar => Vectors.dense(ar.map(elm => elm.toDouble)))
val xxrm: RowMatrix = new RowMatrix(xxd)
However, that is really gross and a total hack. Can someone show me a…
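The string-parsing detour is unnecessary because a local mllib `Matrix` already exposes its entries numerically: `toArray` returns them in column-major order (and, in Spark 2.0+, `rowIter` yields rows directly, so something like `new RowMatrix(sc.parallelize(xx.rowIter.toSeq))` should work). A pure-Python sketch of the column-major reshaping, with `values` standing in for `Matrix.toArray()`:

```python
# Sketch: rebuild rows from a column-major flat array, the layout mllib's
# local DenseMatrix uses for its values.
def column_major_to_rows(values, num_rows, num_cols):
    return [[values[c * num_rows + r] for c in range(num_cols)]
            for r in range(num_rows)]

# A 2x3 matrix [[1, 3, 5], [2, 4, 6]] laid out column-major:
rows = column_major_to_rows([1, 2, 3, 4, 5, 6], 2, 3)
# each row can then become a Vectors.dense(...) and be parallelized
```

This keeps everything numeric end to end, so no precision is lost to `toString()` formatting.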

Are there any general algorithms for achieving eventual consistency in distributed systems?

Submitted by 岁酱吖の on 2019-12-04 07:52:37
Are there any algorithms that are commonly used to achieve eventual consistency in distributed systems? There are algorithms that have been developed for ACID transactions in distributed systems (Paxos in particular), but is there a similar body of theory for BASE scenarios, with weaker consistency guarantees? Edit: this appears to be an area of academic research that is only beginning to be developed. Mcdowella's answer shows that there has been at least some work in this area. If "anti-entropy protocols for repairing replicated data, which operate by comparing…
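The anti-entropy idea quoted above can be sketched concretely: two replicas compare their entries and exchange only the keys that differ, converging via a conflict rule (here last-writer-wins on a version number; a deliberately simplified model of what systems like Dynamo-style stores do with vector clocks or Merkle trees):

```python
# Sketch: pairwise anti-entropy with last-writer-wins reconciliation.
# Each replica is a dict mapping key -> (version, value).
def diverging_keys(a, b):
    return {k for k in set(a) | set(b) if a.get(k) != b.get(k)}

def anti_entropy(a, b):
    for k in diverging_keys(a, b):
        va = a.get(k, (0, None))
        vb = b.get(k, (0, None))
        winner = va if va[0] >= vb[0] else vb
        a[k] = b[k] = winner     # both replicas converge on the higher version

r1 = {"x": (1, "old"), "y": (2, "yes")}
r2 = {"x": (3, "new")}
anti_entropy(r1, r2)
# both replicas now hold x -> (3, "new") and y -> (2, "yes")
```

Run periodically between random replica pairs (gossip), this converges all replicas without any coordinator, which is the essence of eventual consistency.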

Managing a Large Number of Log Files Distributed Over Many Machines

Submitted by 五迷三道 on 2019-12-04 07:48:34
We have started using a third-party platform (GigaSpaces) that helps us with distributed computing. One of the major problems we are trying to solve now is how to manage our log files in this distributed environment. Our current setup is as follows: the platform is distributed over 8 machines, and on each machine we have 12-15 processes that log to separate log files using java.util.logging. On top of this platform we have our own applications, which use log4j and log to separate files. We also redirect stdout to a separate file to catch thread dumps and the like. This results in about 200…

Emulating network disconnects to locally test distributed app partitioning

Submitted by 亡梦爱人 on 2019-12-04 07:27:12
I have several instances of a distributed application running on localhost; every instance communicates with the others through certain ports, and all instances together make up an ensemble. (I'm actually talking about ZooKeeper running on Linux.) Now I want to write unit tests that emulate ensemble partitioning. E.g. I have 5 instances and want to split them into groups of 3 and 2, so that an instance from one group cannot communicate with an instance from the other group. This would emulate the real-world situation where 3 machines are in one datacenter, 2 machines are in another, and the datacenters…
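Since every instance lives on localhost and is distinguished only by its port, one way to emulate the partition is firewall rules that drop traffic between the two groups' ports. A sketch that only *generates* the `iptables` commands (applying them requires root; the port numbers are hypothetical ZooKeeper peer ports):

```python
# Sketch: build iptables DROP rules that cut traffic between two port
# groups in both directions; deleting the rules (-D) heals the partition.
def partition_rules(group_a_ports, group_b_ports):
    rules = []
    for a in group_a_ports:
        for b in group_b_ports:
            rules.append(
                f"iptables -A INPUT -p tcp --sport {a} --dport {b} -j DROP")
            rules.append(
                f"iptables -A INPUT -p tcp --sport {b} --dport {a} -j DROP")
    return rules

rules = partition_rules([2181, 2182, 2183], [2184, 2185])
# 3 ports x 2 ports x 2 directions = 12 rules
```

A test harness can shell out to apply the rules before the assertion phase and remove them in teardown; alternatives include per-instance network namespaces, which isolate more cleanly but need more setup.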

FileNotFoundException when using Hadoop distributed cache

Submitted by 天大地大妈咪最大 on 2019-12-04 07:02:00
Question: could someone please reply this time; I am struggling to run my code using the distributed cache. I already have the files on HDFS, but when I run this code: import java.awt.image.BufferedImage; import java.awt.image.DataBufferByte; import java.awt.image.Raster; import java.io.BufferedReader; import java.io.ByteArrayInputStream; import java.io.DataInputStream; import java.io.FileNotFoundException; import java.io.FileReader; import java.io.IOException; import java.io.InputStreamReader; import…

How to design a distributed job scheduler? [closed]

Submitted by 巧了我就是萌 on 2019-12-04 04:59:04
I want to design a job-scheduler cluster containing several hosts that do cron-style job scheduling. For example, when a job that needs to run every 5 minutes is submitted to the cluster, the cluster should decide which host fires the next run, ensuring: Disaster tolerance: as long as not all of the hosts are down, the job should be fired successfully. Validity: only one host fires each job run. Because of the disaster-tolerance requirement, a job cannot be bound to a specific host. One way is to have all the hosts poll a DB table (with locking), which guarantees that only one host gets the next job run. Since this locks the table frequently, is…
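A lighter-weight variant of the DB-table approach is a per-run claim instead of a table lock: each host tries to insert a row keyed by (job id, scheduled run time), and a unique index guarantees only the first insert succeeds. A sketch with an in-memory dict standing in for that table (the job names and times are hypothetical):

```python
# Sketch: compare-and-set claim per scheduled run. `claims` stands in for
# a DB table with a unique key on (job_id, run_time); dict insertion plays
# the role of the atomic INSERT.
claims = {}

def try_claim(job_id, run_time, host):
    key = (job_id, run_time)
    if key in claims:          # some other host already owns this run
        return False
    claims[key] = host
    return True

winners = [h for h in ["host-a", "host-b", "host-c"]
           if try_claim("report-job", "2019-12-04T05:00", h)]
# exactly one host wins each (job, run-time) pair
```

Because the contention is per run rather than per table, hosts that lose a claim simply skip that run; disaster tolerance follows because any surviving host can win the next claim.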

Find the average of numbers using MapReduce

Submitted by 元气小坏坏 on 2019-12-04 04:42:14
I have been trying to write some code to find the average of numbers using MapReduce. I am trying to use global counters to reach my goal, but I am not able to set the counter value in the map method of my Mapper, and I am also not able to retrieve the counter value in the reduce method of my Reducer. Do I have to use a global counter in map at all (e.g. by using incrCounter(key, amount) of the provided Reporter)? Or would you suggest different logic to get the average of some numbers? Sibimon Sasidharan: The logic is quite simple: if all the numbers have the same key, then the mapper sends…
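The answer's logic can be simulated without Hadoop: the mapper emits (key, value) pairs carried as (sum, count), the reducer adds them up per key, and the average falls out at the end, so no global counters are needed. A minimal sketch (records and keys are made up for illustration):

```python
# Sketch: averaging via (sum, count) pairs, the combiner-friendly shape --
# averages can't be combined directly, but sums and counts can.
from collections import defaultdict

def map_phase(records):
    for key, value in records:
        yield key, (value, 1)

def reduce_phase(pairs):
    totals = defaultdict(lambda: (0, 0))
    for key, (s, c) in pairs:
        ts, tc = totals[key]
        totals[key] = (ts + s, tc + c)
    return {k: s / c for k, (s, c) in totals.items()}

avg = reduce_phase(map_phase([("n", 2), ("n", 4), ("n", 9)]))
# avg["n"] == 5.0
```

In real Hadoop the reducer receives all values for a key anyway, so emitting the raw value and dividing sum by count in the reducer also works; the (sum, count) form matters when a combiner runs.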

What's the difference between ZooKeeper and any distributed Key-Value stores?

Submitted by 喜欢而已 on 2019-12-04 02:37:38
I am new to ZooKeeper and distributed systems and am learning on my own. From what I understand so far, ZooKeeper seems to be simply a key-value store whose keys are paths and whose values are strings, which is no different from, say, Redis. (And apparently we can use slash-separated paths as keys in Redis as well.) So my question is: what is the essential difference between ZooKeeper and other distributed KV stores? Why does ZooKeeper use so-called "paths" as keys instead of simple strings? kuujo: You're comparing the high-level data model of ZooKeeper to other key-value stores, but that…
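One concrete difference beyond the data model: ZooKeeper offers coordination primitives such as sequential and ephemeral znodes, watches, and strong ordering guarantees, which recipes like leader election and locks are built on. A toy in-memory imitation of `create(..., SEQUENTIAL)` (TinyZk is a deliberately simplified stand-in, not the real client API):

```python
# Sketch: sequential znodes give a total order; the lowest sequence number
# under an election path becomes the leader.
class TinyZk:
    def __init__(self):
        self.counter = 0
        self.nodes = {}

    def create_sequential(self, prefix, data):
        # Real ZooKeeper appends a zero-padded, monotonically increasing
        # sequence number to the path on the server side.
        path = f"{prefix}{self.counter:010d}"
        self.counter += 1
        self.nodes[path] = data
        return path

zk = TinyZk()
paths = [zk.create_sequential("/election/n_", host)
         for host in ["a", "b", "c"]]
leader = zk.nodes[min(paths)]   # lowest sequence number leads
```

In real ZooKeeper these election nodes would also be ephemeral, so a crashed leader's node disappears and the next-lowest node takes over, something a plain KV store's get/put interface does not give you by itself.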