distributed-system | 易学教程

Find Top 10 Most Frequently Visited URLs, data is stored across network

Source: Google interview question. Given a large network of computers, each keeping log files of visited URLs, find the top ten most visited URLs. Each computer holds a large <string (url) -> int (visits)> map. Calculate the <string (url) -> int (sum of visits among all distributed maps)> map and get the top ten in the combined map. Main constraint: the maps are too large to transmit over the network. Also, MapReduce can't be used directly. I have now come across quite a few questions of this type, where
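One possible way to approach the problem above is hash partitioning, sketched below in Python as an illustration rather than as the accepted answer. Every URL is assigned to one "owner" node by a stable hash, each node forwards only the matching slice of its map to that owner, and each owner reports just its local top ten. Because a URL's full total lives on exactly one owner, the global top ten must appear among those short candidate lists. The names `partition_of` and `top_k_distributed` are hypothetical, and the nodes are simulated as in-memory dictionaries. Note that slices of each map still cross the network, so whether this satisfies the "too large to transmit" constraint depends on how strictly it is read.

```python
import heapq
import zlib
from collections import Counter


def partition_of(url, num_nodes):
    # Stable hash (unlike the built-in hash(), which varies per process),
    # so every node routes a given URL to the same owner.
    return zlib.crc32(url.encode("utf-8")) % num_nodes


def top_k_distributed(node_maps, k=10):
    num_nodes = len(node_maps)

    # Step 1: each node splits its map by owner; in a real system,
    # slice i would be sent to node i. Here the owners are simulated.
    owned = [Counter() for _ in range(num_nodes)]
    for local_map in node_maps:
        for url, visits in local_map.items():
            owned[partition_of(url, num_nodes)][url] += visits

    # Step 2: each owner holds complete totals for its URLs
    # and reports only its k largest.
    candidates = []
    for totals in owned:
        candidates.extend(totals.most_common(k))

    # Step 3: a coordinator merges at most num_nodes * k candidates.
    return heapq.nlargest(k, candidates, key=lambda item: item[1])


# Made-up sample data: three nodes with overlapping URLs.
node_maps = [
    {"a.com": 5, "b.com": 1},
    {"a.com": 2, "c.com": 4},
    {"b.com": 7, "c.com": 1},
]
result = top_k_distributed(node_maps, k=2)
print(result)
```

The coordinator only ever sees `num_nodes * k` pairs, which is the main design point: the expensive summation is spread across owners instead of being funneled to one machine.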

Best distributed filesystem for commodity linux storage farm [closed]

I have a lot of spare Intel Linux servers lying around (hundreds) and want to use them for a distributed file system in a web hosting and file sharing environment. This isn't for an HPC application, so high performance isn't critical. The main requirement is high availability

Why isn't RDBMS Partition Tolerant in CAP Theorem and why is it Available?

Two points I don't understand about RDBMS being CA in the CAP theorem: 1) It is said that an RDBMS is not partition tolerant, but how is an RDBMS any less partition tolerant than other technologies like MongoDB or Cassandra? Is there an RDBMS setup where we give up CA to make it AP or CP? 2) How is it CAP-available? Is it through a master-slave setup, as in when the master dies, the slave takes over writes? I'm a novice at DB architecture and the CAP theorem, so please bear with me. A lot of databases now actually have different configurations and, depending on the settings you choose, it can be either CA, CP, AP, etc., but can

Amazon S3 architecture [closed]

While the post at http://highscalability.com/amazon-architecture explains Amazon's architecture in general, I am interested in knowing how Amazon S3 is implemented. Some of my guesses are: a distributed file system like HDFS http://hadoop.apache.org/core/docs

Tensorflow on shared GPUs: how to automatically select the one that is unused

I have access through SSH to a cluster of n GPUs. TensorFlow automatically gave them the names gpu:0,...,gpu:(n-1). Others have access too, and sometimes they take random GPUs. I did not place any tf.device() explicitly because that is cumbersome, and even if I selected GPU number j, it would be a problem if someone were already on GPU number j. I would like to go through the GPUs' usage, find the first that is unused, and use only this one. I guess someone could parse the output of nvidia-smi with bash and get a variable i and feed that variable i to the tensorflow script as the number of
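The idea of parsing nvidia-smi output could be sketched in Python instead of bash, roughly as follows. This is a minimal sketch under stated assumptions, not a definitive solution: it treats a GPU as "unused" when its used memory is at or below a threshold (`max_used_mib`, a hypothetical parameter), and the helper names `first_free_gpu`, `query_gpus`, and `reserve_gpu` are made up for illustration. It relies on the `nvidia-smi --query-gpu=index,memory.used --format=csv,noheader,nounits` query and on the `CUDA_VISIBLE_DEVICES` environment variable, which has to be set before TensorFlow initializes CUDA.

```python
import os
import subprocess


def first_free_gpu(smi_csv, max_used_mib=50):
    """Return the index of the first GPU whose used memory (MiB) is at or
    below the threshold, or None if every GPU looks busy."""
    for line in smi_csv.strip().splitlines():
        index, used = (field.strip() for field in line.split(","))
        if int(used) <= max_used_mib:
            return int(index)
    return None


def query_gpus():
    # Requires nvidia-smi on PATH; emits one "index, memory.used" line per GPU.
    return subprocess.check_output(
        ["nvidia-smi", "--query-gpu=index,memory.used",
         "--format=csv,noheader,nounits"],
        text=True,
    )


def reserve_gpu():
    gpu = first_free_gpu(query_gpus())
    if gpu is not None:
        # Must happen before TensorFlow is imported or creates a session.
        os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu)
    return gpu


# Made-up sample output, used so the parser can be tried without a GPU.
sample = "0, 10432\n1, 3\n2, 0\n"
chosen = first_free_gpu(sample)
print(chosen)
```

Calling `reserve_gpu()` at the top of the training script, before `import tensorflow`, would make only the chosen device visible to the process. The check is not atomic, so another user can still claim the same GPU between the query and the start of the job.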

What library can I use to do simple, lightweight message passing?

I will be starting a project which requires communication between distributed nodes (the project is in C++). I need a lightweight message passing library to pass very simple messages (basically just strings of text) between nodes. The library must have the following characteristics: No external setup required. I need to be able to get everything up and running in my code. I don't want to require the user to install any packages or edit any configuration files (other than a list of IP addresses and ports to connect to). The underlying protocol which the library uses must be TCP (or if it is UDP,
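To make the requirement concrete, here is a rough sketch of what passing plain text strings over a stream socket with no external setup can look like. It is written in Python purely for illustration (the project in question is in C++), and it is not a recommendation of any particular library. The framing scheme, a 4-byte big-endian length prefix before each UTF-8 payload, and the helper names `send_msg`, `recv_exact`, and `recv_msg` are assumptions of this sketch. The demo uses `socket.socketpair()` as a stand-in for a TCP connection between two nodes; the same functions work on a connected TCP socket.

```python
import socket
import struct


def send_msg(sock, text):
    payload = text.encode("utf-8")
    # Prefix each message with its length so the receiver can find
    # message boundaries in the byte stream.
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def recv_exact(sock, n):
    # recv() may return fewer bytes than requested, so loop until done.
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("peer closed the connection mid-message")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def recv_msg(sock):
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length).decode("utf-8")


# Demo: a connected local socket pair standing in for two nodes.
a, b = socket.socketpair()
send_msg(a, "hello")
send_msg(a, "node 2 ready")
received = [recv_msg(b), recv_msg(b)]
a.close()
b.close()
print(received)
```

The length prefix is the key design choice: a stream protocol such as TCP delivers bytes, not messages, so two sends can arrive merged or split unless the application adds its own framing.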

NoSQL and eventual consistency - real world examples

I'm looking for good examples of NoSQL apps that portray how to work with lack of transactionality as we know it in relational databases. I'm mostly interested in write-intensive code, as for mostly read-only code this is a much easier task. I've read a number of things about NoSQL in general, about CAP theorem, eventual consistency etc. However those things tend to concentrate on the database architecture for its own sake and not on the design patterns to use with it. I do understand that it's impossible to achieve full transactionality within a distributed app. This is exactly why I would
