distributed-system

Amazon S3 architecture [closed]

孤人 submitted on 2019-12-03 07:22:53
Question: Closed 7 years ago. While the post at http://highscalability.com/amazon-architecture explains Amazon's architecture in general, I am interested in knowing

Find Top 10 Most Frequently Visited URLs, data is stored across network

自闭症网瘾萝莉.ら submitted on 2019-12-03 04:36:43
Question: Source: Google interview question. Given a large network of computers, each keeping log files of visited URLs, find the top ten most visited URLs. There are many large <string (url) -> int (visits)> maps. Calculate a <string (url) -> int (sum of visits among all distributed maps)> map and get the top ten in the combined map. Main constraint: the maps are too large to transmit over the network. MapReduce also cannot be used directly. I have now come across quite a few questions of this type, where
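For reference, the target result described in this excerpt can be stated as a small Python sketch: sum each URL's visits across all per-machine maps, then take the largest totals. This is only the naive definition of the answer, which the question rules out at scale because the maps are too large to transmit; it is not a solution under that constraint. The names `merge_counts` and `top_k` and the sample data are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Mapping, Tuple


def merge_counts(maps: Iterable[Mapping[str, int]]) -> Counter:
    """Sum per-URL visit counts over all per-machine maps."""
    total: Counter = Counter()
    for m in maps:
        total.update(m)  # Counter.update adds counts instead of replacing them
    return total


def top_k(maps: Iterable[Mapping[str, int]], k: int = 10) -> List[Tuple[str, int]]:
    """Return the k URLs with the highest combined visit count."""
    return merge_counts(maps).most_common(k)


# Hypothetical per-machine maps (url -> visits), small enough to merge directly.
machines = [
    {"a.example/x": 5, "b.example/y": 2},
    {"a.example/x": 1, "c.example/z": 7},
    {"b.example/y": 3},
]
result = top_k(machines, k=2)
```

Any approach that respects the bandwidth constraint has to produce the same list as `top_k` while shipping far less than the full maps, which makes this sketch useful mainly as a correctness check on small inputs.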

Best distributed filesystem for commodity linux storage farm [closed]

我只是一个虾纸丫 submitted on 2019-12-03 02:44:24
Question: Closed 6 years ago. This question is opinion-based. I have a lot of spare Intel Linux servers lying around (hundreds) and want to use them for a distributed file system in a web hosting and file sharing environment. This isn't for an HPC application, so high performance isn't critical. The main requirement is high availability

Why isn't RDBMS Partition Tolerant in CAP Theorem and why is it Available?

て烟熏妆下的殇ゞ submitted on 2019-12-02 22:35:23
Two points I don't understand about RDBMS being CA in the CAP theorem: 1) It is said that an RDBMS is not partition tolerant, but how is an RDBMS any less partition tolerant than other technologies like MongoDB or Cassandra? Is there an RDBMS setup where we give up CA to make it AP or CP? 2) How is it CAP-available? Is it through a master-slave setup, where the slave takes over writes when the master dies? I'm a novice at DB architecture and the CAP theorem, so please bear with me. A lot of databases now have different configurations, and depending on the settings you choose, a database can be CA, CP, AP, etc. but can

Amazon S3 architecture [closed]

烈酒焚心 submitted on 2019-12-02 20:53:16
While the post at http://highscalability.com/amazon-architecture explains Amazon's architecture in general, I am interested in knowing how Amazon S3 is implemented. Some of my guesses are: a distributed file system like HDFS http://hadoop.apache.org/core/docs

Tensorflow on shared GPUs: how to automatically select the one that is unused

十年热恋 submitted on 2019-12-01 03:57:41
I have access through ssh to a cluster of n GPUs. TensorFlow automatically gave them the names gpu:0,...,gpu:(n-1). Others have access too, and sometimes they take random GPUs. I did not place any tf.device() explicitly because that is cumbersome, and even if I selected GPU number j, it would be a problem if someone were already on GPU number j. I would like to go through the GPU usage, find the first GPU that is unused, and use only that one. I guess someone could parse the output of nvidia-smi with bash and get a variable i and feed that variable i to the tensorflow script as the number of
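The idea in this excerpt (parse nvidia-smi, pick an index, hand it to the script) could be sketched in Python roughly as follows. This is a minimal sketch, not the asker's method: it assumes that "unused" can be approximated by low memory use, and the threshold `max_used_mib` and the function names are hypothetical. The `--query-gpu` and `--format` options are standard nvidia-smi flags, and `CUDA_VISIBLE_DEVICES` is the usual CUDA environment variable for restricting visible devices.

```python
import os
import subprocess
from typing import Optional

# One CSV row per GPU: "<index>, <memory used in MiB>"
QUERY = ["nvidia-smi", "--query-gpu=index,memory.used",
         "--format=csv,noheader,nounits"]


def first_free_gpu(csv_text: str, max_used_mib: int = 100) -> Optional[int]:
    """Return the lowest GPU index using at most max_used_mib MiB, else None.

    Treating low memory use as "unused" is an assumption of this sketch.
    """
    free = []
    for line in csv_text.strip().splitlines():
        index, used = (field.strip() for field in line.split(","))
        if int(used) <= max_used_mib:
            free.append(int(index))
    return min(free) if free else None


def select_gpu() -> Optional[int]:
    """Query nvidia-smi and expose only the first free GPU to this process."""
    out = subprocess.check_output(QUERY, text=True)
    gpu = first_free_gpu(out)
    if gpu is not None:
        # Must be set before TensorFlow is imported or initialises CUDA.
        os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu)
    return gpu
```

Calling `select_gpu()` at the top of the script, before importing TensorFlow, leaves the process with a single visible device, which TensorFlow then addresses as gpu:0. The check is not atomic: another user can claim the same GPU between the query and the start of the job.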

Tensorflow on shared GPUs: how to automatically select the one that is unused

女生的网名这么多〃 submitted on 2019-12-01 00:42:13
Question: I have access through ssh to a cluster of n GPUs. TensorFlow automatically gave them the names gpu:0,...,gpu:(n-1). Others have access too, and sometimes they take random GPUs. I did not place any tf.device() explicitly because that is cumbersome, and even if I selected GPU number j, it would be a problem if someone were already on GPU number j. I would like to go through the GPU usage, find the first GPU that is unused, and use only that one. I guess someone could parse the output of nvidia

What library can I use to do simple, lightweight message passing?

泄露秘密 submitted on 2019-11-30 07:26:59
I will be starting a project which requires communication between distributed nodes (the project is in C++). I need a lightweight message passing library to pass very simple messages (basically just strings of text) between nodes. The library must have the following characteristics: No external setup required. I need to be able to get everything up and running in my code; I don't want to require the user to install any packages or edit any configuration files (other than a list of IP addresses and ports to connect to). The underlying protocol which the library uses must be TCP (or if it is UDP,

NoSQL and eventual consistency - real world examples

梦想与她 submitted on 2019-11-29 22:36:25
I'm looking for good examples of NoSQL apps that show how to work with the lack of transactionality as we know it in relational databases. I'm mostly interested in write-intensive code, since this is a much easier task for mostly read-only code. I've read a number of things about NoSQL in general, the CAP theorem, eventual consistency, etc. However, those tend to concentrate on the database architecture for its own sake and not on the design patterns to use with it. I do understand that it's impossible to achieve full transactionality within a distributed app. This is exactly why I would

What library can I use to do simple, lightweight message passing?

♀尐吖头ヾ submitted on 2019-11-29 09:42:26
Question: I will be starting a project which requires communication between distributed nodes (the project is in C++). I need a lightweight message passing library to pass very simple messages (basically just strings of text) between nodes. The library must have the following characteristics: No external setup required. I need to be able to get everything up and running in my code; I don't want to require the user to install any packages or edit any configuration files (other than a list of IP addresses