sharding

How to scale tf.nn.embedding_lookup_sparse

╄→尐↘猪︶ㄣ posted on 2019-12-21 03:03:30
Question: I'm trying to build a very large sparse model (e.g. LR, if there is only one embedding layer). The input dimension can be as large as 100000000, and the samples are very sparse: the average number of non-zero values is around 100. Since the weights are very large, we have to partition them and distribute them across different servers. Here is the code: weights = tf.get_variable("weights", weights_shape, partitioner=tf.fixed_size_partitioner(num_shards, axis=0), initializer=tf.truncated_normal
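The partitioning idea in this question can be illustrated with a minimal pure-Python sketch. It is not TensorFlow's partitioner or lookup code; the helper names (`shard_bounds`, `locate`, `sparse_dot`) and the contiguous, near-equal row split are assumptions made for illustration only. It shows a weight table split by row into shards, with a sparse sample touching only the rows for its non-zero ids.

```python
# Illustrative only: hypothetical helpers, not TensorFlow's implementation.

def shard_bounds(num_rows, num_shards):
    """Split rows 0..num_rows-1 into contiguous, near-equal [start, end) ranges."""
    base, extra = divmod(num_rows, num_shards)
    bounds, start = [], 0
    for shard in range(num_shards):
        size = base + (1 if shard < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


def locate(row_id, bounds):
    """Return (shard index, offset inside that shard) for a global row id."""
    for shard, (lo, hi) in enumerate(bounds):
        if lo <= row_id < hi:
            return shard, row_id - lo
    raise IndexError(row_id)


def sparse_dot(ids, values, shards, bounds):
    """Sum value * weight over the non-zero ids, reading only the rows needed."""
    total = 0.0
    for row_id, value in zip(ids, values):
        shard, offset = locate(row_id, bounds)
        total += value * shards[shard][offset]
    return total


# Toy table: 10 weights split over 3 shards (assumed sizes 4, 3, 3).
weights = [float(i) for i in range(10)]
bounds = shard_bounds(len(weights), 3)
shards = [weights[lo:hi] for lo, hi in bounds]
```

With roughly 100 non-zero ids per sample, a lookup of this shape reads about 100 rows no matter how large the table is, which is why the cost is dominated by how the ids are routed to shards rather than by the table size.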

ID Generation for Sharded Database (Azure Federated Database)

馋奶兔 posted on 2019-12-21 00:54:50
Question: I have been looking for articles or guidance on best practices for ID generation (for the federated/primary key) for Azure Federated databases and haven't found anything compelling. Federated tables don't support identity columns, so it seems to me that the only practical type of ID is a GUID, as trying to centrally create and use a BigInt creates a single point of failure in the app. My chief concern is the performance implications of using GUIDs over BigInts (particularly for indexing

How would I learn more about sharding userdata for a website?

南笙酒味 posted on 2019-12-20 16:48:21
Question: I'm interested in sharding my website's user data across multiple servers. For example, users will log in from the same place, but the login script needs to figure out which server that user's data resides on. So the login script would query the master registry for that user name, and it might return that it's on server B. The login script would then connect to server B and verify the username/password. Does that make sense? Is it normal to have something like a master registry to resolve where
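The flow described in this question (ask a master registry where the user lives, then verify credentials on that server) could be sketched roughly as follows. Everything here is hypothetical: the dictionaries stand in for the registry and the per-server user tables, and the names `MASTER_REGISTRY`, `SHARDS`, `locate_shard` and `login` are assumptions for illustration, not part of any real system.

```python
import hmac

# Hypothetical stand-in for the master registry: user name -> server name.
MASTER_REGISTRY = {"alice": "server_a", "bob": "server_b"}

# Hypothetical stand-ins for each server's user table: user name -> password hash.
SHARDS = {
    "server_a": {"alice": "alice-password-hash"},
    "server_b": {"bob": "bob-password-hash"},
}


def locate_shard(username):
    """Ask the master registry which server holds this user's data."""
    return MASTER_REGISTRY.get(username)


def login(username, password_hash):
    """Resolve the user's server first, then verify credentials on that server."""
    shard_name = locate_shard(username)
    if shard_name is None:
        return False  # unknown user: no server to ask
    stored = SHARDS[shard_name].get(username)
    if stored is None:
        return False
    # Constant-time comparison of the stored and supplied hashes.
    return hmac.compare_digest(stored, password_hash)
```

In this sketch every login costs one registry lookup plus one lookup on the resolved server, so the registry sits on the critical path of every request and would need its own caching or replication.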

Increase number of shards in DynamoDB to spin up more lambdas in parallel

房东的猫 posted on 2019-12-19 18:57:17
Question: I'm currently using DynamoDB Streams to process changed collection values with Lambda functions. However, I'm only running two Lambda instances in parallel, which is not enough to process all the incoming data, so the Lambda invocations just queue up. From the AWS documentation I can see that the number of Lambdas that can run in parallel is proportional to the number of shards of your DynamoDB: If you create a Lambda function that processes events from stream-based services (Amazon Kinesis

How to increase Redis performance when 100% CPU? Sharding? Fastest .Net Client?

五迷三道 posted on 2019-12-18 12:40:12
Question: Due to massive load increases on our website, Redis is now struggling with peak load because the Redis server instance is reaching 100% CPU (on one of eight cores), resulting in timeouts. We've updated our client software to ServiceStack V3 (coming from BookSleeve 1.1.0.4) and upgraded the Redis server to 2.8.11 (coming from 2.4.x). I chose ServiceStack due to the existence of the Harbour.RedisSessionStateStore that uses ServiceStack.Redis. We used AngiesList.Redis before together with

Resources for Database Sharding and Partitioning

半城伤御伤魂 posted on 2019-12-18 11:33:29
Question: I'm working with a database schema that is running into scalability issues. One of the tables in the schema has grown to around 10 million rows, and I am exploring sharding and partitioning options to allow this schema to scale to much larger datasets (say, 1 billion to 100 billion rows). Our application must also be deployable onto several database products, including but not limited to Oracle, MS SQL Server, and MySQL. This is a large problem in general, and I'd like to read up on what

multiple consumers per kinesis shard

廉价感情. posted on 2019-12-18 00:05:48
Question: I read that you can have multiple consumer apps per Kinesis stream. http://docs.aws.amazon.com/kinesis/latest/dev/developing-consumers-with-kcl.html However, I heard you can only have one consumer per shard. Is this true? I can't find any documentation to support this, and can't imagine how that could be if multiple consumers are reading from the same stream. Certainly, it doesn't mean the producer needs to repeat content in different shards for different consumers. Answer 1: Kinesis Client Library

MongoDB querying performance for over 5 million records

☆樱花仙子☆ posted on 2019-12-17 21:24:56
Question: We've recently passed 2 million records in one of our main collections, and we have started to suffer from major performance issues on that collection. The documents in the collection have about 8 fields which you can filter by using the UI, and the results are supposed to be sorted by a timestamp field recording when the record was processed. I've added several compound indexes with the filtered fields and the timestamp, e.g.: db.events.ensureIndex({somefield: 1, timestamp:-1}) I've also added a couple of indexes for