sharding

When do you start additional Elasticsearch nodes?

Submitted by 爷,独闯天下 on 2019-12-02 14:14:57
I'm in the middle of attempting to replace a Solr setup with Elasticsearch. This is a new setup, which has not yet seen production, so I have lots of room to fiddle with things and get them working well. I have very, very large amounts of data. I'm indexing some live data and holding onto it for 7 days (by using the _ttl field). I do not store any data in the index (and disabled the _source field). I expect my index to stabilize around 20 billion rows. I will be putting this data into 2-3 named indexes. Search performance so far with up to a few billion rows is totally acceptable, but indexing
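For reference, a time-series setup like the one described might use index settings along these lines. This is only a sketch for the Elasticsearch 1.x era, when the `_ttl` field was still supported (it was removed in later versions); the index, type, and field names are hypothetical placeholders:

```python
# Hypothetical index settings for a 7-day rolling dataset with _source disabled.
# Built as a plain dict, as one would pass to the create-index API.
index_settings = {
    "settings": {
        # More primary shards than nodes leaves room to scale out later;
        # the right number depends on per-shard size targets.
        "number_of_shards": 20,
        "number_of_replicas": 1,
    },
    "mappings": {
        "event": {  # hypothetical type name
            "_source": {"enabled": False},               # do not store original JSON
            "_ttl": {"enabled": True, "default": "7d"},  # expire docs after 7 days
            "properties": {
                "message": {"type": "string"},
            },
        }
    },
}
```

Note that since `_source` is disabled, documents cannot be reindexed or partially updated later; that trade-off is what buys the storage savings described above.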

How does MongoDB distribute data across a cluster

Submitted by 那年仲夏 on 2019-12-02 12:05:42
I've read about sharding a collection in MongoDB. MongoDB lets me shard a collection explicitly by calling the shardCollection method, where I can choose whether I want it range-sharded or hash-sharded. My question is: what would happen if I didn't call the shardCollection method, and I had, say, 100 nodes? Would MongoDB keep the collections intact and distribute them across the cluster? Would MongoDB keep all the collections on a single node? Do I completely misunderstand how this works? A database can have a mixture of sharded and unsharded collections. Sharded collections are
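The routing behavior the question is asking about can be sketched roughly as follows. This is a toy model, not MongoDB's actual implementation: an unsharded collection lives entirely on its database's primary shard no matter how many shards exist, while a hash-sharded collection spreads documents by a hash of the shard key (MongoDB actually assigns hash ranges to chunks and balances chunks, not individual documents):

```python
import hashlib

def hashed_shard(shard_key_value: str, num_shards: int) -> int:
    """Rough stand-in for hashed sharding: derive a stable integer from the
    shard key value and map it onto one of num_shards shards."""
    digest = hashlib.md5(shard_key_value.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

def route(collection_is_sharded: bool, shard_key_value: str,
          num_shards: int, primary_shard: int = 0) -> int:
    # If shardCollection was never called, the collection is unsharded and
    # stays whole on the database's primary shard, even in a 100-node cluster.
    if not collection_is_sharded:
        return primary_shard
    return hashed_shard(shard_key_value, num_shards)
```

So the direct answer implied above: without `shardCollection`, MongoDB does not spread the collection at all; it stays intact on one shard.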

A Client Walks Into a Server And Asks “What's New?” – Problems With Timestamps

Submitted by 喜欢而已 on 2019-12-02 10:19:00
I'm looking for a solution to an edge-case scenario where a client continually asking the server "what's new?" will fail because of timestamps. In this example I'm not using sequence numbers, because of a different edge-case problem; you can see that problem here: A Client Walks Into a Server And Asks "What's New?" – Problems With Sequence Numbers. Assume we're using timestamps: every row update records a timestamp of the server time, and clients continually ask what's new since the timestamp of the last
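The race the question alludes to can be reproduced in a few lines. The sketch below is illustrative only: row A is stamped before row B, but its transaction commits (becomes visible) after the client has already polled, so the client's "max timestamp seen" has moved past it and row A is never reported:

```python
# Toy model: rows become visible to clients only on commit, but carry the
# timestamp assigned when they were written inside the transaction.
committed = []  # (timestamp, row) pairs, in commit order

def poll(since):
    """Client asks: what's new since the last timestamp I saw?"""
    return [row for ts, row in committed if ts > since]

# t=1: a transaction stamps row A with timestamp 1 but has not committed yet
# t=2: row B is stamped with timestamp 2 and commits immediately
committed.append((2, "B"))
last_seen = 2                 # client polls, sees B, remembers max timestamp 2
# t=3: row A's transaction finally commits, still carrying timestamp 1
committed.append((1, "A"))
missed = poll(last_seen)      # empty: 1 > 2 is false, so row A is lost forever
```

Common mitigations (each with its own trade-offs) are polling with an overlap window and de-duplicating, or stamping rows at commit time rather than write time.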

MongoDB query on all sharded collections without shardkey

Submitted by 强颜欢笑 on 2019-12-02 10:10:42
I have several sharded collections. The collection holds user requests, and the shard key is the user ID. I have a field named "Execution Time", and I want to query all the requests in a period of time (lte and gte). The index includes the shard key, but my query does not. I would rather not put all the shard keys into the query with an "in" operator, because I have 1000 shard keys (users); furthermore, to do that I would need to fetch all the user IDs on every query, which means two queries each time instead of one. But I still want to use an index. Is one option to add userId > 0 and userId < maxUserId to the query? What is the
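The core issue can be sketched as follows (field names here are hypothetical). A query that only constrains the execution-time field cannot be routed to a single shard when the shard key is the user ID, so mongos must scatter-gather it to every shard; a separate secondary index on the time field would still let each shard answer its part efficiently, but it does not change the targeting:

```python
from datetime import datetime

def time_range_query(start: datetime, end: datetime) -> dict:
    """Build a MongoDB-style range filter on a (hypothetical) executionTime field."""
    return {"executionTime": {"$gte": start, "$lte": end}}

def shards_targeted(query: dict, shard_key: str, num_shards: int) -> int:
    # Simplified routing rule: mongos can target specific shards only when
    # the query constrains the shard key; otherwise it broadcasts.
    return 1 if shard_key in query else num_shards

q = time_range_query(datetime(2019, 1, 1), datetime(2019, 2, 1))
```

Adding a vacuous range like `userId > 0 and userId < maxUserId` covers every chunk, so it still hits all shards; it only changes which index each shard can use.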

MySQL Proxy Alternatives for Database Sharding

Submitted by 二次信任 on 2019-11-30 07:42:40
Are there any alternatives to MySQL Proxy? I don't want to use it since it's still in alpha. I will have 10 MySQL servers with table_1, table_2, table_3, table_4 ... table_10 spread across the 10 servers. Each table is identical in structure; they're just shards with different data sets. Is there an alternative to MySQL Proxy where I can have my client application connect to a single SQL server (a proxy), which looks at the query and fetches the data on its behalf? For example, the client requests "SELECT * FROM table_5 WHERE user=123" from the proxy, which connects to the 5th SQL
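Whatever tool is chosen, the routing core of such a proxy is small: inspect the table name in the query and pick the backend that hosts that shard. A minimal sketch, with hypothetical hostnames (a naive regex like this would not survive joins, subqueries, or comments, which is exactly why people reach for a real proxy or do the routing in the application's data layer):

```python
import re

# Hypothetical mapping: table_N lives on server N.
BACKENDS = {n: f"mysql{n}.internal" for n in range(1, 11)}

def backend_for(sql: str) -> str:
    """Pick the backend server for a query by its table_N suffix."""
    match = re.search(r"\bFROM\s+table_(\d+)\b", sql, re.IGNORECASE)
    if not match:
        raise ValueError("cannot determine shard from query")
    return BACKENDS[int(match.group(1))]
```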

How to increase Redis performance when 100% CPU? Sharding? Fastest .Net Client?

Submitted by ∥☆過路亽.° on 2019-11-30 07:36:36
Due to massive load increases on our website, Redis is now struggling with peak load because the Redis server instance is reaching 100% CPU (on one of eight cores), resulting in timeouts. We've updated our client software to ServiceStack V3 (coming from BookSleeve 1.1.0.4) and upgraded the Redis server to 2.8.11 (coming from 2.4.x). I chose ServiceStack due to the existence of the Harbour.RedisSessionStateStore, which uses ServiceStack.Redis. We used AngiesList.Redis before together with BookSleeve, but we experienced 100% CPU with that too. We have eight Redis servers configured as a master/slave
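Because a Redis instance is effectively single-threaded, one hot instance pins one core no matter how many the machine has. The usual remedy in the pre-Redis-Cluster era was client-side partitioning: run several instances (even on the same box) and hash each key to one of them. A sketch of the idea, with hypothetical ports; plain CRC32 mod N stands in for the CRC16-over-16384-slots scheme that Redis Cluster later standardized:

```python
import zlib

# Hypothetical: eight instances on consecutive ports, one per core.
INSTANCES = [f"127.0.0.1:{6379 + i}" for i in range(8)]

def instance_for(key: str) -> str:
    """Any stable hash works: the same key always lands on the same instance,
    so reads and writes for one key stay consistent."""
    return INSTANCES[zlib.crc32(key.encode()) % len(INSTANCES)]
```

The caveat is that multi-key operations (MGET, transactions, Lua scripts) only work when all the keys involved hash to the same instance.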

How do I speed up deletes from a large database table?

Submitted by ⅰ亾dé卋堺 on 2019-11-30 04:42:40
Here's the problem I am trying to solve: I have recently completed a data-layer redesign that allows me to load-balance my database across multiple shards. To keep the shards balanced, I need to be able to migrate data from one shard to another, which involves copying from shard A to shard B and then deleting the records from shard A. But I have several tables that are very big and have many foreign keys pointing to them, so deleting a single record from such a table can take more than one second. In some cases I need to delete millions of records from the tables, and it just takes too
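One widely used mitigation is to delete in batches: each small transaction holds locks briefly and keeps the undo/redo work bounded, which is usually far faster overall than one giant DELETE against a heavily indexed table. A runnable sketch using SQLite as a stand-in (the schema and batch size are arbitrary; the same loop shape applies to any SQL database):

```python
import sqlite3

# Toy table: 10,000 rows split across two "shards" by a shard column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, shard INTEGER)")
conn.executemany("INSERT INTO records (shard) VALUES (?)",
                 [(1 if i % 2 else 2,) for i in range(10_000)])
conn.commit()

BATCH = 1000
deleted = 0
while True:
    # Delete at most BATCH rows per transaction, keyed by primary key.
    cur = conn.execute(
        "DELETE FROM records WHERE id IN "
        "(SELECT id FROM records WHERE shard = 1 LIMIT ?)", (BATCH,))
    conn.commit()            # commit per batch: short transactions, short locks
    deleted += cur.rowcount
    if cur.rowcount < BATCH:
        break
```

For foreign-key-heavy schemas, also confirm that every referencing column is indexed; an unindexed foreign key turns each parent-row delete into a child-table scan.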

Resources for Database Sharding and Partitioning

Submitted by 你离开我真会死。 on 2019-11-30 04:01:44
I'm working with a database schema that is running into scalability issues. One of the tables in the schema has grown to around 10 million rows, and I am exploring sharding and partitioning options to allow this schema to scale to much larger datasets (say, 1 billion to 100 billion rows). Our application must also be deployable onto several database products, including but not limited to Oracle, MS SQL Server, and MySQL. This is a large problem in general, and I'd like to read up on what options are available. What resources are out there (books, whitepapers, web sites) for database sharding

How to Programmatically Pre-Split a GUID Based Shard Key with MongoDB

Submitted by 烂漫一生 on 2019-11-29 20:01:35
Let's say I am using a fairly standard 32-character hex GUID, and I have determined that, because it is randomly generated for my users, it is perfect for use as a shard key to horizontally scale writes to the MongoDB collection that I will be storing the user information in (and write scaling is my primary concern). I also know that I will need to start with at least 4 shards, because of traffic projections and some benchmark work done with a test environment. Finally, I have a decent idea of my initial data size (average document size * number of initial users), which comes to around
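Computing the split points themselves is straightforward: a 32-character hex GUID spans a 128-bit keyspace, and dividing it evenly into N chunks gives N-1 boundary values. A sketch of the arithmetic (one would then feed these values to MongoDB's split and moveChunk admin commands, which this snippet does not attempt):

```python
def split_points(num_chunks: int, digits: int = 32) -> list[str]:
    """Evenly spaced boundaries over the hex keyspace of `digits` characters."""
    space = 16 ** digits
    return [format(i * space // num_chunks, f"0{digits}x")
            for i in range(1, num_chunks)]

points = split_points(4)
# ['40000000000000000000000000000000',
#  '80000000000000000000000000000000',
#  'c0000000000000000000000000000000']
```

This assumes the GUIDs are stored as lowercase hex strings so that lexicographic order matches numeric order; a BinData or mixed-case representation would need the boundaries expressed in that form instead.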

MySQL Partitioning / Sharding / Splitting - which way to go?

Submitted by ⅰ亾dé卋堺 on 2019-11-29 19:10:34
We have an InnoDB database that is about 70 GB, and we expect it to grow to several hundred GB in the next 2 to 3 years. About 60% of the data belongs to a single table. Currently the database is working quite well, as we have a server with 64 GB of RAM, so almost the whole database fits into memory, but we're concerned about the future, when the amount of data will be considerably larger. Right now we're considering some way of splitting up the tables (especially the one that accounts for the biggest part of the data), and I'm now wondering what would be the best way to do it. The options I'm
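For a single dominant table, one common application-level option is to partition by time, so that old partitions can be archived or dropped cheaply instead of deleted row by row. A sketch of the routing convention, with a hypothetical table-naming scheme (MySQL's native RANGE partitioning achieves the same effect inside one logical table, without changing application queries):

```python
from datetime import date

def table_for(d: date, base: str = "events") -> str:
    """Map a row's date to a per-month physical table, e.g. events_201403."""
    return f"{base}_{d.year:04d}{d.month:02d}"
```

The right split key depends on the access pattern: time-based partitioning helps if queries are mostly recent, while hash partitioning by an entity ID helps if load is spread evenly across the whole dataset.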