sharding

MySQL Proxy Alternatives for Database Sharding

Submitted by 夙愿已清 on 2019-11-29 10:26:15
Question: Are there any alternatives to MySQL Proxy? I don't want to use it since it's still in alpha. I will have 10 MySQL servers with table_1, table_2, table_3, table_4, ... table_10 spread across the 10 servers. Each table is identical in structure; they're just shards with different data sets. Is there an alternative to MySQL Proxy where I can have my client application connect to a single SQL server (a proxy), which looks at the query and fetches the data on its behalf? For example, if the …
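One commonly suggested alternative is to do the routing in the application layer itself. A minimal sketch, assuming a deterministic hash on the shard key; the DSN list, the crc32-based mapping, and the credentials are illustrative assumptions, not details from the question:

<?php
// Application-level shard routing: a common alternative to MySQL Proxy.
const SHARD_DSNS = [
    'mysql:host=db01;dbname=app',
    'mysql:host=db02;dbname=app',
    // ... one entry per shard; the question's setup would have 10
];

// Hash the shard key so the same user always maps to the same server.
function shardIndexFor(string $userId): int {
    return crc32($userId) % count(SHARD_DSNS);
}

function connectionFor(string $userId): PDO {
    return new PDO(SHARD_DSNS[shardIndexFor($userId)], 'app_user', 'secret');
}

// The client routes each query to the shard that owns the row.
$pdo  = connectionFor('user-42');
$stmt = $pdo->prepare('SELECT * FROM table_1 WHERE user_id = ?');
$stmt->execute(['user-42']);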

How do I speed up deletes from a large database table?

Submitted by 醉酒当歌 on 2019-11-29 02:03:34
Question: Here's the problem I am trying to solve: I have recently completed a data-layer redesign that allows me to load-balance my database across multiple shards. To keep the shards balanced, I need to be able to migrate data from one shard to another, which involves copying from shard A to shard B and then deleting the records from shard A. But I have several tables that are very big and have many foreign keys pointing to them, so deleting a single record from the table can take more than …
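A common mitigation (a sketch under assumed names, not the answer from the truncated thread) is to delete in small batches, so each statement holds its locks briefly and the foreign-key checks stay cheap per transaction:

<?php
// Batched deletion: small chunks keep each transaction short, so FK checks
// and row locks never pile up the way one huge DELETE would.
$pdo = new PDO('mysql:host=shard-a;dbname=app', 'app_user', 'secret');

$batchId = 42; // identifies rows already copied to shard B
$stmt = $pdo->prepare(
    'DELETE FROM big_table WHERE migration_batch = ? LIMIT 1000'
);
do {
    $stmt->execute([$batchId]);
} while ($stmt->rowCount() > 0); // stop once a pass deletes nothing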

Using Sharding with MongoDB's $lookup Aggregation Operator

Submitted by 帅比萌擦擦* on 2019-11-28 18:19:36
$lookup is new in MongoDB 3.2. It performs a left outer join to an unsharded collection in the same database, filtering in documents from the "joined" collection for processing. To use $lookup, the from collection cannot be sharded. On the other hand, sharding is a useful horizontal scaling approach. What's the best practice for using them together? As the docs you quote indicate, you can't use $lookup on a sharded collection, so the best-practice workaround is to perform the lookup yourself in a separate query: perform your aggregate query, then pull the "localField" values from your query results …
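A sketch of that manual-join workaround using the mongodb/mongodb PHP library; the database, collection, and field names are assumptions for illustration:

<?php
require 'vendor/autoload.php';

$client = new MongoDB\Client('mongodb://localhost:27017');
$orders = $client->app->orders; // the sharded collection being aggregated
$users  = $client->app->users;  // the collection $lookup would have joined

// 1. Run the aggregate query without $lookup.
$results = $orders->aggregate([
    ['$match' => ['status' => 'open']],
])->toArray();

// 2. Pull the "localField" values from the query results.
$userIds = array_map(fn ($doc) => $doc['user_id'], $results);

// 3. Fetch the "joined" documents with a single $in query, then merge them
//    with the aggregate results in application code.
$joined = $users->find(['_id' => ['$in' => $userIds]])->toArray();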

How to Programmatically Pre-Split a GUID Based Shard Key with MongoDB

Submitted by 不羁岁月 on 2019-11-28 15:52:33
Question: Let's say I am using a fairly standard 32-character hex GUID, and I have determined that, because it is randomly generated for my users, it is perfect for use as a shard key to horizontally scale writes to the MongoDB collection I will be storing the user information in (and write scaling is my primary concern). I also know that I will need to start with at least 4 shards, because of traffic projections and some benchmarking done in a test environment. Finally, I have a decent idea …
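One way to do this programmatically (a sketch, with the namespace, shard-key field, and chunk count as assumptions) is to issue the "split" admin command at evenly spaced points across the 32-character hex keyspace:

<?php
require 'vendor/autoload.php';

$client = new MongoDB\Client('mongodb://mongos:27017');
$admin  = $client->selectDatabase('admin');

$chunks = 32; // e.g. 8 chunks per shard across the 4 planned shards
for ($i = 1; $i < $chunks; $i++) {
    // Vary the first two hex digits (00..ff) to spread split points evenly.
    $prefix = str_pad(dechex(intdiv(256 * $i, $chunks)), 2, '0', STR_PAD_LEFT);
    $admin->command([
        'split'  => 'mydb.users',
        'middle' => ['_id' => $prefix . str_repeat('0', 30)],
    ]);
}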

What is a good way to horizontally shard in PostgreSQL?

Submitted by 徘徊边缘 on 2019-11-28 15:34:41
Question: What is a good way to horizontally shard in PostgreSQL: 1. pgpool 2, or 2. gridsql? Which is the better way to shard? Also, is it possible to partition without changing client code? It would be great if someone could share a simple tutorial or cookbook example of how to set up and use sharding. Answer 1: PostgreSQL allows partitioning in two different ways: one is by range and the other is by list. Both use table inheritance to do the partitioning. Partitioning by range, usually a date range, is the most common, …
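A sketch of the inheritance-based range partitioning the answer describes (the approach that predates declarative partitioning), run through PDO for consistency with the other examples; table and column names are assumptions:

<?php
$pdo = new PDO('pgsql:host=localhost;dbname=app', 'app_user', 'secret');

$pdo->exec('CREATE TABLE events (id bigint, created date, payload text)');

// One child table per month; the CHECK constraint lets the planner skip
// partitions whose range cannot match the query (constraint exclusion).
$pdo->exec("
    CREATE TABLE events_2019_11 (
        CHECK (created >= DATE '2019-11-01' AND created < DATE '2019-12-01')
    ) INHERITS (events)
");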

MongoDB querying performance for over 5 million records

Submitted by 百般思念 on 2019-11-28 15:09:52
We've recently hit more than 2 million records in one of our main collections, and we have started to suffer from major performance issues on that collection. The documents in the collection have about 8 fields that users can filter by through the UI, and the results are supposed to be sorted by a timestamp field recording when the record was processed. I've added several compound indexes with the filtered fields and the timestamp, e.g.: db.events.ensureIndex({somefield: 1, timestamp: -1}) I've also added a couple of indexes covering several filters at once, hoping to achieve better performance. But some filters still take …
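For such an index to help, the query's shape has to match it: equality filters on the leading field(s), then the timestamp sort, so MongoDB can walk the index in order instead of sorting millions of documents in memory. A sketch, with field names following the example index above:

<?php
require 'vendor/autoload.php';

$events = (new MongoDB\Client('mongodb://localhost:27017'))->app->events;

$cursor = $events->find(
    ['somefield' => 'some-value'],                 // matches the index prefix
    ['sort' => ['timestamp' => -1], 'limit' => 50] // served by the index
);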

MySQL Partitioning / Sharding / Splitting - which way to go?

Submitted by 耗尽温柔 on 2019-11-28 14:37:00
Question: We have an InnoDB database that is about 70 GB, and we expect it to grow to several hundred GB over the next 2 to 3 years. About 60% of the data belongs to a single table. Currently the database is working quite well, as we have a server with 64 GB of RAM, so almost the whole database fits into memory, but we're concerned about the future, when the amount of data will be considerably larger. Right now we're considering some way of splitting up the tables (especially the one that accounts for …
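A sketch of one of the title's options, MySQL native RANGE partitioning on the dominant table, so older rows sit in partitions the optimizer can prune and that can be dropped cheaply; the table, column, and boundaries are assumptions, and note that the partition key must be part of every unique key on the table:

<?php
$pdo = new PDO('mysql:host=localhost;dbname=app', 'app_user', 'secret');

$pdo->exec("
    ALTER TABLE big_table
    PARTITION BY RANGE (YEAR(created_at)) (
        PARTITION p2018 VALUES LESS THAN (2019),
        PARTITION p2019 VALUES LESS THAN (2020),
        PARTITION pmax  VALUES LESS THAN MAXVALUE
    )
");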

ElasticSearch: Unassigned Shards, how to fix?

Submitted by 妖精的绣舞 on 2019-11-27 10:00:28
I have an ES cluster with 4 nodes:

number_of_replicas: 1
search01 - master: false, data: false
search02 - master: true, data: true
search03 - master: false, data: true
search04 - master: false, data: true

I had to restart search03, and when it came back, it rejoined the cluster without a problem, but left 7 unassigned shards lying around.

{
  "cluster_name" : "tweedle",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 4,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 15,
  "active_shards" : 23,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 7
}

Now my …
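A sketch reproducing the health check above, plus one commonly suggested first step: asking the cluster to retry failed allocations. The host and the retry_failed flag are assumptions about the setup, not taken from the (truncated) answer:

<?php
$base = 'http://localhost:9200';

// GET _cluster/health reproduces the status document shown above.
echo file_get_contents($base . '/_cluster/health?pretty');

// POST _cluster/reroute?retry_failed=true retries shard allocations that
// previously failed (available in Elasticsearch 5.0+).
$ctx = stream_context_create(['http' => ['method' => 'POST']]);
file_get_contents($base . '/_cluster/reroute?retry_failed=true', false, $ctx);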

Designing Sharded Database IDs (a PHP Example)

Submitted by 你。 on 2019-11-27 07:38:11
Why shard? Usually sharding is considered only after business growth has put the database under enormous pressure: first comes master/slave replication, then caching, partitioning, load balancing, and so on, and sharding (usually applied to the primary write database) comes last. An experienced architect, however, should allow for this possibility from the start, so the database tier can scale out linearly as the business grows. When building the application layer, anticipate the changes that adapting the architecture to sharding will later require and optimize for them in advance, for example by using as few JOINs as possible in SQL. For a site under ordinary load there is no need to introduce sharding early; it only makes the business logic more complex.

This article sketches a design implemented in PHP. A shard_id is a 64-bit value: 10 bits for the shard id, 10 bits for a type id, 10 bits for a subtype id, and 34 bits for an auto-increment id. For more on shard design, see my other articles.

<?php
/**
 * Generates sharded database ids
 *
 * Description
 *
 * @package api
 * @author xxx
 * @copyright Copyright (c) 2014, xx.im.
 * @since Version 1.0
 * @filesource
 *
 * @property database2 $database
 */
define('TOTAL_SHARD_NUM', 1);

class shard {
    var $database = null;

    function shard() {
        $this->database = …
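A minimal, self-contained sketch of the 64-bit layout described above; the packing order (shard id in the high bits) and the function names are assumptions, since the original code is truncated:

<?php
// 10 bits shard id | 10 bits type id | 10 bits subtype id | 34 bits seq.
function makeShardId(int $shardId, int $typeId, int $subtypeId, int $seq): int
{
    // Caution: PHP integers are signed 64-bit, so shard ids >= 512 would
    // set the sign bit and come out negative.
    return (($shardId   & 0x3FF) << 54)
         | (($typeId    & 0x3FF) << 44)
         | (($subtypeId & 0x3FF) << 34)
         |  ($seq       & 0x3FFFFFFFF);
}

// Recover the shard id from an existing id, e.g. to route a query.
function shardIdOf(int $id): int
{
    return ($id >> 54) & 0x3FF;
}

$id = makeShardId(3, 1, 2, 1000001);
printf("id=%d shard=%d\n", $id, shardIdOf($id)); // shard=3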

MySQL sharding approaches?

Submitted by 只愿长相守 on 2019-11-26 21:14:50
What is the best approach for sharding MySQL tables? The approaches I can think of are: application-level sharding, sharding at a MySQL proxy layer, or a central lookup server for sharding. Do you know of any interesting projects or tools in this area? The best approach for sharding MySQL tables is to not do it unless it is totally unavoidable. When you are writing an application, you usually want to do so in a way that maximizes velocity, i.e. developer speed. You optimize for latency (time until the answer is ready) or throughput (number of answers per time unit) only when necessary. You …
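For reference, a sketch of the third option listed above, a central lookup (directory) service: a mapping table records which shard owns each user, so data can be rebalanced by updating rows rather than by changing a hash function. The user_shard schema and all names are assumptions, not from the answer:

<?php
$directory = new PDO('mysql:host=lookup;dbname=meta', 'app_user', 'secret');

function shardHostFor(PDO $directory, int $userId): string
{
    $stmt = $directory->prepare(
        'SELECT shard_host FROM user_shard WHERE user_id = ?'
    );
    $stmt->execute([$userId]);
    return $stmt->fetchColumn();
}

// Connect to whichever shard the directory says owns user 42.
$pdo = new PDO(
    'mysql:host=' . shardHostFor($directory, 42) . ';dbname=app',
    'app_user',
    'secret'
);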