sharding

Database sharding vs partitioning

自闭症网瘾萝莉.ら, submitted 2020-04-07 10:45:08
Question: I have been reading about scalable architectures recently. In that context, two words that keep showing up with regard to databases are sharding and partitioning. I looked up descriptions but still ended up confused. Could the experts at Stack Overflow help me get the basics right? What is the difference between sharding and partitioning? Is it true that "all sharded databases are essentially partitioned (over different nodes), but not all partitioned databases are necessarily sharded"?
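
One common reading of the asker's hypothesis: partitioning decides which disjoint group a row belongs to, and sharding additionally places those groups on different servers. A minimal sketch of that reading, assuming a made-up modulo partitioning rule and hypothetical server names db-1 to db-3:

```python
NUM_PARTITIONS = 4  # hypothetical partition count


def partition_of(key: int) -> int:
    """Partitioning: split rows into disjoint groups by key (here, key mod N)."""
    return key % NUM_PARTITIONS


# Partitioned but not sharded: every partition lives on the same server.
single_server = {p: "db-1" for p in range(NUM_PARTITIONS)}

# Partitioned and sharded: the same partitions are spread over several servers.
sharded = {0: "db-1", 1: "db-2", 2: "db-3", 3: "db-1"}


def server_for(key: int, placement: dict) -> str:
    """Find the server holding the partition that a key belongs to."""
    return placement[partition_of(key)]
```

The partitioning rule is identical in both cases; only the placement map differs, which is why every sharded layout here is also partitioned but not the other way round.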

Learning consistent hashing from jredis

风格不统一, submitted 2020-04-06 21:50:10
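jredis (more precisely Jedis, whose package root is redis.clients) is a Java client for Redis that routes load across Redis nodes through sharding. I had long been curious how its sharding worked, so I dug into the source: the so-called sharding is simply consistent hashing, and the source shows that consistent hashing is fairly simple to implement. The main class is redis.clients.util.Sharded<R, S>; I have added comments at the key points:

    public class Sharded<R, S extends ShardInfo<R>> { // S wraps a Redis node's info, e.g. its name and weight
        public static final int DEFAULT_WEIGHT = 1; // default weight is 1
        private TreeMap<Long, S> nodes; // holds the virtual nodes
        private final Hashing algo; // the hash algorithm
        ......
        public Sharded(List<S> shards, Hashing algo, Pattern tagPattern) {
            this.algo = algo;
            this.tagPattern = tagPattern;
            initialize(shards);
        }
        private void initialize(List<S> shards) {
            nodes = new TreeMap<Long, S>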
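
The same idea can be sketched in Python, with a sorted list plus bisect playing the role of the TreeMap<Long, S> of virtual nodes. This is not the Jedis code: MD5 stands in for the Hashing field, and the virtual-node naming and the count of 160 virtual nodes per unit of weight are assumptions.

```python
import bisect
import hashlib

VNODES_PER_WEIGHT = 160  # assumption: virtual nodes per unit of weight


def hash64(key: str) -> int:
    """64-bit hash taken from MD5; a stand-in for the Hashing algo field."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class ConsistentHashRing:
    def __init__(self, shards: dict):
        """shards: mapping of shard name -> integer weight (hypothetical input format)."""
        ring = []
        for name, weight in shards.items():
            # Heavier shards get proportionally more virtual nodes on the ring.
            for n in range(VNODES_PER_WEIGHT * weight):
                ring.append((hash64(f"{name}-NODE-{n}"), name))
        ring.sort()  # sorted by hash, like the keys of TreeMap<Long, S>
        self._hashes = [h for h, _ in ring]
        self._names = [name for _, name in ring]

    def get_shard(self, key: str) -> str:
        # First virtual node whose hash is >= the key's hash (like TreeMap.tailMap);
        # if there is none, wrap around to the start of the ring.
        i = bisect.bisect_left(self._hashes, hash64(key))
        return self._names[i % len(self._names)]
```

The point of the virtual nodes is that removing one shard only moves the keys that lived on that shard; keys on the other shards keep their mapping.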

MySQL Sharding Explained

六眼飞鱼酱①, submitted 2020-02-29 08:49:40
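Background: As the amount of data in a database grows, the load on both reads and writes keeps increasing. A MySQL Replication setup with multiple masters and multiple slaves, with load balancing on top, can relieve that pressure to some extent. But when a single table becomes very large, the pressure is still heavy. Imagine a table with tens or even hundreds of millions of rows: whether you build indexes or tune caches, you run into serious performance problems.

Definition: Data sharding, also called data splitting or partitioning, distributes the data of one database across multiple databases or machines according to some rule, in order to reduce the load on any single machine.

Categories: Depending on the splitting rule, data partitioning falls into two kinds:

1. Vertical splitting
Vertical splitting can also be called longitudinal splitting. Think of a database as being made up of many large "data blocks" (tables); we cut these blocks apart vertically and spread them across several database hosts. That is vertical (longitudinal) splitting: the unit is the table, and different tables are placed in different databases or on different hosts. The rules are simple and easy to implement, which suits systems whose business modules are loosely coupled.
Advantages of vertical splitting:
(1) The split is simple and clear, with well-defined rules;
(2) Application modules are clearly separated and easy to integrate;
(3) Data is easy to maintain and easy to locate.
Disadvantages of vertical splitting:
(1) Some table joins cannot be done at the database level and must be done in application code;
(2) Tables that are both very frequently accessed and very large can still be performance bottlenecks and may not meet requirements;
(3) Business logic becomes relatively more complex;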
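
To make vertical splitting and its first drawback concrete, here is a minimal sketch with two in-memory SQLite databases standing in for two MySQL hosts. The users/orders tables and their columns are hypothetical. Each table is routed to its own database, so the join has to happen in application code:

```python
import sqlite3

# Two separate databases standing in for two hosts (hypothetical layout).
user_db = sqlite3.connect(":memory:")
order_db = sqlite3.connect(":memory:")

# Vertical split: the routing unit is the whole table.
TABLE_TO_DB = {"users": user_db, "orders": order_db}


def db_for(table: str) -> sqlite3.Connection:
    return TABLE_TO_DB[table]


db_for("users").execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
db_for("orders").execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL)"
)
db_for("users").executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])
db_for("orders").executemany(
    "INSERT INTO orders VALUES (?, ?, ?)", [(10, 1, 9.5), (11, 1, 3.0), (12, 2, 7.25)]
)


def orders_with_user_names():
    # Disadvantage (1): no SQL JOIN across the two databases,
    # so the join is done in Python with a lookup dict.
    names = dict(db_for("users").execute("SELECT id, name FROM users"))
    rows = db_for("orders").execute("SELECT id, user_id, amount FROM orders ORDER BY id")
    return [(order_id, names[user_id], amount) for order_id, user_id, amount in rows]
```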

How to define sharding range for each shard in Mongo?

与世无争的帅哥, submitted 2020-01-22 19:50:28
Question: Let's say the document is { x: Number } and I have 3 shards. Instead of autosharding, can I specifically define that shard1 only contains data with x < 0, shard2 only contains data with 0 <= x <= 1000, and shard3 contains x > 1000?

Answer 1: You can. It's possible to pre-split chunks manually, as described here: http://www.mongodb.org/display/DOCS/Splitting+Chunks Think carefully about how you split your chunks. If you do it badly you can get lots of performance problems, but if you know enough about your keys you can gain
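
The question's ranges can be modelled as chunk boundaries. Below is a minimal Python sketch of the lookup a router performs; it assumes integer x and uses half-open [lower, upper) ranges, as MongoDB chunks do, so the question's 0 <= x <= 1000 becomes [0, 1001). In a real cluster the split points would be created with the mongo shell helper sh.splitAt() and the chunks placed with sh.moveChunk(); the code only illustrates the mapping.

```python
import bisect

# Hypothetical split points; chunks are half-open [lower, upper):
#   (-inf, 0) -> shard1, [0, 1001) -> shard2, [1001, +inf) -> shard3
SPLIT_POINTS = [0, 1001]
SHARDS = ["shard1", "shard2", "shard3"]


def shard_for(x: int) -> str:
    # bisect_right counts how many split points are <= x,
    # which is exactly the index of the chunk containing x.
    return SHARDS[bisect.bisect_right(SPLIT_POINTS, x)]
```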

Does Cassandra support sharding?

℡╲_俬逩灬., submitted 2020-01-22 13:20:23
Question: Does Apache Cassandra support sharding? Apologies if this question seems trivial, but I cannot seem to find the answer. I have read that Cassandra was partially modeled after GAE's Bigtable, which shards on a massive scale. But most of the documentation I'm currently finding on Cassandra seems to imply that Cassandra does not partition data horizontally across multiple machines, but rather supports many, many duplicate machines. This would imply that Cassandra is a good fit high
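
For reference, Cassandra does distribute data horizontally: each row's partition key is hashed to a token, the node owning that token range stores it, and further replicas go to other nodes around the ring. The sketch below is a simplified model under stated assumptions: one token per node (no vnodes), MD5 standing in for the Murmur3 partitioner, and SimpleStrategy-style placement that takes the next nodes clockwise.

```python
import bisect
import hashlib


def token_for(partition_key: str) -> int:
    """Signed 64-bit token; MD5 is a stand-in for Cassandra's Murmur3 hash."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


class TokenRing:
    def __init__(self, node_tokens: dict):
        """node_tokens: mapping node name -> the single token it owns (no vnodes)."""
        ring = sorted((tok, node) for node, tok in node_tokens.items())
        self._tokens = [tok for tok, _ in ring]
        self._nodes = [node for _, node in ring]

    def replicas_for_token(self, tok: int, rf: int) -> list:
        # A node owns the range (previous token, its token], so the primary replica
        # is the first node whose token is >= tok, wrapping past the end of the ring.
        start = bisect.bisect_left(self._tokens, tok) % len(self._nodes)
        # SimpleStrategy-style: the remaining replicas are the next nodes clockwise.
        count = min(rf, len(self._nodes))
        return [self._nodes[(start + i) % len(self._nodes)] for i in range(count)]

    def replicas(self, partition_key: str, rf: int) -> list:
        return self.replicas_for_token(token_for(partition_key), rf)
```

With a replication factor lower than the node count, each node holds only part of the data, which is horizontal partitioning rather than full duplication.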

Dynamic database routing in Django

好久不见., submitted 2020-01-14 04:09:11
Question: I have a Customer table in my database that all other tables have a foreign key to.

    from django.db import models

    class Customer(models.Model):
        ...

    class TableA(models.Model):
        customer = models.ForeignKey(Customer, on_delete=models.CASCADE)
        ...

    class TableB(models.Model):
        customer = models.ForeignKey(Customer, on_delete=models.CASCADE)
        ...

I'm trying to implement a database router that determines which database to connect to based on the primary key of the Customer table. For instance, ids in the range 1-100 will connect to Database A, ids in the
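
A minimal sketch of such a router, assuming the foreign key field is named customer (so Django exposes customer_id) and using hypothetical aliases db_a and db_b that would have to exist in settings.DATABASES. The second range is an assumption, since the question is cut off. Django routers are plain classes, so the sketch needs no Django import; returning None from db_for_read or db_for_write means "no opinion".

```python
# Hypothetical customer-id ranges -> database aliases (must exist in settings.DATABASES).
CUSTOMER_RANGES = [
    (1, 100, "db_a"),    # ids 1-100 -> Database A, as in the question
    (101, 200, "db_b"),  # assumed next range; the question is truncated here
]


def db_for_customer_id(customer_id):
    """Return the alias for a customer id, or None if no range matches."""
    for low, high, alias in CUSTOMER_RANGES:
        if low <= customer_id <= high:
            return alias
    return None


class CustomerRangeRouter:
    """Route by customer id when Django passes an `instance` hint."""

    def _customer_id(self, hints):
        instance = hints.get("instance")
        if instance is None:
            return None  # no instance hint: we cannot tell which customer
        # For related lookups the hint can be an instance of a different model
        # than the one being queried, so check for the FK attribute first.
        customer_id = getattr(instance, "customer_id", None)
        if customer_id is not None:
            return customer_id
        if type(instance).__name__ == "Customer":
            return instance.pk
        return None

    def db_for_read(self, model, **hints):
        customer_id = self._customer_id(hints)
        if customer_id is None:
            return None  # defer to other routers / the default database
        return db_for_customer_id(customer_id)

    def db_for_write(self, model, **hints):
        return self.db_for_read(model, **hints)
```

The router would be registered through the DATABASE_ROUTERS setting. Top-level queries such as Customer.objects.get(pk=42) carry no instance hint, so for those the database has to be chosen explicitly, e.g. Customer.objects.using(db_for_customer_id(42)).get(pk=42).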

MySQL Partitioning and Table Splitting Summary: Choosing an Approach

江枫思渺然, submitted 2020-01-10 10:55:00
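Introduction
For a recent project I spent some time researching MySQL sharding, read material on the subject, and ran some experiments myself. This is a summary note to make the knowledge easy to review later. Most of it is borrowed from others online; I practised it with a learning attitude to build up something of my own.

Choosing a splitting strategy
Splitting is quite flexible. Some use vertical splitting: one database is broken into two or more, with related tables kept in the same database. Others use horizontal splitting: a table with a large volume of data is split according to some logic. My impression is that vertical splitting mainly relieves the I/O bottleneck, while horizontal splitting aims to reduce the read/write pressure on a single table or a few tables.

Based on our needs, our project uses horizontal splitting without splitting the database; next I have to decide which kind of splitting to use. The options I looked into are: table splitting, partitioning, and table splitting with the MERGE engine.

Table splitting with the MERGE engine
Overview: First, MERGE tables. This approach works only with MyISAM. All tables in my database use the InnoDB engine, so it was ruled out immediately, but I will still introduce it briefly here. The MySQL 5.1 manual says: "An alternative to a MERGE table is a partitioned table, which stores partitions of a single table in separate files. Partitioning enables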
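
For the horizontal table splitting the author chose, a common approach is to route each row to one of N sub-tables by key modulo N in application code. A minimal sketch using an in-memory SQLite database as a stand-in for MySQL; the table names user_0 to user_3 and the modulo rule are assumptions, not the author's schema:

```python
import sqlite3

NUM_TABLES = 4  # hypothetical number of sub-tables


def table_for(user_id: int) -> str:
    """Horizontal split: the row's key decides which sub-table holds it."""
    return f"user_{user_id % NUM_TABLES}"


conn = sqlite3.connect(":memory:")
for i in range(NUM_TABLES):
    conn.execute(f"CREATE TABLE user_{i} (id INTEGER PRIMARY KEY, name TEXT)")


def insert_user(user_id: int, name: str) -> None:
    # Table names cannot be bound as parameters, so they are formatted in;
    # they come from table_for(), never from user input.
    conn.execute(f"INSERT INTO {table_for(user_id)} (id, name) VALUES (?, ?)", (user_id, name))


def get_user(user_id: int):
    row = conn.execute(
        f"SELECT name FROM {table_for(user_id)} WHERE id = ?", (user_id,)
    ).fetchone()
    return row[0] if row else None
```

Every query must know the key in order to find the right sub-table; queries that do not filter on the key have to visit all N tables.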

Sharding and ID generation like Instagram's

大兔子大兔子, submitted 2020-01-03 01:52:06
Question: My question is about ID generation in a sharded environment. I am following the same steps Instagram took to generate unique ids, and I have a few questions about implementing this ID generation in MySQL. This is how the ID is generated (a PL/pgSQL stored procedure):

    CREATE OR REPLACE FUNCTION insta5.next_id(OUT result bigint) AS $$
    DECLARE
        our_epoch bigint := 1314220021721;
        seq_id bigint;
        now_millis bigint;
        shard_id int := 5;
    BEGIN
        SELECT nextval('insta5.table_id_seq') %%
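
The function body is cut off, but the layout it builds, as described in Instagram's engineering post, is 41 bits of milliseconds since a custom epoch, 13 bits of logical shard ID, and 10 bits of a per-shard sequence taken modulo 1024. A minimal Python sketch of that packing and the reverse split, reusing the epoch and shard ID from the question; the function names are hypothetical:

```python
OUR_EPOCH = 1314220021721  # custom epoch in milliseconds, from the question
SHARD_BITS = 13
SEQ_BITS = 10


def make_id(now_millis: int, shard_id: int, seq: int) -> int:
    """Pack 41 bits of time | 13 bits of shard id | 10 bits of sequence."""
    elapsed = now_millis - OUR_EPOCH
    return (
        (elapsed << (SHARD_BITS + SEQ_BITS))
        | ((shard_id % (1 << SHARD_BITS)) << SEQ_BITS)
        | (seq % (1 << SEQ_BITS))  # the sequence wraps every 1024 values
    )


def split_id(id_: int):
    """Recover (now_millis, shard_id, seq) from a packed id."""
    seq = id_ & ((1 << SEQ_BITS) - 1)
    shard_id = (id_ >> SEQ_BITS) & ((1 << SHARD_BITS) - 1)
    now_millis = (id_ >> (SHARD_BITS + SEQ_BITS)) + OUR_EPOCH
    return now_millis, shard_id, seq
```

In practice now_millis would come from something like int(time.time() * 1000) and seq from a per-shard counter; in MySQL an AUTO_INCREMENT column could play the role of the sequence.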