Sharding and ID generation as instagram

大兔子大兔子 提交于 2020-01-03 01:52:06

问题


My question is regarding ID generation for sharded environment. I am following the same steps as instagram did for generating unique ids. I have a few question on the implementation of this id generation in MySQL.

This is how the ID is being generated (This is a PGQL stored procedure.)

CREATE OR REPLACE FUNCTION insta5.next_id(OUT result bigint) AS $$
DECLARE
    our_epoch bigint := 1314220021721;
    seq_id bigint;
    now_millis bigint;
    shard_id int := 5;
BEGIN
    SELECT nextval('insta5.table_id_seq') %% 1024 INTO seq_id;

    SELECT FLOOR(EXTRACT(EPOCH FROM clock_timestamp()) * 1000) INTO now_millis;
    result := (now_millis - our_epoch) << 23;
    result := result | (shard_id << 10);
    result := result | (seq_id);
END;
$$ LANGUAGE PLPGSQL;



CREATE TABLE insta5.our_table (
    "id" bigint NOT NULL DEFAULT insta5.next_id(),
    ...rest of table schema...
)

MY question is how can I do that in MySQL (equivalent code). This stored procedure should be called from a query to get the next id.

My other question is regarding querying the shards. From what I understand they use logical shards that are maps to actual servers. If they decide to map a logical shard to a new server, this means that they have to query two servers at the same time to aggregate the results, because before they assigned a new server for the same logical shard, some data was added to the older server. I just wanted to know if there's a better way to use logical shards and having the option to query a single server where the data resides instead of querying all the servers that are and were belong to that logical shard?

Thanks.


回答1:


This is postgres equivalent

Specify TABLE_SCHEMA and MASKTABLE below

I am creating a MASKTABLE with a dummy first entry

DELIMITER $$
CREATE OR REPLACE FUNCTION generate_next_id() RETURNS bigint NOT DETERMINISTIC
MAIN: BEGIN
DECLARE our_epoch bigint;
DECLARE seq_id bigint;
DECLARE now_millis bigint;
DECLARE shard_id int;
DECLARE param bigint ;
SET @our_epoch = 1568873367231;
SET @shard_id = 1;
SELECT AUTO_INCREMENT FROM information_schema.TABLES WHERE TABLE_SCHEMA = "SCHEMANAME" AND TABLE_NAME = "MASKTABLE" into @seq_id;
SELECT FLOOR(UNIX_TIMESTAMP()) * 1000 into @now_millis;
SELECT (@now_millis - @our_epoch) << 23 into @param;
SELECT @param | (@shard_id <<10) into @param; 
select @param | (@seq_id) into @param; 
RETURN @param;
END MAIN;$$ 
DELIMITER ;

Usage

select generate_next_id()

You can use it in trigger like

CREATE TRIGGER trigger_name
BEFORE INSERT ON TableName 
FOR EACH ROW
SET new.id = generate_next_id();



回答2:


By the look of the code it looks like you just need to replicate sequence, you can do it by creating table in MySQL with AUTO_INCREMENT and use it for generating identity numbers.




回答3:


At ScaleBase we found a nice way to provide support for autoincrements, through the variable: SET @@auto_increment_increment=4;.

Just make sure each shard has a starting offset and you're free to go. It's straight, simple, compatible with existing MySQL and stuff.




回答4:


Regarding the other question, all data for one logical shard only live on one server in production at a time (without taking replication into account). When they're running out of capacity of their existing servers, they duplicate data in each server to a follower, and when the new server is ready (at this point both the original server and the new server contain all data for a specific logical shard), they start reading all data for half of the shards from the new server, and the other half stays in the original server. See e.g. http://www.craigkerstiens.com/2012/11/30/sharding-your-database/ for illustration.




回答5:


I have a question out for the conversion routine I wrote, but it does appear to work!

Can AUTO_INCREMENT be safely used in a BEFORE TRIGGER in MySQL

Once I get the question answered, I'll update this answer as well.

MySQL Version:

CREATE TRIGGER shard_insert BEFORE INSERT ON tablename
FOR EACH ROW BEGIN

DECLARE seq_id BIGINT;
DECLARE now_millis BIGINT;
DECLARE our_epoch BIGINT DEFAULT 1314220021721;
DECLARE shard_id INT DEFAULT 1;

SET now_millis = (SELECT UNIX_TIMESTAMP());
SET seq_id = (SELECT AUTO_INCREMENT FROM information_schema.TABLES WHERE TABLE_SCHEMA = "dbname" AND TABLE_NAME =     "tablename");
SET NEW.id = (SELECT ((now_millis - our_epoch) << 23) | (shard_id << 10) | (SELECT MOD(seq_id, 1024)));
END



回答6:


Implementing the exact same id generation logic with a MySQL stored procedure is not possible. However, this can be done using a MySQL UDF.

Here is a UDF that recreates instagrams id generator with a few changes. Modifying it to work exactly like the instagram id generator would involve updating the MAX_* constants and the bit shifting logic in next_shard_id and shard_id_to_ms.



来源:https://stackoverflow.com/questions/12997895/sharding-and-id-generation-as-instagram

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!