Best practice for storing tags in a database?

Submitted by 孤人 on 2019-11-30 10:39:36

Question


I developed a site that uses tags (keywords) to categorize photographs. Right now, my MySQL database has a table with the following structure:

image_id (int)
tag      (varchar(32))

Every time someone tags an image (if the tag is valid and has enough votes), it's added to the database. I don't think this is the optimal way of doing things: now that I have 5,000+ tagged images, the tags table has over 40,000 entries. I fear that this will begin to affect performance (if it isn't already).

I considered this other structure, thinking that it would be faster to fetch the tags associated with a particular image, but it looks horrible when I want to get all the tags or, for instance, the most popular one:

image_id (int)
tags     (text) //comma-delimited list of tags for the image

Is there a correct way of doing this or are both ways more or less the same? Thoughts?
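To make the difference between the two layouts concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption; the table names and sample data are hypothetical). It computes the most popular tag under both layouts: the one-row-per-tag table answers with a single GROUP BY, while the comma-delimited column forces the application to fetch and split every row.

```python
import sqlite3
from collections import Counter

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")

# Layout 1: one row per (image, tag) pair, as in the original table.
conn.execute("CREATE TABLE image_tags (image_id INTEGER, tag TEXT)")
conn.executemany(
    "INSERT INTO image_tags VALUES (?, ?)",
    [(1, "sunset"), (1, "beach"), (2, "beach"), (3, "beach"), (3, "dog")],
)

# Layout 2: one row per image, tags packed into a comma-delimited string.
conn.execute("CREATE TABLE image_tag_list (image_id INTEGER, tags TEXT)")
conn.executemany(
    "INSERT INTO image_tag_list VALUES (?, ?)",
    [(1, "sunset,beach"), (2, "beach"), (3, "beach,dog")],
)

# Layout 1: the database aggregates directly with GROUP BY.
most_popular_sql = conn.execute(
    "SELECT tag, COUNT(*) AS uses FROM image_tags "
    "GROUP BY tag ORDER BY uses DESC LIMIT 1"
).fetchone()

# Layout 2: every row is fetched and split in application code.
counter = Counter()
for (tags,) in conn.execute("SELECT tags FROM image_tag_list"):
    counter.update(t.strip() for t in tags.split(",") if t.strip())
most_popular_split = counter.most_common(1)[0]

print(most_popular_sql)    # ('beach', 3)
print(most_popular_split)  # ('beach', 3)
```

Both give the same answer here, but only the first lets the database do the work (and use an index on tag); the second always reads the whole table.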


Answer 1:


Use a many-to-many table to link a TAG record to an IMAGE record:

IMAGE

DROP TABLE IF EXISTS `example`.`image`;
CREATE TABLE  `example`.`image` (
  `image_id` int(10) unsigned NOT NULL auto_increment,
  PRIMARY KEY  (`image_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

TAG

DROP TABLE IF EXISTS `example`.`tag`;
CREATE TABLE  `example`.`tag` (
 `tag_id` int(10) unsigned NOT NULL auto_increment,
 `description` varchar(45) NOT NULL default '',
 PRIMARY KEY  (`tag_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

IMAGE_TAG_MAP

DROP TABLE IF EXISTS `example`.`image_tag_map`;
CREATE TABLE  `example`.`image_tag_map` (
 `image_id` int(10) unsigned NOT NULL default '0',
 `tag_id` int(10) unsigned NOT NULL default '0',
 PRIMARY KEY  (`image_id`,`tag_id`),
 KEY `tag_fk` (`tag_id`),
 CONSTRAINT `image_fk` FOREIGN KEY (`image_id`) REFERENCES `image` (`image_id`),
 CONSTRAINT `tag_fk` FOREIGN KEY (`tag_id`) REFERENCES `tag` (`tag_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
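A minimal sketch of the same three-table design, using Python's sqlite3 module in place of MySQL/InnoDB (an assumption, so the column types are SQLite's), shows the two queries the question worries about: the tags for one image and the most popular tag overall. The sample data is hypothetical. SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`, which here plays the role of the InnoDB constraints.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL/InnoDB (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.executescript("""
CREATE TABLE image (image_id INTEGER PRIMARY KEY);
CREATE TABLE tag (
  tag_id      INTEGER PRIMARY KEY,
  description TEXT NOT NULL DEFAULT ''
);
CREATE TABLE image_tag_map (
  image_id INTEGER NOT NULL REFERENCES image (image_id),
  tag_id   INTEGER NOT NULL REFERENCES tag (tag_id),
  PRIMARY KEY (image_id, tag_id)
);
CREATE INDEX tag_fk ON image_tag_map (tag_id);

-- Hypothetical sample data.
INSERT INTO image (image_id) VALUES (1), (2);
INSERT INTO tag (tag_id, description) VALUES (1, 'beach'), (2, 'sunset');
INSERT INTO image_tag_map (image_id, tag_id) VALUES (1, 1), (1, 2), (2, 1);
""")

# All tags attached to image 1.
tags_for_image_1 = [
    description
    for (description,) in conn.execute(
        "SELECT t.description FROM image_tag_map AS m "
        "JOIN tag AS t ON t.tag_id = m.tag_id "
        "WHERE m.image_id = ? ORDER BY t.description",
        (1,),
    )
]

# Most popular tag across all images.
most_popular = conn.execute(
    "SELECT t.description, COUNT(*) AS uses FROM image_tag_map AS m "
    "JOIN tag AS t ON t.tag_id = m.tag_id "
    "GROUP BY t.tag_id, t.description ORDER BY uses DESC LIMIT 1"
).fetchone()

# The foreign key rejects a mapping to a tag that does not exist.
try:
    conn.execute("INSERT INTO image_tag_map (image_id, tag_id) VALUES (2, 99)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

print(tags_for_image_1)  # ['beach', 'sunset']
print(most_popular)      # ('beach', 2)
print(fk_rejected)       # True
```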



Answer 2:


You can make a tags table that holds just an id and the tag text, with a unique constraint on the tag, and then a photo_tags table that holds tag_id and photo_id. Insert a tag into the tags table only if it doesn't already exist.

Then queries such as counting how many photos are tagged with a certain tag compare the tag's integer primary key instead of varchar text.
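A minimal sketch of this design, using Python's sqlite3 module as a stand-in for MySQL (an assumption). The helpers `get_or_create_tag`, `tag_photo` and `count_photos_with_tag` and the sample data are hypothetical. `INSERT OR IGNORE` is SQLite's way to skip a row that would violate the unique constraint; in MySQL, `INSERT IGNORE` plays a similar role.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tags (
  id  INTEGER PRIMARY KEY,
  tag TEXT NOT NULL UNIQUE           -- one row per distinct tag text
);
CREATE TABLE photo_tags (
  tag_id   INTEGER NOT NULL REFERENCES tags (id),
  photo_id INTEGER NOT NULL,
  PRIMARY KEY (tag_id, photo_id)     -- leading tag_id column serves per-tag lookups
);
""")

def get_or_create_tag(conn, tag):
    """Hypothetical helper: return the id for `tag`, inserting it only if new."""
    # The row is skipped if it would violate the UNIQUE constraint on tag.
    conn.execute("INSERT OR IGNORE INTO tags (tag) VALUES (?)", (tag,))
    return conn.execute("SELECT id FROM tags WHERE tag = ?", (tag,)).fetchone()[0]

def tag_photo(conn, photo_id, tag):
    """Hypothetical helper: attach `tag` to a photo; repeating it is a no-op."""
    tag_id = get_or_create_tag(conn, tag)
    conn.execute(
        "INSERT OR IGNORE INTO photo_tags (tag_id, photo_id) VALUES (?, ?)",
        (tag_id, photo_id),
    )
    return tag_id

def count_photos_with_tag(conn, tag):
    """Hypothetical helper: one text lookup, then a filter on the integer key."""
    row = conn.execute("SELECT id FROM tags WHERE tag = ?", (tag,)).fetchone()
    if row is None:
        return 0
    return conn.execute(
        "SELECT COUNT(*) FROM photo_tags WHERE tag_id = ?", (row[0],)
    ).fetchone()[0]

tag_photo(conn, 10, "beach")
tag_photo(conn, 11, "beach")   # reuses the existing "beach" row
tag_photo(conn, 11, "beach")   # duplicate pairing is ignored
tag_photo(conn, 10, "sunset")

print(conn.execute("SELECT COUNT(*) FROM tags").fetchone()[0])  # 2
print(count_photos_with_tag(conn, "beach"))                      # 2
```

Relying on the unique constraint, rather than checking for the tag in application code first, is what keeps duplicates out when several requests tag images at the same time.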




Answer 3:


In a multi-tag search query, every requested tag has to match. Hence an image's tag set I has to be a superset of the requested tag set U:

$I \supseteq U$
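In application code the test is a one-liner. A minimal Python sketch with hypothetical sample tags; the last line checks the equivalent counting form, which is the trick the SQL query below relies on:

```python
# Hypothetical sample data: one image's tag set (I) and the requested tags (U).
image_tags = {"beach", "sunset", "dog"}
requested = {"beach", "dog"}

# Superset test: the image qualifies when every requested tag is present.
print(image_tags >= requested)          # True
print(image_tags >= {"beach", "cat"})   # False

# Equivalent counting form: the number of requested tags found on the image
# must equal the number of requested tags.
print(len(image_tags & requested) == len(requested))  # True
```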

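Implementing this comparison in SQL is a bit of a challenge, as each image has to be qualified individually. Given that each tag appears at most once per image, an image qualifies exactly when the number of its tags found in the requested list equals n, the number of requested tags: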

SELECT i.* FROM images AS i WHERE {n} = (
  SELECT COUNT(*) 
  FROM image_tags AS t 
  WHERE t.image_id = i.image_id
    AND t.tag IN ({tag1}, {tag2}, ... {tagn})
)

Schema:

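CREATE TABLE images (
  image_id varchar(64) NOT NULL,  -- MySQL requires a VARCHAR length; 64 is illustrative
  PRIMARY KEY (image_id)
);

CREATE TABLE image_tags (
  image_id varchar(64) NOT NULL,
  tag varchar(32) NOT NULL,
  PRIMARY KEY (image_id, tag)     -- guarantees each tag appears once per image
);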
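A runnable version of this query, as a sketch using Python's sqlite3 module in place of MySQL (an assumption). The helper `images_with_all_tags` and the sample data are hypothetical. It binds n and the tag list as parameters rather than pasting them into the SQL string, and deduplicates the requested tags so that n matches the length of the IN list:

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE images (image_id VARCHAR(64) NOT NULL PRIMARY KEY);
CREATE TABLE image_tags (
  image_id VARCHAR(64) NOT NULL,
  tag      VARCHAR(32) NOT NULL,
  PRIMARY KEY (image_id, tag)   -- each tag at most once per image
);
-- Hypothetical sample data.
INSERT INTO images VALUES ('a'), ('b'), ('c');
INSERT INTO image_tags VALUES
  ('a', 'beach'), ('a', 'sunset'), ('a', 'dog'),
  ('b', 'beach'),
  ('c', 'sunset'), ('c', 'beach');
""")

def images_with_all_tags(conn, tags):
    """Hypothetical helper: ids of images whose tag set is a superset of `tags`."""
    wanted = sorted(set(tags))  # dedupe so n equals the size of the IN list
    placeholders = ", ".join("?" for _ in wanted)
    sql = (
        "SELECT i.image_id FROM images AS i WHERE ? = ("
        "  SELECT COUNT(*) FROM image_tags AS t"
        "  WHERE t.image_id = i.image_id"
        f"   AND t.tag IN ({placeholders})"
        ") ORDER BY i.image_id"
    )
    # First parameter is n, followed by the requested tags for the IN list.
    rows = conn.execute(sql, [len(wanted), *wanted])
    return [image_id for (image_id,) in rows]

print(images_with_all_tags(conn, ["beach", "sunset"]))  # ['a', 'c']
```

The correlated subquery runs once per image, so on large tables an index that starts with image_id (here, the primary key) matters.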


Source: https://stackoverflow.com/questions/3508207/best-practice-for-storing-tags-in-a-database
