Select All Posts With All Their Tags

There are many questions about this, but not enough definitive answers, especially using SQL Server which I am in this case.

I have 3 tables to represent blog posts and tags associated with it. I want to run a query that will get all the posts and each posts tags with it - I need both the tag id and name. The result needs to be easily serializable into my C# classes. I am also trying to use Dapper.net but that is not the most important part.

Obviously many websites do this, I want to know the best way and how it should be done in the real world? I could get all the posts, then run multiple queries for each one afterwards that would return the tags for each post. But surely there has to be a better way?

If I was just attempting to get one post then I would just write multiple selects. One for the post and one for the tags. But I want all of the posts. Is there a way of doing this without replicating the post information for each tag row that is returned?

If the tags for each post are serialized into a column that has them comma separated for example, how would one be able to get both the id and the name? Encode the string?

SELECT * FROM dbo.Posts    
SELECT * FROM dbo.Tags
SELECT * FROM dbo.PostTags

Posts
Id  Title       Content
===============================
1   First Post  First Content
3   Second      Second Content

Tags
Id  Name
============
1   C#
2   SQL
3   IIS
4   Steam
5   OpenID

PostTags
PostId  TagId
=============
1       1
1       2
3       3
3       4

Simply joining the tables with the following query:

SELECT p.*, t.Name
FROM dbo.Posts p
LEFT JOIN dbo.PostTags pt ON p.id = pt.PostId
LEFT JOIN dbo.Tags t ON t.id = pt.TagId

Gives much redundant data by repeating the post content for each tag that is associated with it:

Id  Title       Content         Name
======================================
1   First Post  First Post      C#
1   First Post  First Post      SQL
3   Second Post Second Content  IIS
3   Second Post Second Content  Steam

This being purely an exercise, let me preface this by saying that most likely the amount of data being duplicated isn't a big deal. Although if the posts are very large in size and there are lots of them, it does start to make more sense to avoid duplication.

Further, using C# Linq-to-Sql or Entity Framework, the object relationships will be worked out for you and your Post entity will have a List<Tag> property that you can access.

However if you want to roll your own type of thing, one option that involves just one DB round trip and no duplication of data is to write a stored proc that gets you back 2 recordsets (2 separate select statements) - one with Post content, and one with Tag content.

It would then be pretty simple to create a C# class that represents a Post and just has a List<Tag> and pull it from the stored proc results.

Create Procedure GetPostTags
As

-- We will use the GotTags column here to loop through and get tabs later
Declare @Posts Table (
    PostID varchar(50), 
    PostTitle varchar(50), 
    PostContent varchar(50),
    GotTags bit default 0
)

/* Assuming you care about the ID's, this will get you all of 
   the tags without duplicating any post content */
Declare @PostTags Table (
    PostID int,
    TagID int,
    TagName varchar(50)
)

-- Populate posts from the main table
Insert Into @Posts (PostID, PostTitle, PostContent)
Select * From Posts

-- Now loop through and get the tags for each post. 
Declare @CurrentPostID int
Set @CurrentPostID = (Select Top 1 PostID From @Posts Where GotTags = 0)
While @CurrentPostID Is Not Null
    Begin
        Insert Into @PostTags (PostId, TagID, TagName)
        Select pt.postid, pt.tagid, t.name
        From Tags t 
            Join PostTags pt
                On t.id = pt.tagid
        Where pt.postid = @CurrentPostID

        -- Set next loop
        Update @Posts Set GotTags = 1 Where PostID = @CurrentPostID
        Set @CurrentPostID = (Select Top 1 PostID From @Posts Where GotTags = 0)
    End

-- Return 2 recordsets, which are related by the PostID column found in both sets
Select * from @Posts
Select * From @PostTags

I prefer this type of solution over concatenating strings into one string and then splitting them later; it makes it easier to work with the data this way, allows to be more object oriented in C#, and lets you keep track of Tag ID's easier in case tags need to be removed or added to/from a post, you don't need to find a tag or match by name since you already have the ID.

Obviously many websites do this, I want to know the best way..

The best way: there isn't one.

and how it should be done in the real world?

Entity Framework would build out the query as you suggested and materialize the objects you need. Yes there is duplicate data, but more often then not, the duplicate data is better with related data then trying to relate the information back together again. The advantage is more readable code, and easier to query in a c# like language, with related records and change tracking (by default).

Dapper can do the same thing - A parent object with it's children objects. It's faster, but it doesn't have change tracking, and the statements aren't c# like, they are (as far as I've seen) direct SQL, which makes writing dynamic queries a much harder.

But surely there has to be a better way?

I don't know what better is. Is it more efficent, less memory overhead, less network packets/size, more maintainable, more readable?

Is there a way of doing this without replicating the post information for each tag row that is returned?

Yes, you could write a stored procedure to return multiple recordsets, materialize your objects, and wire them up manually.

This sounds like you are trying to optimize something you don't have a problem with..

I'd write a query to return multiple recordsets. I wouldn't worry about over-optimizing until you do some performance testing.

I'm not sure about Dapper's recent support for one-to-many or many-to-many queries, but you might want to check out the new features in Insight.Database 4.0. There's a pre-release in nuget now.

Check out the pre-release docs. I'd love some feedback.

https://github.com/jonwagner/Insight.Database/wiki/Proposed-4.0-Changes

来源：https://stackoverflow.com/questions/21767613/select-all-posts-with-all-their-tags

标签

sql

.net

sql-server

dapper