Find the start and end of a redirect chain

Submitted by 吃可爱长大的小学妹 on 2019-12-23 05:13:58

Question


I have a table of URL redirects in SQL Server. Each redirect has an ID, a FromURL and a ToURL field.

I've been asked to find the chains of redirects in the table so that we can replace each chain with a single redirect, meaning users are redirected once rather than several times.

An example of the table is below:

As you can see, if a user visits URL A, they are redirected to B, then from B to C, then from C to D. We'd like to replace this with a single redirect from A to D to speed up the page load.

I thought I might be able to do this without cursors by using a recursive CTE, but I got completely stuck. The best I managed to do was find the start of each chain with the following:

SELECT  r.ID ,
        r.FromURL ,
        r.ToURL
FROM    dbo.redirect r
WHERE   fromURL NOT IN ( SELECT ToURL
                         FROM   dbo.redirect r2 )

This gives me the start of each chain (and any redirects that aren't in a chain at all) by selecting the records whose FromURL is not the ToURL of any other redirect. When I tried following some of the recursive CTE examples, all I ended up with was junk data or hitting the recursion limit.
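For reference, here is a sketch of the same chain-start filter written with NOT EXISTS, assuming the dbo.redirect table and columns described above. The two forms return the same rows unless ToURL can be NULL: if the NOT IN subquery returns even one NULL, the NOT IN version returns no rows at all, while NOT EXISTS is unaffected.

```sql
-- Chain starts: rows whose FromURL is not the target of any other redirect.
-- Assumes the dbo.redirect table (ID, FromURL, ToURL) from the question.
SELECT r.ID,
       r.FromURL,
       r.ToURL
FROM   dbo.redirect AS r
WHERE  NOT EXISTS ( SELECT 1
                    FROM   dbo.redirect AS r2
                    WHERE  r2.ToURL = r.FromURL );
```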

Ideally what I'd like to get out of this is data similar to the following:

As you can see, the chains of redirects have been replaced with a single one, so every level in the hierarchy now goes directly to the end of the chain.

I'm just a DBA who agreed to do something for our web team and have now found it is beyond my ability with T-SQL, so any help would be much appreciated.


Answer 1:


The general solution can be found by searching for "Directed Acyclic Graph", "Traversal" and "SQL". hansolav.net/sql/graphs.html#topologicalsorting has some good info.

If you need a fast answer, here's a quick-and-dirty method. It's not efficient, and it requires acyclic input (a redirect loop would keep it running forever), but it's readable to someone not familiar with SQL.

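-- Work on a copy so dbo.redirect itself is left untouched.
SELECT id, FromURL, ToURL
INTO #temp
FROM dbo.redirect

-- @@ROWCOUNT starts as the number of rows copied, then becomes the number
-- of rows changed by each UPDATE; the loop ends when a pass changes nothing.
WHILE @@ROWCOUNT > 0
BEGIN
  -- Move every redirect whose target is itself redirected one step further.
  UPDATE cur
  SET ToURL = nxt.ToURL
  FROM #temp cur
  INNER JOIN #temp nxt ON (cur.ToURL = nxt.FromURL)
END

SELECT * FROM #temp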
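Because the loop above never terminates if the data contains a redirect loop (for example A to B and B back to A), one possible variation adds a pass limit. This is only a sketch: the #work table name, the @maxPasses, @pass and @changed variables, and the cap of 50 are all assumptions introduced here for illustration, not part of the original answer.

```sql
-- Hypothetical safety cap; pick a value that suits your data.
DECLARE @maxPasses INT = 50;
DECLARE @pass      INT = 0;
DECLARE @changed   INT = 1;

IF OBJECT_ID('tempdb..#work') IS NOT NULL
  DROP TABLE #work;

-- Work on a copy so dbo.redirect is left untouched.
SELECT ID, FromURL, ToURL
INTO #work
FROM dbo.redirect;

WHILE @changed > 0 AND @pass < @maxPasses
BEGIN
  -- Same update as above: push each redirect one step further along its chain.
  UPDATE cur
  SET ToURL = nxt.ToURL
  FROM #work AS cur
  INNER JOIN #work AS nxt ON cur.ToURL = nxt.FromURL;

  -- Capture the row count immediately, before another statement resets it.
  SET @changed = @@ROWCOUNT;
  SET @pass += 1;
END;

-- Rows were still changing when the cap was hit: likely a redirect loop.
IF @changed > 0
  PRINT 'Stopped at the pass cap: the data probably contains a redirect loop.';

SELECT ID, FromURL, ToURL
FROM #work;
```

If the message is printed, the rows in #work that belong to a loop will not hold a meaningful final URL and should be inspected by hand.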

Alternatively, with a recursive CTE:

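;WITH cte AS (
  -- Anchor: every redirect counts as one hop.
  SELECT 1 as redirect_count, id, FromURL, ToURL
  FROM dbo.redirect
  UNION ALL
  -- Recursive step: follow the current target to its own redirect, if any.
  SELECT redirect_count + 1, cur.id, cur.FromURL, nxt.ToURL
  FROM cte cur
  INNER JOIN dbo.redirect nxt ON (cur.ToURL = nxt.FromURL)
)
-- For each original redirect, keep the row that followed the most hops,
-- which is the one pointing at the end of the chain.
SELECT
  t1.id, t2.FromURL, t2.ToURL
FROM dbo.redirect t1
CROSS APPLY (
  SELECT TOP 1 FromURL, ToURL
  FROM cte
  WHERE id = t1.id
  ORDER BY redirect_count DESC
) t2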
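To try the recursive CTE without touching real data, the sketch below runs it against a table variable. The @redirect variable, its column sizes and the sample rows (the A to B to C to D chain from the question plus an unrelated X to Y redirect) are assumptions made up for this illustration. The MAXRECURSION hint is set to 100, which is also SQL Server's default; on data that contains a redirect loop the query fails with a recursion-limit error instead of returning wrong rows, which is probably the error the question describes.

```sql
-- Hypothetical sample data standing in for dbo.redirect.
DECLARE @redirect TABLE (
  ID      INT PRIMARY KEY,
  FromURL NVARCHAR(200) NOT NULL,
  ToURL   NVARCHAR(200) NOT NULL
);

INSERT INTO @redirect (ID, FromURL, ToURL)
VALUES (1, N'A', N'B'),
       (2, N'B', N'C'),
       (3, N'C', N'D'),
       (4, N'X', N'Y');

WITH cte AS (
  -- Anchor: every redirect counts as one hop.
  SELECT 1 AS redirect_count, ID, FromURL, ToURL
  FROM @redirect
  UNION ALL
  -- Recursive step: follow the current target to its own redirect, if any.
  SELECT cur.redirect_count + 1, cur.ID, cur.FromURL, nxt.ToURL
  FROM cte AS cur
  INNER JOIN @redirect AS nxt ON cur.ToURL = nxt.FromURL
)
SELECT t1.ID, t2.FromURL, t2.ToURL AS FinalURL, t2.redirect_count
FROM @redirect AS t1
CROSS APPLY (
  -- The row with the most hops points at the end of the chain.
  SELECT TOP (1) FromURL, ToURL, redirect_count
  FROM cte
  WHERE cte.ID = t1.ID
  ORDER BY redirect_count DESC
) AS t2
ORDER BY t1.ID
OPTION (MAXRECURSION 100);

-- Result for the sample rows:
-- ID  FromURL  FinalURL  redirect_count
-- 1   A        D         3
-- 2   B        D         2
-- 3   C        D         1
-- 4   X        Y         1
```

Rows with redirect_count greater than 1 are the ones whose ToURL would need changing to collapse the chain into a single redirect.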


Source: https://stackoverflow.com/questions/21117854/find-the-start-and-end-of-a-redirect-chain