问题
Loooking for a way to retrieve community from a large dataset I came across an article about the algorithm which seems to be apropriate for large datasets. Anyway the data is stored two tables: users (nodes) and connections and I would like to retrieve the communities by pure sql queries without help of custom applications (I'm using SQL Server 2008).
The algorithm to retrieve the cliques is the following:
Read the graph G
Generate set neighbors(v) for every vertex of G
for each vertex v of G
call recursive_find_cliques(v, neighbors(v))
end for
Function recursive_find_cliques(x, n)
for each vertex t ∈ n by ascending order calculate set sigma
if sigma is not empty
extend x with t
call recursive_find_cliques(x, sigma)
end if
end for
where sigma is the set of vertices that could constitute triangles with v and its neighbors.
I already created a stored procedure which returns a table of neighbors of selected node but so far I haven't delat with sql functions and advanced queries so the question is the following:
Does anyone know how to rewrite the algorithm above in sql in order to get the set of cliques? As the question might be a little abstract, I may point out that the main problem is to create a recursive function (recursive_find_cliques(x, n)) which takes a table (n) as an argument).
Thank you!
EDIT:
Here is sthe stored procedure created so far:
CREATE PROCEDURE [dbo].[Peamc_Test]
AS
BEGIN
SET XACT_ABORT ON
BEGIN TRAN
SET NOCOUNT ON;
CREATE TABLE #Users
(
UserId int NOT NULL,
userLabel varchar(50) PRIMARY KEY NOT NULL,
Observed bit NOT NULL
)
CREATE TABLE #Neighbors
(
UserId int NOT NULL,
userLabel varchar(50) NOT NULL PRIMARY KEY,
Retrieved bit NOT NULL
)
CREATE TABLE #ConnectedVertices
(
UserId int NOT NULL,
userLabel varchar(50) NOT NULL PRIMARY KEY,
)
CREATE TABLE #Cliques
(
CliqueId int NOT NULL,
UserId varchar(50) NOT NULL,
)
DECLARE @UsersCount int
DECLARE @ii int
DECLARE @User varchar(50)
DECLARE @NeighborsCount int
INSERT INTO #Users(UserId, userLabel, Observed) SELECT user_id, userLabel, 0 FROM dbo.test_users WHERE user_id IS NOT NULL
SELECT @UsersCount = COUNT(*) FROM #Users
SELECT @ii = 1
WHILE @ii <= @UsersCount
BEGIN
--select user
SELECT TOP 1 @User = userLabel FROM #Users WHERE Observed = 0 ORDER BY UserId
UPDATE #Users SET Observed = 1 WHERE userLabel = @User
--Get user's neighbors
DELETE FROM #Neighbors
INSERT INTO #Neighbors(UserId, userLabel, Retrieved)
SELECT u.user_id, t2.neighbor, 0 FROM ( SELECT CALLING_NEIGHBORS.neighbor FROM ( SELECT mc.calling_party AS neighbor FROM monthly_connections_test mc WHERE mc.called_party = @User) AS CALLING_NEIGHBORS INNER JOIN (SELECT mc.called_party AS neighbor FROM monthly_connections_test mc WHERE mc.calling_party = @User) AS CALLED_NEIGHBORS ON CALLING_NEIGHBORS.neighbor = CALLED_NEIGHBORS.neighbor) AS t2 INNER JOIN test_users u ON t2.neighbor = u.userLabel
SELECT @NeighborsCount = COUNT(*) FROM #Neighbors
SELECT @ii = @ii + 1
--HERE the function recursive_find_cliques has to search for cliques and insert the found ones in #cliques
END
SELECT * FROM #Cliques
END
It does'not return anything yet as it is not finished. It though retrieves all neighbors for the currently selected nodes and the next step is to implement recursive_find_cliques function.
回答1:
I realised that my first answer only works when each clique has at least one user who is not referred to by any others in that clique. In other words, closed cliques like A-B, B-C, C-A will not be found.
Here is a solution which solves this. Again we have users with IDs, now 1..20. There are several cases of neighbouring relations that need to be handled:
Compared to the simple case, it is harder to find a unique starter for each clique. We achieve this with a little sleight of hand:
Reorder the neighbours so that for all references A-B, A is less than B, ignoring any A=B.
From these, remove any A-X references if there are any X-A, which could cause a loop. This will never remove references to A completely because X-A remains and A-X will be added in the recursion.
The resultant set are the 'starting' users and we use them to prime the CTE:
-- Get all pairs, where UserA < UserB, dropping any A=B or B=A
WITH LRNeighbours(A, B) AS (
SELECT
Neighbours.UserA, Neighbours.UserB
FROM
Neighbours
WHERE
Neighbours.UserA < Neighbours.UserB
UNION ALL
SELECT DISTINCT
Neighbours.UserB, Neighbours.UserA
FROM
Neighbours
WHERE
Neighbours.UserA > Neighbours.UserB
),
-- Isolate those that are not referred to by a higher numbered key
Starters(userid) AS (
SELECT DISTINCT
A
FROM
LRNeighbours
WHERE
A NOT IN (
SELECT
B
FROM
LRNeighbours
)
),
-- The recursive Common Table Expression
cliques(userid, clique) AS (
-- Number starters 1..N
SELECT
userid, ROW_NUMBER() OVER(ORDER BY userid) AS clique
FROM
Starters
UNION ALL
-- Recurse, adding users referred by siblings, avoiding starters themselves
SELECT
B, clique
FROM
LRNeighbours INNER JOIN
cliques ON
LRNeighbours.A = cliques.userid
AND B NOT IN (
SELECT
userid
FROM
starters
)
)
SELECT DISTINCT
clique, userid
FROM
cliques
ORDER BY
clique, userid
Results:
1 1
1 2
2 3
2 4
3 5
3 6
3 7
3 8
4 9
4 10
4 11
4 12
4 13
5 14
5 15
5 16
5 17
5 18
5 19
5 20
回答2:
CREATE TABLE [dbo].[Users](
[UserID] [int] IDENTITY(1,1) NOT NULL,
[UserName] [varchar](50) NOT NULL
) ON [PRIMARY]
CREATE TABLE [dbo].[Neighbours](
[UserA] [int] NOT NULL,
[UserB] [int] NOT NULL
) ON [PRIMARY]
Users populated with 1..8 and Neighbours
UserA UserB
1 2
2 3
4 5
4 6
5 7
7 8
Then:
WITH cliques(userid, clique) AS (
SELECT
userid, ROW_NUMBER() OVER(ORDER BY userid) AS clique
FROM
Users
WHERE
users.UserID NOT IN (
SELECT
UserB
FROM
Neighbours
)
UNION ALL
SELECT
Neighbours.UserB, clique
FROM
neighbours
INNER JOIN cliques
ON Neighbours.UserA = cliques.userid
)
SELECT
clique, cliques.userid
FROM
cliques
ORDER BY
clique, userid
Result:
clique userid
1 1
1 2
1 3
2 4
2 5
2 6
2 7
2 8
See : Recursive Queries Using Common Table Expressions
回答3:
I've added a two LABELS and two GOTO statements
CREATE PROCEDURE [dbo].[Peamc_Test]
AS
BEGIN
SET XACT_ABORT ON
BEGIN TRAN
SET NOCOUNT ON;
CREATE TABLE #Users
(
UserId int NOT NULL,
userLabel varchar(50) PRIMARY KEY NOT NULL,
Observed bit NOT NULL
)
CREATE TABLE #Neighbors
(
UserId int NOT NULL,
userLabel varchar(50) NOT NULL PRIMARY KEY,
Retrieved bit NOT NULL
)
CREATE TABLE #ConnectedVertices
(
UserId int NOT NULL,
userLabel varchar(50) NOT NULL PRIMARY KEY,
)
CREATE TABLE #Cliques
(
CliqueId int NOT NULL,
UserId varchar(50) NOT NULL,
)
DECLARE @UsersCount int
DECLARE @ii int
DECLARE @User varchar(50)
DECLARE @NeighborsCount int
INSERT INTO #Users(UserId, userLabel, Observed) SELECT user_id, userLabel, 0 FROM dbo.test_users WHERE user_id IS NOT NULL
SELECT @UsersCount = COUNT(*) FROM #Users
SELECT @ii = 1
WHILE @ii <= @UsersCount
BEGIN
--select user
SELECT TOP 1 @User = userLabel FROM #Users WHERE Observed = 0 ORDER BY UserId
UPDATE #Users SET Observed = 1 WHERE userLabel = @User
--Get user's neighbors
DELETE FROM #Neighbors
INSERT INTO #Neighbors(UserId, userLabel, Retrieved)
SELECT u.user_id, t2.neighbor, 0 FROM ( SELECT CALLING_NEIGHBORS.neighbor FROM ( SELECT mc.calling_party AS neighbor FROM monthly_connections_test mc WHERE mc.called_party = @User) AS CALLING_NEIGHBORS INNER JOIN (SELECT mc.called_party AS neighbor FROM monthly_connections_test mc WHERE mc.calling_party = @User) AS CALLED_NEIGHBORS ON CALLING_NEIGHBORS.neighbor = CALLED_NEIGHBORS.neighbor) AS t2 INNER JOIN test_users u ON t2.neighbor = u.userLabel
SELECT @NeighborsCount = COUNT(*) FROM #Neighbors
SELECT @ii = @ii + 1
GOTO Clique_Find
--HERE the function recursive_find_cliques has to search for cliques and insert the found ones in #cliques
--------------------
Clique_Return:
--------------------
END
SELECT * FROM #Cliques
END
--------------------
Clique_Find:
--------------------
-- Code goes here
-- Code goes here
-- Code goes here
-- Code goes here
-- Code goes here
-- Code goes here
GOTO Clique_Return
来源:https://stackoverflow.com/questions/4143683/how-to-create-a-stored-procedure-to-find-cliques-in-the-table-of-connections-bet