I have text in my database in Markdown format. I\'d like to extract links and count the number of matching links I have. I can get a listing of text blocks that contain link
Assuming the actual data is no more complex than the stated examples, this should work without resorting to RegEx:
DECLARE @posts TABLE
(
post_id INT NOT NULL IDENTITY(1, 1),
post_text NVARCHAR(4000) NOT NULL,
body NVARCHAR(2048) NULL
);
INSERT INTO @posts (post_text, body) VALUES (N'first',
N'Visit [Google](http://google.com)');
INSERT INTO @posts (post_text, body) VALUES (N'second',
N'Get an [iPhone](http://www.apple.com)');
INSERT INTO @posts (post_text, body) VALUES (N'third',
N'[Example](http://example.com)');
INSERT INTO @posts (post_text, body) VALUES (N'fourth',
N'This is a message');
INSERT INTO @posts (post_text, body) VALUES (N'fifth',
N'I like cookies (chocolate chip)');
INSERT INTO @posts (post_text, body) VALUES (N'sixth',
N'[Frankie] says ''Relax''');
INSERT INTO @posts (post_text, body) VALUES (N'seventh',
NULL);
SELECT p.post_text,
SUBSTRING(
p.body,
CHARINDEX(N'](', p.body) + 2,
CHARINDEX(N')', p.body) - (CHARINDEX(N'](', p.body) + 2)
) AS [URL]
FROM @posts p
WHERE p.body like '%\[%](http%)%' ESCAPE '\';
Output:
post_text URL
first http://google.com
second http://www.apple.com
third http://example.com
PS:
If you really want to use Regular Expressions, they can only be done via SQLCLR. You can write your own or download pre-done libraries. I wrote one such library, SQL#, that has a Free version that includes the RegEx functions. But those should only be used if a T-SQL solution cannot be found, which so far is not the case here.