What is regexp_replace equivalent in SQL Server

余生颓废 submitted on 2021-02-10 14:18:41

Question


I have this piece of code in Oracle which I need to convert into SQL Server to get the same behavior. I have used the REPLACE function. It seems to be working but I just wanted to make sure.

REGEXP_REPLACE(
                phonenumber, 
               '([[:digit:]]{3})([[:digit:]]{3})([[:digit:]]{4})', 
               '(\1)\2-\3'
               ) phonenumber

Answer 1:


SQL Server does not have native regex support. You would need to use CLR (or, as @Lukasz Szozda points out in the comments, one of the newer Language Extensions).

If I have understood the regex correctly, though, it matches strings of 10 digits, assigning the first 3 to group 1, the next 3 to group 2, and the last 4 to group 3, and then uses the back-references in the replacement expression (\1)\2-\3.

You can use built-in string functions to do this, as below:

SELECT CASE
         WHEN phonenumber LIKE REPLICATE('[0-9]', 10)
           THEN  FORMATMESSAGE('(%s)%s-%s', 
                      LEFT(phonenumber, 3),
                      SUBSTRING(phonenumber, 4, 3),
                      RIGHT(phonenumber, 4))
         ELSE phonenumber
       END
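
To check that the CASE expression behaves like the Oracle call, it can help to run it against a few sample values. The sketch below is only an illustration: the rows in the VALUES list and the column aliases are made up, and passing a literal format string to FORMATMESSAGE assumes SQL Server 2016 or later. One difference to be aware of: the Oracle pattern is not anchored, so REGEXP_REPLACE also reformats a run of 10 digits embedded in a longer string, whereas the LIKE test here only accepts values that are exactly 10 digits.

```sql
-- Sample rows are hypothetical; substitute your own table and column.
SELECT t.phonenumber AS original,
       CASE
         WHEN t.phonenumber LIKE REPLICATE('[0-9]', 10)
           THEN FORMATMESSAGE('(%s)%s-%s',
                      LEFT(t.phonenumber, 3),
                      SUBSTRING(t.phonenumber, 4, 3),
                      RIGHT(t.phonenumber, 4))
         ELSE t.phonenumber
       END AS formatted
FROM (VALUES ('8885551212'),    -- exactly 10 digits: becomes (888)555-1212
             ('888-555-1212'),  -- contains dashes: returned unchanged
             ('x8885551212')    -- 10 digits inside a longer string: returned unchanged here,
                                -- but Oracle's unanchored pattern would give x(888)555-1212
     ) AS t(phonenumber);
```

If the embedded case matters for your data, the pure T-SQL approach needs extra logic (or a real regex engine) to match Oracle's output.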



Answer 2:


You can write a SQL function using CLR that wraps the standard .NET regex. I have written one that you can use; calling it looks like this:

DECLARE @SourceText NVARCHAR(MAX) = N'My first line <br /> My second line';
DECLARE @RegexPattern NVARCHAR(MAX) = N'([<]br\s*/[>])';
DECLARE @Replacement NVARCHAR(MAX) = N'';
DECLARE @IsCaseSensitive BIT = 0;

SELECT regex.Replace(@SourceText, @RegexPattern, @Replacement, @IsCaseSensitive);



Answer 3:


As Martin said in his answer, SQL Server does not have built-in RegEx functionality (and while it has not been suggested here, just to be clear: no, the [...] wildcard of LIKE and PATINDEX is not RegEx). If your data has little to no variation then yes, you can use some combination of T-SQL functions: REPLACE, SUBSTRING, LEFT, RIGHT, CHARINDEX, PATINDEX, FORMATMESSAGE, CONCAT, and maybe one or two others.

However, if the data / input has even a moderate level of complexity, then the built-in T-SQL functions will at best be cumbersome, and at worst useless. In such cases it's possible to do actual RegEx via SQLCLR (as long as you aren't using Azure SQL Database Single DB or SQL Server 2017+ via AWS RDS), which is (restricted) .NET code running within SQL Server. You can either code your own / find examples here on S.O. or elsewhere, or try a pre-done library such as the one I created, SQL# (SQLsharp), the Free version of which contains several RegEx functions. Please note that SQLCLR, being .NET, does not use a POSIX-based RegEx engine, and hence does not support POSIX character classes (meaning: you will need to use \d for "digits" instead of [:digit:]).

The level of complexity needed in this particular situation is unclear as the example code in the question implies that the data is simple and uniform (i.e. 1112223333) but the example data shown in a comment on the question appears to indicate that there might be dashes and/or spaces in the data (i.e. xxx- xxx xxxx).

If the data truly is uniform, then stick with the pure T-SQL solution provided by @MartinSmith. But, if the data is of sufficient complexity, then please consider the RegEx example below, using a SQLCLR function found in the Free version of my SQL# library (as mentioned earlier), that easily handles the 3 variations of input data and more:

SELECT SQL#.RegEx_Replace4k(tmp.phone,
                            N'\(?(\d{3})\)?[ .-]*(\d{3})[ .-]*(\d{4})', N'($1)$2-$3',
                            -1,   -- count (-1 == unlimited)
                            1,    -- start at
                            N'')  -- RegEx options
FROM   (VALUES (N'8885551212'),
               (N'123- 456 7890'),
               (N'(777) 555- 4653')
       ) tmp([phone]);

returns:

(888)555-1212
(123)456-7890
(777)555-4653

The RegEx pattern allows for:

  • 0 or 1 (
  • 3 decimal digits
  • 0 or 1 )
  • 0 or more of space, ., or -
  • 3 decimal digits
  • 0 or more of space, ., or -
  • 4 decimal digits
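
If SQLCLR is not available, the same three sample inputs can also be handled in plain T-SQL by stripping the separators first and then applying the fixed-position formatting from the first answer. This is a rough sketch, not part of the original answers: it assumes SQL Server 2017 or later for TRANSLATE, the aliases d and digits are made up, and it is looser than the RegEx above because it removes parentheses, spaces, dots, and dashes wherever they occur rather than only in the expected positions.

```sql
SELECT tmp.phone,
       CASE
         WHEN d.digits LIKE REPLICATE('[0-9]', 10)
           THEN CONCAT('(', LEFT(d.digits, 3), ')',
                       SUBSTRING(d.digits, 4, 3), '-',
                       RIGHT(d.digits, 4))
         ELSE tmp.phone  -- leave anything that is not 10 digits untouched
       END AS formatted
FROM   (VALUES (N'8885551212'),
               (N'123- 456 7890'),
               (N'(777) 555- 4653')
       ) tmp([phone])
-- TRANSLATE maps each of ( ) . - to a space; REPLACE then removes all spaces.
CROSS APPLY (VALUES (REPLACE(TRANSLATE(tmp.phone, N'().-', N'    '), N' ', N''))
            ) d([digits]);
```

For these three rows the formatted column matches the RegEx output shown above. Inputs with other separators or with extra digits fall through to the ELSE branch unchanged.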

NOTE

It was mentioned that the newer Language Extensions might be a better choice than SQLCLR. Language Extensions allow calling R / Python / Java code, hosted outside of SQL Server, via the sp_execute_external_script stored procedure. As the "Tutorial: Search for a string using regular expressions (regex) in Java" page shows, external scripts are actually not a good choice for many / most uses of RegEx in SQL Server. The main problems are:

  1. Unlike with SQLCLR, the only interface for external scripts is a stored procedure. This means that you can't use any of that functionality inline in a query (SELECT, WHERE, etc).
  2. With external scripts, you pass in the query, work on the results in the external language, and pass back a static result set. This means that compiled code now has to be more specialized (i.e. tightly-coupled) to the particular usage. Changing how the query uses RegEx and/or what columns are returned now requires editing, compiling, testing, and deploying the R / Python / Java code in addition to (and coordinated with!) the T-SQL changes.

I'm sure external scripts are absolutely wonderful, and a better choice than SQLCLR, in certain scenarios. But they certainly do not lend themselves well to the highly varied, and often ad hoc, nature of how RegEx is used (like many / most other functions).
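
For comparison, here is roughly what the external-script route looks like for the phone number case. Treat it as a sketch under assumptions: it presumes Machine Learning Services with Python is installed and 'external scripts enabled' is turned on, the sample rows are made up, and NULL handling is left out. It also shows both problems described above: the call is a standalone EXEC rather than an expression inside a query, and the input query and result set shape are fixed in the call.

```sql
EXEC sp_execute_external_script
     @language = N'Python',
     @script = N'
import re
# Same pattern and back-references as the Oracle expression in the question.
pattern = re.compile(r"(\d{3})(\d{3})(\d{4})")
OutputDataSet = InputDataSet
OutputDataSet["phone"] = InputDataSet["phone"].apply(
    lambda s: pattern.sub(r"(\1)\2-\3", s))
',
     @input_data_1 = N'SELECT phone FROM (VALUES (N''8885551212''), (N''5551234567'')) AS t(phone)'
WITH RESULT SETS ((phone NVARCHAR(50)));
```

InputDataSet and OutputDataSet are the default names for the data frames passed into and out of the script. Using the result in another query would require something like INSERT ... EXEC into a temporary table.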



Source: https://stackoverflow.com/questions/61488458/what-is-regexp-replace-equivalent-in-sql-server
