SQL Server: how to find the substring just after the last occurence of another substring and before the next comma

问题

I have a table in a SQL Server database which has a column ColumnStrings with strings of this kind:

"AB=ikkw0116,AC=BE D Work stations,AC=BE D stations,AC=D Allocated,AD=pnser,AD=pnsas,AD=owned,AD=increased"
"AB=ikkWA001S1,AC=BE D HD,AC=D Allocated,AD=pnser,AD=pnsas,AD=owned,AD=increased"
"AB=iksw0084,AC=Domain View,AD=pnsas,AD=owned,AD=increased"
"AB=GHRS05900263,AC=Big stations,AC=GHR,AC=BE,AD=ger,AD=eu,AD=intra"

So we have a random number of AB=, AC= and AD=.

I would like to get the value (substring placed after the equal sign and before the next coma) just after the last occurrence of AC=.

In my example, the searched values would be for those 4 strings:

"D Allocated"
"D Allocated"
"Domain View"
"BE"

I could find the position of this last occurrence with

DATALENGTH(MyTable.[ColumnStrings])-CHARINDEX(REVERSE('=AC'),REVERSE(MyTable.[ColumnStrings]))-1

but how to get the substring just after this =AC and before the next comma (or the end of the string if we don't find any comma)

回答1:

In situation like yours, a JSON-based approach is a possible option. You need to transform appropriately the input strings into a valid JSON structure - a nested JSON arrays (AB=iksw0084,AC=Domain View,AD=pnsas,AD=owned,AD=increased into [["AB","iksw0084"],["AC","Domain View"],["AD","pnsas"],["AD","owned"],["AD","increased"]). Then you need to parse this JSON with OPENJSON() and default schema. The result is a table with columns key, value and type and in case of an array the key column holds the 0-based index of each item in the array. The idea is to use this index for the ORDER BY clause in the ROW_NUMBER() call.

Table:

SELECT ColumnStrings
INTO Data
FROM (VALUES 
   ('AB=ikkw0116,AC=BE D Work stations,AC=BE D stations,AC=D Allocated,AD=pnser,AD=pnsas,AD=owned,AD=increased'),
   ('AB=ikkWA001S1,AC=BE D HD,AC=D Allocated,AD=pnser,AD=pnsas,AD=owned,AD=increased'),
   ('AB=iksw0084,AC=Domain View,AD=pnsas,AD=owned,AD=increased'),
   ('AB=GHRS05900263,AC=Big stations,AC=GHR,AC=BE,AD=ger,AD=eu,AD=intra')
) v (ColumnStrings)

Statement:

SELECT j.StringValue
FROM Data d
OUTER APPLY (
   SELECT 
      j1.[value], 
      JSON_VALUE([value], '$[0]') AS StringKey, 
      JSON_VALUE([value], '$[1]') AS StringValue,
      ROW_NUMBER() OVER (
         PARTITION BY JSON_VALUE([value], '$[0]') 
         ORDER BY CONVERT(int, [key]) DESC
      ) AS RN
   FROM OPENJSON(CONCAT('[["', REPLACE(REPLACE(d.ColumnStrings, ',', '"],["'), '=', '","'), '"]]')) j1
) j
WHERE j.StringKey = 'AC' AND j.RN = 1

Result:

StringValue
-----------
D Allocated
D Allocated
Domain View
BE

回答2:

Please don't even think about trying to do this operation on your production database. Rather, as the comments above suggest, normalize your AD data before bringing it into SQL Server. In particular, SQL Server has poor/no regex support, which is what you really would need here. Towards that end, here is a regex pattern you may use to extract the final value for the key AC:

^.*\bAC=([^,]+)

Demo

You may apply this regex to your data, then maybe reimport.

来源：https://stackoverflow.com/questions/65734282/sql-server-how-to-find-the-substring-just-after-the-last-occurence-of-another-s

标签

sql-server

request

substring