Find and Remove Repeated Substrings

前端 未结 2 736
情歌与酒
情歌与酒 2020-12-21 02:15

I\'ve a column in a SQL Server 2008 table where part of the string was accidentally repeated.

Does anyone have a quick and easy way to remove the trailing duplicated

2条回答
  •  死守一世寂寞
    2020-12-21 02:46

    If you don't already have a numbers table:

    SET NOCOUNT ON;
    DECLARE @UpperLimit INT;
    SET @UpperLimit = 4000;
    
    WITH n(rn) AS
    (
        SELECT ROW_NUMBER() OVER (ORDER BY [object_id])
        FROM sys.all_columns
    )
    SELECT [Number] = rn - 1
    INTO dbo.Numbers FROM n
    WHERE rn <= @UpperLimit + 1;
    
    CREATE UNIQUE CLUSTERED INDEX n ON dbo.Numbers([Number]);
    

    Now a generic split function, that will turn your delimited string into a set:

    CREATE FUNCTION dbo.SplitString
    (
        @List NVARCHAR(MAX),
        @Delim CHAR(1)
    )
    RETURNS TABLE
    AS
        RETURN ( SELECT 
            rn, 
            vn = ROW_NUMBER() OVER (PARTITION BY [Value] ORDER BY rn), 
            [Value]
          FROM 
          ( 
            SELECT 
              rn = ROW_NUMBER() OVER (ORDER BY CHARINDEX(@Delim, @List + @Delim)),
              [Value] = LTRIM(RTRIM(SUBSTRING(@List, [Number],
              CHARINDEX(@Delim, @List + @Delim, [Number]) - [Number])))
            FROM dbo.Numbers
            WHERE Number <= LEN(@List)
            AND SUBSTRING(@Delim + @List, [Number], 1) = @Delim
          ) AS x
        );
    GO
    

    And then a function that puts them back together:

    CREATE FUNCTION dbo.DedupeString
    (
        @List NVARCHAR(MAX)
    )
    RETURNS NVARCHAR(MAX)
    AS
    BEGIN
        RETURN ( SELECT newval = STUFF((
         SELECT '\' + x.[Value] FROM dbo.SplitString(@List, '\') AS x
          WHERE (x.vn = 1)
          ORDER BY x.rn
          FOR XML PATH, TYPE).value('.', 'nvarchar(max)'), 1, 1, '')
        );
    END
    GO
    

    Sample usage:

    SELECT dbo.DedupeString('alpha\bravo\bravo\charlie\delta\bravo\charlie\delta');
    

    Results:

    alpha\bravo\charlie\delta
    

    You can also say something like:

    UPDATE dbo.MessedUpTable
      SET OopsColumn = dbo.DedupeString(OopsColumn);
    

    @MikaelEriksson will probably swoop in with a more efficient way to use XML to eliminate duplicates, but that is what I can offer until then. :-)

提交回复
热议问题