SQL - similar data in column

野的像风 2021-01-20 15:52

Is there any way to find similar results in a column? Example:

I want the query to return the table data without row 4 (green tree), because there is no similar data to g

3 Answers
  •  谎友^
    谎友^ (original poster)
    2021-01-20 16:06

    You could use SOUNDEX to do this.

    Sample data;

    CREATE TABLE #SampleData (Column1 int, Column2 varchar(10))
    INSERT INTO #SampleData (Column1, Column2)
    VALUES
    (1,'blue car')
    ,(2,'red doll')
    ,(3,'blue cars')
    ,(4,'green tree')
    ,(5,'red dolly')
    

    The following code uses SOUNDEX to group similar entries in Column2. A second sub query then counts how many rows fall into each SOUNDEX group:

    SELECT
    a.GroupingField
    ,a.Title
    ,b.SimilarFields
    FROM (
            SELECT
            SOUNDEX(Column2) GroupingField
            ,MAX(Column2) Title --Just return a unique title for this soundex group
            FROM #SampleData
            GROUP BY SOUNDEX(Column2)
          ) a
    LEFT JOIN   (
                    SELECT
                    SOUNDEX(Column2) GroupingField
                    ,COUNT(Column2) SimilarFields --How many fields are in the soundex group?
                    FROM #SampleData
                    GROUP BY SOUNDEX(Column2)
                ) b
    ON a.GroupingField = b.GroupingField
    WHERE b.SimilarFields > 1
    

    The results look like this (I've left the soundex field in to show you what it looks like);

    GroupingField   Title       SimilarFields
    B400            blue cars   2
    R300            red dolly   2
    

    Some further reading on soundex https://msdn.microsoft.com/en-gb/library/ms187384.aspx
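
    Before relying on the grouping, it may help to look at the code SOUNDEX assigns to every row. The query below is only a quick check against the same #SampleData table. Note that the codes in the results above (B400, R300) appear to reflect only the first word of each value, so entries that merely share a similar first word may end up in the same group.

    SELECT
    Column1
    ,Column2
    ,SOUNDEX(Column2) GroupingField --The code each row will be grouped under
    FROM #SampleData
    ORDER BY GroupingField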

    Edit: as per your request, to get the original data back you can push the results into a temp table. Change the query I've given you by adding an INTO clause before the FROM clause:

    SELECT
    a.GroupingField
    ,a.Title
    ,b.SimilarFields
    INTO #Duplicates
    FROM (
            SELECT
            SOUNDEX(Column2) GroupingField
            ,MAX(Column2) Title --Just return a unique title for this soundex group
            FROM #SampleData
            GROUP BY SOUNDEX(Column2)
          ) a
    LEFT JOIN   (
                    SELECT
                    SOUNDEX(Column2) GroupingField
                    ,COUNT(Column2) SimilarFields --How many fields are in the soundex group?
                    FROM #SampleData
                    GROUP BY SOUNDEX(Column2)
                ) b
    ON a.GroupingField = b.GroupingField
    WHERE b.SimilarFields > 1
    

    Then use the following query to link back to your original data;

    SELECT
    a.GroupingField
    ,a.Title
    ,a.SimilarFields
    ,b.Column1
    ,b.Column2
    FROM #Duplicates a
    JOIN #SampleData b
    ON a.GroupingField = SOUNDEX(b.Column2)
    ORDER BY a.GroupingField
    

    This would give the following result:

    GroupingField   Title       SimilarFields   Column1     Column2
    B400            blue cars   2               1           blue car
    B400            blue cars   2               3           blue cars
    R300            red dolly   2               5           red dolly
    R300            red dolly   2               2           red doll
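
    If the temp table is not needed for anything else, the original rows could also be fetched in one statement. This is a sketch that assumes a SQL Server version with window functions. It counts the members of each SOUNDEX group per row and keeps only rows whose group has more than one member:

    SELECT
    s.Column1
    ,s.Column2
    FROM (
            SELECT
            Column1
            ,Column2
            ,COUNT(*) OVER (PARTITION BY SOUNDEX(Column2)) SimilarFields --Rows sharing this soundex code
            FROM #SampleData
          ) s
    WHERE s.SimilarFields > 1
    ORDER BY s.Column1

    On the sample data, 'green tree' is the only value in its group, so it would be filtered out.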
    

    Remember to

    DROP TABLE #Duplicates
    
