Group by similar string

烂漫一生 submitted on 2019-12-11 01:58:19

Question


Suppose I have a table like this:

| id_grupo |    nombre       |
|:---------|----------------:|
| 1        | Emprendedores 1 |     
| 2        | Emprendedores 2 |    
| 3        | Emprendedoras 1 |      
| 4        | Emprendedoras 2 |         
| 5        | Los amigos 1    |       
| 6        | Los amigos 2    |
| 7        | Los amigos no 1 |  

I want to group names that are equal except for the number at the end. Note that some names consist of two or more words, and the only difference is the ending. There are also names that look similar but are not the same, such as "Los amigos" and "Los amigos no"; these belong to different groups. Likewise, "Emprendedores" and "Emprendedoras" are different.

This is the query I have:

SELECT *, GROUP_CONCAT(id_grupo) 
FROM creabien_sacredi_dev.grupos
GROUP BY SUBSTRING(nombre,1,5)

It works fine for most of the records, but the problem comes with very similar strings like those in the example. I chose a substring of 5 characters, but the names don't all have the same length, so some strings are not grouped as expected.

How can I group these strings in the following form?

    | id_grupo |    nombre       | GROUP_CONCAT(id_grupo) |
    |:---------|----------------:|-----------------------:|
    | 1        | Emprendedores 1 |  1,2                   |    
    | 3        | Emprendedoras 1 |  3,4                   |   
    | 5        | Los amigos 1    |  5,6                   |
    | 7        | Los amigos no 1 |  7                     |

I think the key is the trailing number: the string preceding it must be exactly the same, but I don't know how to do that. Could you help me, please?

Edit:

There are also records like 'Emprendedores' without any number at the end, and these should also be grouped with 'Emprendedores 1' and 'Emprendedores 2'. So I think the number is no longer the key; in fact, I doubt whether there is a way to group these records.


Answer 1:


How about the following:

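SELECT CASE 
         -- name ends in a digit: drop the last two characters
         -- (assumes a single-digit suffix preceded by one space)
         WHEN RIGHT(nombre, 1) BETWEEN '0' AND '9' THEN 
         -- CHAR_LENGTH counts characters; LENGTH counts bytes
         LEFT(nombre, CHAR_LENGTH(nombre) - 2) 
         ELSE nombre 
       END AS nombrechecked, 
       GROUP_CONCAT(id_grupo) 
FROM   grupos 
GROUP  BY 1 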

Here is the SQL Fiddle that shows it works.




Answer 2:


If items to cut are numbers only and they are always separated by a space:

 SELECT CASE nombre REGEXP '[0-9]$'
          -- name ends in a digit: keep everything up to and including the last space
          WHEN 1 THEN REVERSE(SUBSTR(REVERSE(nombre),
                                     INSTR(REVERSE(nombre), ' ')))
          ELSE nombre
        END AS grupo,
        GROUP_CONCAT(id_grupo)
   FROM grupos
  GROUP BY grupo;

Just a proposal ... :) It's probably not the most performant. The advantage is that it also works with larger numbers at the end.

Check out this Fiddle.
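
To check which groups you should expect before running any of the queries, the rule "drop a trailing number and the space before it" can be sketched outside the database. The following is only a Python illustration, not part of any answer here: the helper names `group_key` and `group_ids` are made up, and row 8 ('Emprendedores' with no number) is a hypothetical record added to cover the case from the question's edit.

```python
import re
from collections import defaultdict

# Sample rows copied from the question, plus one hypothetical row (id 8)
# without a trailing number, as described in the question's edit.
rows = [
    (1, "Emprendedores 1"),
    (2, "Emprendedores 2"),
    (3, "Emprendedoras 1"),
    (4, "Emprendedoras 2"),
    (5, "Los amigos 1"),
    (6, "Los amigos 2"),
    (7, "Los amigos no 1"),
    (8, "Emprendedores"),
]


def group_key(nombre):
    """Drop a trailing run of digits together with the whitespace before it."""
    return re.sub(r"\s+\d+$", "", nombre)


def group_ids(rows):
    """Collect the ids of all rows that share the same group key."""
    groups = defaultdict(list)
    for id_grupo, nombre in rows:
        groups[group_key(nombre)].append(id_grupo)
    return dict(groups)


for key, ids in group_ids(rows).items():
    print(key, ids)
```

Names without a trailing number are left unchanged, so they fall into the same group as their numbered variants, and digits in the middle of a name are not touched.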




Answer 3:


If you don't have any entries like 'Los 2 amigos 1', then

SELECT *, GROUP_CONCAT(id_grupo) 
FROM creabien_sacredi_dev.grupos
GROUP BY -- replace digits with spaces (MySQL needs a space after the two dashes)
REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE(
nombre,
'0',' '),'1',' '),'2',' '),'3',' '),'4',' '),'5',' '),'6',' '),'7',' '),'8',' '),'9',' ')


Source: https://stackoverflow.com/questions/20222990/group-by-similar-string
