Raw SQL statement to group by a column with different strings for the same name

Submitted by 柔情痞子 on 2020-01-25 00:19:05

Question


I'm fairly new to writing more complex SQL statements. I'm trying to group by a name where the same name can come in different forms. For example, a name can be "Kane, Patrick", "P.Kane, Patrick", or "Kane, Patrick*".

What I have so far is below; it returns around 7000 results:

SELECT 
SUM(games_played) as games_played,
SUM(goals) as goals,
SUM(points) as points,
player_name
FROM player_stats
GROUP BY player_name;

Example resulting JSON:

[
{games_played: 123, goals: 12, points: 40, player_name: "Kane, Patrick"},
{games_played: 123, goals: 12, points: 40, player_name: "P. Kane, Patrick"},
{games_played: 123, goals: 12, points: 40, player_name: "Kane, Patrick*"},
{games_played: 123, goals: 12, points: 40, player_name: "Nylander, Alex"},
{games_played: 123, goals: 12, points: 40, player_name: "A. Nylander, Alex"},
{games_played: 123, goals: 12, points: 40, player_name: "Nylander, Alex*"},
{games_played: 123, goals: 12, points: 40, player_name: "Lemieux, Mario"},
{games_played: 123, goals: 12, points: 40, player_name: "Gretzky, Wayne"},
]

The question is how to get the sums of each column grouped by like players, so the result would look more like this:

[
{games_played: 369, goals: 36, points: 120, player_name: "Kane, Patrick"},
{games_played: 369, goals: 36, points: 120, player_name: "Nylander, Alex"},
{games_played: 123, goals: 12, points: 40, player_name: "Lemieux, Mario"},
{games_played: 123, goals: 12, points: 40, player_name: "Gretzky, Wayne"},
]

Even better if I can get a knex.js query, but I have no problem using a raw query here. The DB is PostgreSQL.

Thanks in advance.

Answer 1:


If you have to do it, you can try this:

SELECT
SUM(games_played) AS games_played,
SUM(goals) AS goals,
SUM(points) AS points,
CASE
      WHEN player_name LIKE '%Patr%' THEN 'Kane, Patrick'
      WHEN player_name LIKE '%Alex%' THEN 'Nylander, Alex'
      WHEN player_name LIKE '%Mar%' THEN 'Lemieux, Mario'
      WHEN player_name LIKE '%Wayn%' THEN 'Gretzky, Wayne'
      ELSE player_name  -- keep unmatched names separate instead of merging them under NULL
END AS player_name
FROM player_stats
GROUP BY 5;  -- 5 = the CASE expression; PostgreSQL rejects selecting the raw player_name here

But you should take Caius Jard's advice...




Answer 2:


You'll need to do something to transform the names to a consistent form, be it string replacement, splitting on the period and taking only the second value, removing special characters, etc. There isn't anything artificially intelligent that can go "oh, P. Kane, Patrick is clearly the same as Kane, Patrick*"; you'll have to do the manipulations yourself. You could even have a table with two columns holding all the variations of each name mapped to a consistent name, then join on the varied name and group on the consistent one.
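As a rough sketch of both ideas, the cleanup can happen inside the query itself, without touching the stored data. This assumes the only variations are the ones shown in the question (a `*` marker and a leading initial such as `P.`). The alias `clean_name`, the regular expression, and the table `player_name_aliases` are illustrative names, not something from the original post.

```sql
-- Option A: normalize on the fly.
-- Assumes variations are limited to '*' markers and a leading "X." initial.
SELECT
    clean_name        AS player_name,
    SUM(games_played) AS games_played,
    SUM(goals)        AS goals,
    SUM(points)       AS points
FROM (
    SELECT
        -- 1) drop every '*', 2) drop a leading initial plus any spaces after it
        trim(regexp_replace(replace(player_name, '*', ''), '^[A-Z]\.\s*', '')) AS clean_name,
        games_played,
        goals,
        points
    FROM player_stats
) AS normalized
GROUP BY clean_name
ORDER BY clean_name;

-- Option B: a two-column mapping table (hypothetical name and sample rows).
CREATE TABLE player_name_aliases (
    alias          text PRIMARY KEY,  -- the spelling as it appears in player_stats
    canonical_name text NOT NULL      -- the spelling to report
);

INSERT INTO player_name_aliases (alias, canonical_name) VALUES
    ('P. Kane, Patrick', 'Kane, Patrick'),
    ('Kane, Patrick*',   'Kane, Patrick');

SELECT
    -- names without a mapping row fall back to their stored spelling
    COALESCE(a.canonical_name, s.player_name) AS player_name,
    SUM(s.games_played) AS games_played,
    SUM(s.goals)        AS goals,
    SUM(s.points)       AS points
FROM player_stats AS s
LEFT JOIN player_name_aliases AS a ON a.alias = s.player_name
GROUP BY 1
ORDER BY 1;
```

With knex.js, either SELECT could be passed as a string to `knex.raw(...)`; with the PostgreSQL driver the rows should then be under the `rows` property of the result. Option A needs a new rule for every new variation, while Option B needs a new row, which is usually easier to maintain.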

I think my first step would be to sort out the data:

UPDATE player_stats
SET player_name = REPLACE(player_name, '*', '');

UPDATE player_stats
SET player_name = LTRIM(SUBSTRING(player_name FROM 3))  -- LTRIM drops the space left by "P. Kane"
WHERE player_name LIKE '_.%';

You could stop here and just keep re-running this forevermore to remove the garbage that arrives in the table, adding more rules as more variations arrive.

But you should then make a new table for the players:

CREATE EXTENSION IF NOT EXISTS "uuid-ossp";  -- provides uuid_generate_v4()

SELECT uuid_generate_v4() AS player_id, player_name
INTO players
FROM (SELECT DISTINCT player_name FROM player_stats) x;

ALTER TABLE players ADD PRIMARY KEY (player_id);

Then add a column to stats to take the id:

ALTER TABLE player_stats ADD player_id UUID;

Copy the data in:

UPDATE player_stats d
SET player_id = s.player_id  -- PostgreSQL does not allow the alias on the SET target
FROM players s
WHERE s.player_name = d.player_name;

Set the foreign key up:

ALTER TABLE player_stats
ADD CONSTRAINT fk_playersstats_playerid__players_playerid FOREIGN KEY (player_id) REFERENCES players(player_id);

Finally dump the name column:

ALTER TABLE player_stats DROP COLUMN player_name;

And then go fix the program that filled the table with varying garbage in the first place :)
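Once the tables are split like this, the aggregate from the question no longer depends on how a name was spelled. A minimal sketch, assuming `players` and `player_stats` end up exactly as in the steps above:

```sql
-- Totals per player after the migration: group on the id, report the name.
SELECT
    p.player_name,
    SUM(s.games_played) AS games_played,
    SUM(s.goals)        AS goals,
    SUM(s.points)       AS points
FROM player_stats AS s
JOIN players AS p ON p.player_id = s.player_id
GROUP BY p.player_id, p.player_name
ORDER BY p.player_name;
```

Grouping on `player_id` also keeps two different players who happen to share a name apart, which grouping on the name string alone would not.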



Source: https://stackoverflow.com/questions/58774641/raw-sql-statement-to-group-by-column-with-different-strings-for-the-same-name
