Eliminating duplicate values based on only one column of the table

野的像风 2020-11-28 07:06

My query:

SELECT sites.siteName, sites.siteIP, history.date
FROM sites INNER JOIN
     history ON sites.siteName = history.siteName
ORDER BY siteName,date


        
3 Answers
  • 2020-11-28 07:28

    I solve such queries using this pattern:

    SELECT *
    FROM t
    WHERE t.field=(
      SELECT MAX(t0.field) 
      FROM t AS t0 
      WHERE t0.group_column1=t.group_column1
        AND t0.group_column2=t.group_column2 ...)
    

    That is, it selects the records whose field equals the maximum value within their group. To apply it to your query, I used a common table expression so that I don't have to write the JOIN twice:

    WITH site_history AS (
      SELECT sites.siteName, sites.siteIP, history.date
      FROM sites
      JOIN history USING (siteName)
    )
    SELECT *
    FROM site_history h
    WHERE h.date=(
      SELECT MAX(h0.date) 
      FROM site_history h0 
      WHERE h0.siteName=h.siteName)
    ORDER BY siteName
    

    Note that this returns exactly one row per group only if the field we take the maximum of is unique within that group; with ties, every tied row is returned. In your example the date should be unique for each siteName, that is, unless the IP can change multiple times per millisecond. In my experience this is commonly the case; otherwise you don't know which record is the newest anyway. If the history table has a unique index on (siteName, date), this query is also very fast: the database can use an index range scan on the history table and read just the first entry.
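
The uniqueness caveat can be checked on a small data set. What follows is only a sketch: it assumes Python's built-in sqlite3 module as a convenient host, and the sample rows (site names, IPs, dates) are invented for illustration. Other databases may differ in details. Site `b.example` has two history rows with the same date, so the MAX pattern returns it twice.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sites (siteName TEXT, siteIP TEXT);
    CREATE TABLE history (siteName TEXT, date TEXT);
    INSERT INTO sites VALUES
        ('a.example', '10.0.0.1'),
        ('b.example', '10.0.0.2');
    INSERT INTO history VALUES
        ('a.example', '2020-01-01'),
        ('a.example', '2020-02-01'),
        ('b.example', '2020-03-01'),
        ('b.example', '2020-03-01');  -- deliberate tie on the newest date
""")

# Same shape as the answer's query: keep rows whose date is the group maximum.
LATEST_BY_MAX = """
    WITH site_history AS (
        SELECT sites.siteName, sites.siteIP, history.date
        FROM sites
        JOIN history USING (siteName)
    )
    SELECT h.siteName, h.siteIP, h.date
    FROM site_history h
    WHERE h.date = (
        SELECT MAX(h0.date)
        FROM site_history h0
        WHERE h0.siteName = h.siteName)
    ORDER BY h.siteName
"""

rows = conn.execute(LATEST_BY_MAX).fetchall()
for row in rows:
    print(row)
```

With unique dates per site (as for `a.example`) the result has one row per site; the duplicated row for `b.example` is the effect of the tie.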

  • 2020-11-28 07:36

    This is where the window function row_number() comes in handy:

    SELECT s.siteName, s.siteIP, h.date
    FROM sites s INNER JOIN
         (select h.*, row_number() over (partition by siteName order by date desc) as seqnum
          from history h
         ) h
        ON s.siteName = h.siteName and seqnum = 1
    ORDER BY s.siteName, h.date
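
To see how row_number() behaves when two history rows share the newest date, here is a minimal sketch. It assumes Python's built-in sqlite3 module backed by SQLite 3.25 or later (the first version with window functions), and the sample rows are invented. Because exactly one row per partition gets seqnum = 1, each site appears once even with a tie; which of the tied rows wins is not defined unless the ORDER BY is extended with a tie-breaker.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sites (siteName TEXT, siteIP TEXT);
    CREATE TABLE history (siteName TEXT, date TEXT);
    INSERT INTO sites VALUES
        ('a.example', '10.0.0.1'),
        ('b.example', '10.0.0.2');
    INSERT INTO history VALUES
        ('a.example', '2020-01-01'),
        ('a.example', '2020-02-01'),
        ('b.example', '2020-03-01'),
        ('b.example', '2020-03-01');  -- deliberate tie on the newest date
""")

# Number each site's history rows from newest to oldest, keep only number 1.
LATEST_BY_ROW_NUMBER = """
    SELECT s.siteName, s.siteIP, h.date
    FROM sites s
    JOIN (
        SELECT siteName, date,
               ROW_NUMBER() OVER (PARTITION BY siteName
                                  ORDER BY date DESC) AS seqnum
        FROM history
    ) h
      ON s.siteName = h.siteName AND h.seqnum = 1
    ORDER BY s.siteName
"""

rows = conn.execute(LATEST_BY_ROW_NUMBER).fetchall()
for row in rows:
    print(row)
```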
    
  • 2020-11-28 07:43

    From your example it seems reasonable to assume that the siteIP column is determined by the siteName column (that is, each site has only one siteIP). If this is indeed the case, then there is a simple solution using group by:

    select
      sites.siteName,
      sites.siteIP,
      max(history.date)
    from sites
    inner join history on
      sites.siteName=history.siteName
    group by
      sites.siteName,
      sites.siteIP
    order by
      sites.siteName;
    

    However, if my assumption is not correct (that is, a site can have multiple siteIP values), then it is not clear from your question which siteIP you want the query to return in the second column. If any siteIP will do, then the following query works:

    select
      sites.siteName,
      min(sites.siteIP),
      max(history.date)
    from sites
    inner join history on
      sites.siteName=history.siteName
    group by
      sites.siteName
    order by
      sites.siteName;
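
As a rough check of the second query, the sketch below runs it through Python's built-in sqlite3 module (an assumption made only to have something runnable) against invented sample rows in which `b.example` has two IPs. Grouping by siteName alone collapses each site to one row. Note that min() compares siteIP as text here, so "smallest" means first in string order, not the numerically lowest address.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sites (siteName TEXT, siteIP TEXT);
    CREATE TABLE history (siteName TEXT, date TEXT);
    INSERT INTO sites VALUES
        ('a.example', '10.0.0.1'),
        ('b.example', '10.0.0.3'),
        ('b.example', '10.0.0.2');  -- same site, second IP
    INSERT INTO history VALUES
        ('a.example', '2020-01-01'),
        ('a.example', '2020-02-01'),
        ('b.example', '2020-03-01');
""")

# One row per site: an arbitrary (here: smallest) IP and the newest date.
LATEST_BY_GROUP = """
    SELECT sites.siteName,
           MIN(sites.siteIP),
           MAX(history.date)
    FROM sites
    INNER JOIN history ON sites.siteName = history.siteName
    GROUP BY sites.siteName
    ORDER BY sites.siteName
"""

rows = conn.execute(LATEST_BY_GROUP).fetchall()
for row in rows:
    print(row)
```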
    