SQL query to extract new records

Question

I have the following tables:

CREATE TABLE Company (
    CompanyUniqueID BIGSERIAL PRIMARY KEY NOT NULL,
    Name VARCHAR (150) NOT NULL
 );

CREATE TABLE Item ( 
  ItemUniqueID BIGSERIAL PRIMARY KEY NOT NULL,
  CompanyUniqueID BIGINT NULL REFERENCES company DEFERRABLE INITIALLY DEFERRED,
  Name VARCHAR (150) NOT NULL,
  AddedDate TIMESTAMP without time zone DEFAULT now()
);

Over the lifetime of the application, new companies and items are added to the tables. I want to write an SQL query that selects the "newly added companies" since a given date. I started with this query:

(Select * from company
 where companyuniqueid in (
   select distinct companyuniqueid from Item where AddedDate > '2014-10-25'))

This is not good enough: it also selects companies that already existed if any of their items were added after 2014-10-25.

For example, a snapshot of the Company table on 2014-10-20 might look like this:

CompanyUniqueID  Name
1                AAA
2                BBB
3                CCC

and the Item table like this:

ItemUniqueID  CompanyUniqueID  Name  AddedDate
1             1                111   2014-10-01
2             2                222   2014-10-10
3             2                333   2014-10-10
4             3                444   2014-10-15

From 2014-10-26 on, the following records were added:

Company table:

CompanyUniqueID  Name
4                DDD

Item table:

ItemUniqueID  CompanyUniqueID  Name  AddedDate
5             1                555   2014-10-26
6             3                663   2014-10-26
7             4                777   2014-10-27
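To see the problem concretely, the example data can be loaded into an in-memory database with Python's standard sqlite3 module. This is a sketch, assuming SQLite as a stand-in for PostgreSQL (BIGSERIAL and TIMESTAMP are replaced by INTEGER and ISO-8601 text, which compares correctly as strings). It shows that the first query also returns the pre-existing companies AAA and CCC:

```python
import sqlite3

# In-memory SQLite stand-in for the PostgreSQL tables in the question
# (BIGSERIAL/TIMESTAMP replaced by INTEGER/TEXT; dates stored as ISO text).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE company (companyuniqueid INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE item (
        itemuniqueid    INTEGER PRIMARY KEY,
        companyuniqueid INTEGER REFERENCES company,
        name            TEXT NOT NULL,
        addeddate       TEXT
    );
    INSERT INTO company VALUES (1, 'AAA'), (2, 'BBB'), (3, 'CCC'), (4, 'DDD');
    INSERT INTO item VALUES
        (1, 1, '111', '2014-10-01'), (2, 2, '222', '2014-10-10'),
        (3, 2, '333', '2014-10-10'), (4, 3, '444', '2014-10-15'),
        (5, 1, '555', '2014-10-26'), (6, 3, '663', '2014-10-26'),
        (7, 4, '777', '2014-10-27');
""")

# The first query from the question: any company with an item after the date.
rows = conn.execute("""
    SELECT * FROM company
    WHERE companyuniqueid IN (
        SELECT DISTINCT companyuniqueid FROM item WHERE addeddate > '2014-10-25')
    ORDER BY companyuniqueid
""").fetchall()
print(rows)  # [(1, 'AAA'), (3, 'CCC'), (4, 'DDD')]
```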

I've tried adding this to the query:

(Select * from company
 where companyuniqueid in (
    select distinct companyuniqueid from Item
    where AddedDate > '2014-10-25')
 and companyuniqueid not in (
    select distinct companyuniqueid from Item
    where AddedDate <= '2014-10-25'))

but I get an empty result. What query would return only 4 DDD?

Answer 1:

Use a NOT EXISTS anti-semi-join. It is typically faster and cleaner than NOT IN:

SELECT *
FROM   company c
WHERE  NOT EXISTS (
   SELECT 1
   FROM   item
   WHERE  addeddate < '2014-10-25'
   AND    companyuniqueid = c.companyuniqueid);

This returns companies that have no items pre-dating the given date. That includes companies without any items, and possibly companies with items whose addeddate IS NULL (a comparison with NULL is never TRUE, so such items do not count as old).
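A minimal sketch of the NOT EXISTS query on the example data, again using SQLite as a stand-in for PostgreSQL. Companies 5 'EEE' (no items) and 6 'FFF' (one item with a NULL addeddate) are hypothetical additions, included only to show the two edge cases just described:

```python
import sqlite3

# SQLite stand-in for the question's schema; dates are ISO-8601 text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE company (companyuniqueid INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE item (itemuniqueid INTEGER PRIMARY KEY,
                       companyuniqueid INTEGER, name TEXT NOT NULL, addeddate TEXT);
    -- Companies 1-4 come from the question; 5 and 6 are hypothetical extras.
    INSERT INTO company VALUES (1, 'AAA'), (2, 'BBB'), (3, 'CCC'), (4, 'DDD'),
                               (5, 'EEE'), (6, 'FFF');
    INSERT INTO item VALUES
        (1, 1, '111', '2014-10-01'), (2, 2, '222', '2014-10-10'),
        (3, 2, '333', '2014-10-10'), (4, 3, '444', '2014-10-15'),
        (5, 1, '555', '2014-10-26'), (6, 3, '663', '2014-10-26'),
        (7, 4, '777', '2014-10-27'),
        (8, 6, '888', NULL);  -- hypothetical item with an unknown date
""")

# Anti-semi-join: keep companies for which no pre-dating item exists.
rows = conn.execute("""
    SELECT c.*
    FROM   company c
    WHERE  NOT EXISTS (
       SELECT 1
       FROM   item
       WHERE  addeddate < '2014-10-25'
       AND    companyuniqueid = c.companyuniqueid)
    ORDER  BY c.companyuniqueid
""").fetchall()
print(rows)  # [(4, 'DDD'), (5, 'EEE'), (6, 'FFF')]
```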
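To limit the result to companies with new items, add this condition to the query above (before the semicolon; the query already has a WHERE clause, so it is attached with AND):

AND    EXISTS (
   SELECT 1
   FROM   item
   WHERE  addeddate >= '2014-10-25'
   AND    companyuniqueid = c.companyuniqueid)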
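Put together, the two conditions can be checked on the same data. This sketch assumes SQLite in place of PostgreSQL and reuses the hypothetical companies 5 'EEE' (no items) and 6 'FFF' (only a NULL-dated item); because the outer query already has a WHERE clause, the EXISTS test is joined with AND, and both hypothetical companies drop out:

```python
import sqlite3

# SQLite stand-in for the question's schema; dates are ISO-8601 text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE company (companyuniqueid INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE item (itemuniqueid INTEGER PRIMARY KEY,
                       companyuniqueid INTEGER, name TEXT NOT NULL, addeddate TEXT);
    -- Companies 1-4 come from the question; 5 and 6 are hypothetical extras.
    INSERT INTO company VALUES (1, 'AAA'), (2, 'BBB'), (3, 'CCC'), (4, 'DDD'),
                               (5, 'EEE'), (6, 'FFF');
    INSERT INTO item VALUES
        (1, 1, '111', '2014-10-01'), (2, 2, '222', '2014-10-10'),
        (3, 2, '333', '2014-10-10'), (4, 3, '444', '2014-10-15'),
        (5, 1, '555', '2014-10-26'), (6, 3, '663', '2014-10-26'),
        (7, 4, '777', '2014-10-27'),
        (8, 6, '888', NULL);  -- hypothetical item with an unknown date
""")

# No old items (anti-semi-join) AND at least one new item (semi-join).
rows = conn.execute("""
    SELECT c.*
    FROM   company c
    WHERE  NOT EXISTS (
       SELECT 1 FROM item
       WHERE  addeddate < '2014-10-25'
       AND    companyuniqueid = c.companyuniqueid)
    AND    EXISTS (
       SELECT 1 FROM item
       WHERE  addeddate >= '2014-10-25'
       AND    companyuniqueid = c.companyuniqueid)
    ORDER  BY c.companyuniqueid
""").fetchall()
print(rows)  # [(4, 'DDD')]
```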

But consider adding an added_date column to the company table instead, to avoid ambiguities and simplify things.
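A sketch of what that could look like, assuming SQLite in place of PostgreSQL. The added_date column and the creation dates of companies 1 to 3 are hypothetical (chosen only to be consistent with the 2014-10-20 snapshot); with the date stored on the company itself, "new companies" becomes a single range condition with no reference to item at all:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- added_date lives on company itself (hypothetical column per the suggestion);
    -- DEFAULT CURRENT_TIMESTAMP stamps rows that do not supply a value.
    CREATE TABLE company (
        companyuniqueid INTEGER PRIMARY KEY,
        name            TEXT NOT NULL,
        added_date      TEXT DEFAULT CURRENT_TIMESTAMP
    );
    -- Hypothetical creation dates, consistent with the 2014-10-20 snapshot.
    INSERT INTO company VALUES
        (1, 'AAA', '2014-10-01'), (2, 'BBB', '2014-10-05'),
        (3, 'CCC', '2014-10-12'), (4, 'DDD', '2014-10-26');
""")

# No join with item and no anti-join needed: one plain range condition.
rows = conn.execute(
    "SELECT companyuniqueid, name FROM company WHERE added_date > ? ORDER BY 1",
    ("2014-10-25",),
).fetchall()
print(rows)  # [(4, 'DDD')]
```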

Why did the original query not work?

Probably another case of NOT IN with a set containing a NULL value.
Your column item.companyuniqueid allows NULL values. Your subquery:

select distinct companyuniqueid from Item where AddedDate <= '2014-10-25'

... probably includes a NULL value. In that case, this expression is never TRUE:

companyuniqueid not in (<above subquery>)

The expression returns FALSE or NULL ("unknown") if the set includes a NULL value, but only TRUE qualifies in a WHERE condition. So no row is returned.
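The effect is easy to reproduce. In this sketch (SQLite standing in for PostgreSQL; item 8 with companyuniqueid NULL is a hypothetical row), the question's second query returns 4 DDD on the example data, then returns nothing once a single old item without a company exists:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE company (companyuniqueid INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE item (itemuniqueid INTEGER PRIMARY KEY,
                       companyuniqueid INTEGER, name TEXT NOT NULL, addeddate TEXT);
    INSERT INTO company VALUES (1, 'AAA'), (2, 'BBB'), (3, 'CCC'), (4, 'DDD');
    INSERT INTO item VALUES
        (1, 1, '111', '2014-10-01'), (2, 2, '222', '2014-10-10'),
        (3, 2, '333', '2014-10-10'), (4, 3, '444', '2014-10-15'),
        (5, 1, '555', '2014-10-26'), (6, 3, '663', '2014-10-26'),
        (7, 4, '777', '2014-10-27');
""")

# The question's second query (outer parentheses dropped for SQLite).
QUERY = """
    SELECT * FROM company
    WHERE companyuniqueid IN (
        SELECT DISTINCT companyuniqueid FROM item WHERE addeddate > '2014-10-25')
    AND companyuniqueid NOT IN (
        SELECT DISTINCT companyuniqueid FROM item WHERE addeddate <= '2014-10-25')
"""

before = conn.execute(QUERY).fetchall()   # no NULLs yet: [(4, 'DDD')]

# One old item without a company makes the NOT IN set contain NULL.
conn.execute("INSERT INTO item VALUES (8, NULL, '888', '2014-10-01')")
after = conn.execute(QUERY).fetchall()    # []

# The same three-valued logic on literals: NULL, not TRUE, for a non-member.
unknown = conn.execute("SELECT 4 NOT IN (1, 2, NULL)").fetchone()[0]  # None
print(before, after, unknown)
```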

Note that the same is not true for an empty set. If the above subquery returned no rows, the NOT IN expression would evaluate to TRUE, as long as the left side is NOT NULL.
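The contrast can be checked on literal expressions. A sketch using SQLite as a stand-in; ask() is just an illustrative helper name, and SQLite reports TRUE as 1 and NULL as None:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def ask(sql):
    """Evaluate one scalar SQL expression; SQL NULL comes back as None."""
    return conn.execute(sql).fetchone()[0]

# Empty subquery: nothing to compare against, so NOT IN is TRUE (1).
empty_set = ask("SELECT 4 NOT IN (SELECT 1 WHERE 0)")
# A NULL in the set, by contrast, makes the result unknown (None).
null_in_set = ask("SELECT 4 NOT IN (SELECT NULL)")
# A NULL on the left side with a non-empty set is unknown as well.
null_left = ask("SELECT NULL NOT IN (1, 2)")
print(empty_set, null_in_set, null_left)
```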

Basically, avoid NOT IN (<subquery>) where you can. NOT EXISTS is almost always superior.
If you use it, know your way around NULL values. More details:

Find records where join doesn't exist
Select rows which are not present in other table
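If NOT IN has to stay, one common way to avoid the trap is to exclude NULLs in the subquery itself. A sketch on the example data plus the hypothetical item 8 with companyuniqueid NULL, with SQLite again standing in for PostgreSQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE company (companyuniqueid INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE item (itemuniqueid INTEGER PRIMARY KEY,
                       companyuniqueid INTEGER, name TEXT NOT NULL, addeddate TEXT);
    INSERT INTO company VALUES (1, 'AAA'), (2, 'BBB'), (3, 'CCC'), (4, 'DDD');
    INSERT INTO item VALUES
        (1, 1, '111', '2014-10-01'), (2, 2, '222', '2014-10-10'),
        (3, 2, '333', '2014-10-10'), (4, 3, '444', '2014-10-15'),
        (5, 1, '555', '2014-10-26'), (6, 3, '663', '2014-10-26'),
        (7, 4, '777', '2014-10-27'),
        (8, NULL, '888', '2014-10-01');  -- hypothetical item without a company
""")

# Same shape as the question's query, but NULLs are filtered out of the
# NOT IN subquery, so the comparison can evaluate to TRUE again.
rows = conn.execute("""
    SELECT * FROM company
    WHERE companyuniqueid IN (
        SELECT companyuniqueid FROM item WHERE addeddate > '2014-10-25')
    AND companyuniqueid NOT IN (
        SELECT companyuniqueid FROM item
        WHERE addeddate <= '2014-10-25'
        AND   companyuniqueid IS NOT NULL)
""").fetchall()
print(rows)  # [(4, 'DDD')]
```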

Answer 2:

with min_added as
(
    select i.companyuniqueid, min(i.addeddate) as addeddate
      from item i
  group by i.companyuniqueid
)
select * from min_added where min_added.addeddate > '2014-10-25';

This gives you the company IDs of all companies whose earliest item was added after the specified date (companies with any item added before that date are not returned).
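A sketch of this approach on the example data, with SQLite standing in for PostgreSQL and the question's cutoff of 2014-10-25. The second query, a join back to company, is an illustrative extension rather than part of the original answer: it returns the company name as well, and a NULL companyuniqueid group would not match the join:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE company (companyuniqueid INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE item (itemuniqueid INTEGER PRIMARY KEY,
                       companyuniqueid INTEGER, name TEXT NOT NULL, addeddate TEXT);
    INSERT INTO company VALUES (1, 'AAA'), (2, 'BBB'), (3, 'CCC'), (4, 'DDD');
    INSERT INTO item VALUES
        (1, 1, '111', '2014-10-01'), (2, 2, '222', '2014-10-10'),
        (3, 2, '333', '2014-10-10'), (4, 3, '444', '2014-10-15'),
        (5, 1, '555', '2014-10-26'), (6, 3, '663', '2014-10-26'),
        (7, 4, '777', '2014-10-27');
""")

# Earliest item date per company; a company is "new" if even that is after the cutoff.
MIN_ADDED = """
    WITH min_added AS (
        SELECT i.companyuniqueid, MIN(i.addeddate) AS addeddate
        FROM   item i
        GROUP  BY i.companyuniqueid
    )
"""
ids = conn.execute(MIN_ADDED + """
    SELECT * FROM min_added WHERE min_added.addeddate > '2014-10-25'
""").fetchall()
print(ids)  # [(4, '2014-10-27')]

# Joining back to company returns the name as well (and skips a NULL group).
named = conn.execute(MIN_ADDED + """
    SELECT c.companyuniqueid, c.name
    FROM   company c
    JOIN   min_added m ON m.companyuniqueid = c.companyuniqueid
    WHERE  m.addeddate > '2014-10-25'
""").fetchall()
print(named)  # [(4, 'DDD')]
```

Unlike the NOT EXISTS query in Answer 1, this approach never returns companies that have no items at all, since they produce no group in min_added.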

Source: https://stackoverflow.com/questions/26598764/sql-query-to-extract-new-records

Tags

sql

postgresql

exists