问题
I have the following tables:
CREATE TABLE Company (
CompanyUniqueID BIGSERIAL PRIMARY KEY NOT NULL,
Name VARCHAR (150) NOT NULL
);
CREATE TABLE Item (
ItemUniqueID BIGSERIAL PRIMARY KEY NOT NULL,
CompanyUniqueID BIGINT NULL REFERENCES company DEFERRABLE INITIALLY DEFERRED,
Name VARCHAR (150) NOT NULL,
AddedDate TIMESTAMP without time zone DEFAULT now()
);
In the life time of the application new companies and items are added to the tables. I wish to create an sql query that will select the "new added companies" from a given date I've started with this query:
(Select * from company
where companyuniqueid in (
select distinct companyuniqueid from Item where AddedDate > '2014-10-25'))
The above is not good because items that were added after 2014-10-25 and belong to companies that already exist will be also selected.
For example, a snapshot of Company
table from 2014-10-20 can look like this:
1 AAA
2 BBB
3 CCC
and table Items will look like:
1 1 111 2014-10-01
2 2 222 2014-10-10
3 2 333 2014-10-10
4 3 444 2014-10-15
on the 2014-10-26 the following records were added:
table company
4 DDD
table Items
5 1 555 2014-10-26
6 3 663 2014-10-26
7 4 777 2014-10-27
I've tried adding this to the query:
(Select * from company
where companyuniqueid in (
select distinct companyuniqueid from Item
where AddedDate > '2014-10-25')
and companyuniqueid not in (
select distinct companyuniqueid from Item
where AddedDate <= '2014-10-25'))
but I'm getting an empty result, what should be the query in order to receive only 4 DDD?
回答1:
Use an EXISTS anti-semi-join. Typically fastest and cleaner than NOT IN
:
SELECT *
FROM company c
WHERE NOT EXISTS (
SELECT 1
FROM item
WHERE addeddate < '2014-10-25'
AND companyuniqueid = c.companyuniqueid);
This returns
companies that have no items pre-dating a given date.
including companies without items and possibly with items that have addeddate IS NULL
.
To limit result to companies with new items, add:
WHERE EXISTS (
SELECT 1
FROM item
WHERE addeddate >= '2014-10-25'
AND companyuniqueid = c.companyuniqueid)
But consider adding another column added_date
to the table company
to avoid ambiguities and simplify things.
Why did the original query not work?
Probably another case of NOT IN
with a set containing a NULL
value.
Your column item.companyuniqueid
allows NULL
values. Your subquery:
select distinct companyuniqueid from Item where AddedDate <= '2014-10-25'
... probably includes a NULL
value. In that case, this expression is never TRUE
:
companyuniqueid not in (<above subquery>)
Returns FALSE
or NULL
("unknown") if the set includes a NULL
value. But only TRUE
would qualify as WHERE
condition. So no row is returned.
Note that the same is not true for an empty set. If above subquery would return no row, the NOT IN
expression would evaluate to TRUE
, as long as the left side is NOT NULL
.
Basically, avoid NOT IN (<subquery>)
where you can. NOT EXISTS
is almost always superior.
If you use it, know your way around NULL
values. More details:
- Find records where join doesn't exist
- Select rows which are not present in other table
回答2:
with min_added as
(
select i.companyuniqueid, min(i.addeddate) as addeddate
from item i
group by companyuniqueid
)
select * from min_added where min_added.addeddate > '2014-10-26';
will give you the company ids for all companies with items added after the date specified (no company will be returned for any companies with items added before the date.)
来源:https://stackoverflow.com/questions/26598764/sql-query-to-extract-new-records