Fog Creek Software
Discussion Board




Welcome! and rules

Joel on Software

Database query question

this is sql question, not a .net question, but I find I get good answers from users on this board, so here is goes.

If I have pseudo tabes PS_A and PS_B defined by queries, and I want all results from PS_A that are NOT in PS_B, what is the most efficient way?

SELECT xxx
  FROM (SELECT xxx FROM table_a WHERE xxxxx ) PS_A,
            (SELECT id FROM table_b WHERE xxxxx ) PS_B
WHERE PS_A.id NOT IN PS_B

----  OR -----

SELECT xxx
  FROM table_a
WHERE table_a.id NOT IN
              (SELECT id FROM table_b WHERE xxxxx )

----  OR -----

Is there some other way that is better?

Thanks for the help.

Roger
Monday, August 23, 2004

All records from PS_A are not in PS_B, by default.

Sorry, couldn't resists
Monday, August 23, 2004

The usual answer to these kind of questions is "profile it and see".  If you are using derived tables and some kind of NOT IN construction, I'd say this goes double.

I did not see a frustrated left join in your examples.  Something that should be on your list.

Also, I will say that I have been really impressed with the performance of correlated subqueries on MS SQL server.  For a long time I thought these were always performance killers but it is not so.  Particularly when you can use "exists", correlated subqueries can be the fastest approach.

I would at least try something like:

(derived tables condensed)

select a.x
from a
where not exists
  (select * from b where b.id = a.id)

and see how that actually pans out compared to the other options.

Matt Conrad
Monday, August 23, 2004

Agreed with Matt, EXISTS is the simplest way.  I would also phrase the IN query as:

SELECT xxx
  FROM table_a
WHERE table_a.id NOT IN
              (SELECT id FROM table_b)

This should perform better than a correlated version and return the same results.

Finally, the LEFT JOIN version mentioned would be:

SELECT xxx
  FROM table_a
  LEFT JOIN table_b ON table_a.id = table_b.id
WHERE table_b.id IS NULL


As for performance, it depends on indexes (in addition to other things).  The LEFT JOIN or non-correlated IN might perform better with clustered primary keys as a merge join will be possible.  The correlated EXISTS might perform better against a narrow non-clustered index on table_b (e.g. just the ID column indexed), as this will enable less disk IO for the seek.

Finally, it also depends on your DBMS and whether IN or EXISTS have been specifically optimized.

A.M.
Monday, August 30, 2004

Matt Conrad
Monday, August 23, 2004


This seems to be the most precise query.

Suvadip Paul
Thursday, October 07, 2004

*  Recent Topics

*  Fog Creek Home