Fog Creek Software
Discussion Board




Small SQL challenge

This is a very little, but interesting problem I've once came across. It has a lot of poor obvious solutions, and an elegant, but very non-straightforward one. For those of you doing hiring, this may make a good addition to your interview questions arsenal.

So, you have a table with two columns: "from" and "to". Let's say for clarity that it represents intercity roads. Because of the way the data was produced, for evey record, there's always a "mirror" record, which has "from" and "to" values vice versa.  That is, if there's a
[From: Washington, To: Paris] record, there's guranteed to be [From: Paris, To: Washington]. There is also some amount of records in which both fields are identical [From: Paris, To: Paris]. These records are unique.

The task is to write a query to remove all "duplicate" records. That is, from [Paris, Washington], and [Washington, Paris] we only want one to remain,  doesn't matter which. [Paris, Paris] must, of course, just remain, since there're no other records representing the same information.

The point is I'm just  wondering how quickly people on this board can figure out the pretty way.

Egor
Tuesday, April 20, 2004

About 5 seconds: "delete from thetable where (from > to)" (with a suitable operator that does an alphabetic comparison of the two fields).

Christopher Wells
Tuesday, April 20, 2004

About 5 seconds?  WOW!  Hey, everybody..."super programmer" is in the hiz-ouse!

Color Me Impressed - Not!
Tuesday, April 20, 2004

Not a very good answer but...

select * from table a
where a.primarykey in (
    select a.a.primarykey
    from table a, table b where a.from = b.to
    and b.from = a.to
    minus
    select a.p from table a, table b
    where a.from = b.to
);

SC
Tuesday, April 20, 2004

Christopher, you rock. The least I've gotten was about 30 minutes!

Egor
Tuesday, April 20, 2004

Oops, should be...

select * from table a
where a.primarykey in (
    select a.primarykey
    from table a, table b where a.from = b.to
    and b.from = a.to
    minus
    select a.primarykey from table a, table b
    where a.from = b.to
);

SC
Tuesday, April 20, 2004

In other words:

isn't "with a suitable operator that does an alphabetic comparison of the two fields"

the important part?

Color Me Impressed - Not!
Tuesday, April 20, 2004

BTW, is there a DBMS that doesn't do alphabetical comparison on CHAR type with the > operator? What does SQL standard say?

To say the truth, in my case the data was lower case (not cities, domain-specific stuff).

Egor
Tuesday, April 20, 2004

DELETE FROM routes r1
WHERE EXISTS(SELECT NULL FROM Routes r2
WHERE r1.from = r2.to AND r1.to = r2.from AND r2.from <> r2.to)

I agree that Christopher's solution is shorter and neater, but I have learned not to trust any assumptions people put forward about their data :-). My statement will handle the case where there are rows for which the reverse doesn't exist. And the question with Christopher's solution has a high aha-factor, which makes it an unsuitable interview question.

x
Wednesday, April 21, 2004

Seems I am not the only one falling for the old "do my homework for me please".

Just me (Sir to you)
Wednesday, April 21, 2004

<<
DELETE FROM routes r1
WHERE EXISTS(SELECT NULL FROM Routes r2
WHERE r1.from = r2.to AND r1.to = r2.from AND r2.from <> r2.to)

I agree that Christopher's solution is shorter and neater, but I have learned not to trust any assumptions people put forward about their data :-). My statement will handle the case where there are rows for which the reverse doesn't exist. And the question with Christopher's solution has a high aha-factor, which makes it an unsuitable interview question. >>

With all my respect  to the author - it is not a solution, it removes the rows with the reverse wich should be in the table.

The other solution provided above used "MINUS" which is useful with ORacle , bit does not work with SQL SERVER , SYBASE .

LI
Wednesday, April 21, 2004

*  Recent Topics

*  Fog Creek Home