Fog Creek Software
Discussion Board




challenging SQL question

i have a very specific technical problem that i cannot solve. i've posted it on several supposed sql expert sites and gotten nowhere. i know this may not be the best place to post it, but i've seen far more inappropriate posts here. so if you think you're a sql pro, please help. 

i have 3 tables: players, teams, and team_members. TABLE players has a pool of all available players. TABLE teams has all teams. a group of teams comprises one league. TABLE team_members reflects which players are on which team. so if player joe was on team jets, there would be an entry in team_members referencing the jets and joe. there is a restriction that no player appear on more than one team in any given league. that is, only the jets can have joe in their league. but in some another league, a different team can have joe.

i want a query that selects all the players NOT on any team in a given league. the catch is that it cannot use nested queries (also called subqueries). if you think you can help, please email me or post a solution. if you think this post is inappropriate, i apologize.

here is the database dump with some sample data. as an example, try to select all players not in 'league1'. that is, { ryan, pat, joe, peppi, amos, chris, adam, kenny, paul }. remember, no nested queries.

CREATE TABLE team_members (
  team_member_id int(10) unsigned NOT NULL auto_increment,
  team_id int(10) unsigned NOT NULL default '0',
  player_id mediumint(8) NOT NULL default '0',
  PRIMARY KEY  (team_member_id)
)

INSERT INTO team_members VALUES (730,515,4590);
INSERT INTO team_members VALUES (729,514,302154);
INSERT INTO team_members VALUES (728,513,396138);
INSERT INTO team_members VALUES (727,512,396139);
INSERT INTO team_members VALUES (726,511,4590);

CREATE TABLE players (
  player_id mediumint(8) NOT NULL default '0',
  first_name varchar(50) default NULL,
  PRIMARY KEY  (player_id)
)

INSERT INTO players VALUES (133460,'Ryan');
INSERT INTO players VALUES (396138,'Walter');
INSERT INTO players VALUES (396139,'Dave');
INSERT INTO players VALUES (302154,'Pat');
INSERT INTO players VALUES (146369,'Joe');
INSERT INTO players VALUES (133461,'Peppi');
INSERT INTO players VALUES (133462,'Amos');
INSERT INTO players VALUES (4590,'Jeff');
INSERT INTO players VALUES (192939,'Chris');
INSERT INTO players VALUES (397803,'Adam');
INSERT INTO players VALUES (484894,'Kenny');
INSERT INTO players VALUES (235186,'Paul');

CREATE TABLE teams (
  team_id int(10) unsigned NOT NULL auto_increment,
  team_name varchar(100) NOT NULL default '',
  league_id varchar(45) NOT NULL default '0',
  PRIMARY KEY  (team_id)
)

INSERT INTO teams VALUES (511,'Jets','league1');
INSERT INTO teams VALUES (512,'Browns','league1');
INSERT INTO teams VALUES (513,'Falcons','league1');
INSERT INTO teams VALUES (514,'Patriots','league2');
INSERT INTO teams VALUES (515,'Bengals','league3');

o.s.
Tuesday, June 08, 2004

[sniff, sniff]

Smells like homework.

Philo

Philo
Tuesday, June 08, 2004

Have you tried an outer join?

.
Tuesday, June 08, 2004

no, philo, i finished school a year ago and im now a software engineer; this isn't a trivial problem.

this is just something im doing on my own. if you think my database setup precludes the type of query i am looking for, please let me know of a better setup.

o.s.
Tuesday, June 08, 2004

Isn't it just Pat and Jeff that aren't in league1?

SELECT
    p.*
FROM
    players p
    INNER JOIN team_members tm ON p.player_id = tm.player_id
    INNER JOIN teams t ON tm.team_id = t.team_id
WHERE
    t.league_id <> 'league1'

Ben R
Tuesday, June 08, 2004

why in the world should subqueries be forbidden?

what sort of crap DBMS are you using?

muppet from madebymonkeys.net
Tuesday, June 08, 2004

to  Ben R : not quite....those are some players in other leagues. TABLE players contains all available players. think of it  as a pool from which everyone picks. you'll notice there are many. then there are 3 leagues. each can pick from the pool. no 2 teams in the same league can have the same player. if you've ever player fantasy baseball, it's like that.

to muppet from madebymonkeys.net : i am using mysql 4.0.18. nested queries are only in 4.1 which is not yet in wide use.

to the person who posted under ".": yes the solution will involve a series of joins. figuring out which and in what order is the hard part.

o.s.
Tuesday, June 08, 2004

I'll agree with Philo that this smells incredibly like homework. Even if you could used subqueries, there are other manual methods of achieving your results.

In any case, in the idiom of SQL Server the following works:

SELECT
  league_id, first_name
FROM
  teams
  CROSS JOIN players
  LEFT JOIN team_members ON teams.team_id = team_members.team_id AND players.player_id = team_members.player_id 
GROUP BY
  league_id, first_name
HAVING
  league_id = 'league1'
  AND SUM(CASE WHEN team_members.team_id IS NULL THEN 0 ELSE 1 END) = 0

(I was curious and thought I'd take up the challenge. I have no idea what other rudimentary features of a RDBMS that MySQL is missing, so this probably is of no help)

Dennis Forbes
Tuesday, June 08, 2004

"Even if you could used subqueries" = "Even if you couldn't have used subqueries"

...back to your regularly scheduled broadcast

Dennis Forbes
Tuesday, June 08, 2004

This should work:

SELECT
  p2.first_name
FROM
  players p1
  JOIN team_members m ON p1.player_id = m.player_id
  JOIN teams t ON m.team_id = t.team_id AND t.league_id = 'league1'
  RIGHT OUTER JOIN players p2 ON p1.player_id = p2.player_id
WHERE
  p1.player_id IS NULL

This works by getting all the players in league 1 (the inner joins), and then relating that to *all* the players (the right join), so you can see which players don't appear in the first set (the null comparison).

Substitute 'league1' with a variable and you should be set.

Cheers,
Rhys

Rhys Weekley
Tuesday, June 08, 2004

I find it really hard to summon the effort to type up a reply with technical answers and code when the original poster isn't willing to put in enough effort to press the shift key. Dude, punctuation and proper capitalization is just as important in normal writing as it is in writing code, such as your SQL statement examples.

JWA
Tuesday, June 08, 2004

Dennis Forbes, shockingly that worked alright in mysql on the sample data. but on a larger data set my cpu went 100% and never quite recovered. i'll have to try and optimize it a bit.

Rhys Weekley, that didn't work on mysql--i got the empty set.

thanks to all those who are helping out.

o.s.
Tuesday, June 08, 2004

The logic is correct, but the alias and/or join syntax may need adjusting. Try fully qualifying the table names for starters, and if that doesn't work, try using your regular syntax for the joins - replace JOIN with whatever you use for an inner join, and replace RIGHT OUTER JOIN with whatever you use for those.

Rhys Weekley
Tuesday, June 08, 2004

If the server is under your control, switch to PostgreSQL. If the server is not under your control, persuade whoever's in charge to switch to PostgreSQL. If it's a service provider and they refuse to provide PostgreSQL as an option, switch to another service provider.

TJ Haeser
Wednesday, June 09, 2004

I'm not sure if you qualify for the title of software engineer yet (job titles don't mean anything)

Credit where it's due though, you do know how to ask the questions. 

Ged Byrne
Wednesday, June 09, 2004

o.s. my solution is an absolute dog of a solution, and if you had 1000 players and 100 teams, it would yield a 100,000 row crossjoin table. Obviously with the features in any modern database system I would do it significantly different.

Having said that, Rhys solution is right on the money and works perfectly (it must be syntatical nuances if mySQL is puking on it), not to mention being dramatically more efficient.

Dennis Forbes
Wednesday, June 09, 2004

Yeah MySQL lacks decent subqueries until a pretty recent version. It sucks having to work with it. Why on earth can't all the linux web hosting companies install Postgresql instead? urgh.

But anyway.

I can't be arsed figuring out the particulars of your database schema, but there's a quick sneaky way of finding all records in a given table, for which there /doesn't/ exist a corresponding record in a second table, without using subqueries. You do a left join and then test for null in the second table, like so:

SELECT
  main_table.*
FROM
  main_table
    LEFT JOIN
  second_table
    ON main_table.foreignkey = second_table.primarykey
WHERE
  second_table.primarykey IS NULL

You should be able to adapt that to solve your problem. It's a bit ugly (it's essentially saying 'find all records in maintable, with their corresponding record in secondtable if it exists and null otherwise, and then single out the records for which the secondtable data is null') but it works.

Goes without saying but build indexes on all the relevant foreign key fields if it runs slowly.

Matt
Wednesday, June 09, 2004

I'll add to the calls to move to PostgreSQL - if you plan to write any kind of major application working without subqueries is like working with a hand tied behind your back - if you're working for $$$ you can't afford to lose that capability; if it's a hobby there's no reason you shouldn't have better tools.

Sorry about the homework thing - since you didn't specify MySQL then I assumed we were talking a major RDBMS, so the "no subqueries" sounded like an artificial restriction (the kind you see on homework). :-)

Philo

Philo
Wednesday, June 09, 2004

What about "minus" operator?
I do not know if MySQL can do it but this would probably be simplest:

select * from players
minus
select (those that play in selected league) from players

moronica
Wednesday, June 09, 2004

Sorry, I gotta ask.

o.s., you clearly have a working shift key (based on the SQL samples). So what is the reasoning behind ignoring the "capitalise the first letter of sentences" and "capitalise the personal pronoun" rules? Sure, when a poet does it, it's cool. But when a software developer does it...

Bill Tomlinson
Wednesday, June 09, 2004

I'm surprised you didn't get any answer on those "sql expert sites"...

Joe
Wednesday, June 09, 2004

Thanks to all those that responded. I really appreciate your help.

It turns out, as I suspected all along, that the reason I was struggling so much with this is that mysql is doing something funny.

I replicated my setup on db2 and my first guess (which did NOT work on mysql) worked flawlessly on db2; also, all suggestions on this thread worked flawlessly too:

SELECT
    p.*
FROM players p
        LEFT JOIN
            team_members m JOIN teams t
            ON t.league_id = 'league1'
                AND t.team_id = m.team_id
        ON p.player_id = m.player_id
WHERE m.team_id IS NULL;

I guess will submit a bug to mysql; I feel like maybe I'm doing something wrong with mysql because I recall a slashdot article on mysql being a great example of bug free open source software.

Anyway, thanks again to everyone who helped. And apologies for the capitalization--I had no idea people cared.

PS If there's any mysql experts out there that have insight as to why that join won't work, please let me know.

o.s.
Wednesday, June 09, 2004

Ok so I got an official reply from mysql within an hour about my bug:

"The explanation is that you are performing a nested join, which MySQL does not currently support.  The capability is of course on our TODO and will be implemented, but it is not available today.  You would, basically, need to rewrite the query to avoid the nesting of joins."

I have no idea how to rewrite the query without nested joins and have it perform well.

o.s.
Wednesday, June 09, 2004

No nested joins *and* no subqueries? and this is a 4.x product?  geez...

Joe
Wednesday, June 09, 2004

Hmm, try something like this:

SELECT
  p.*
FROM
  players p
    LEFT JOIN
      team_members m ON p.player_id = m.player_id
        JOIN
          teams t ON t.team_id = m.team_id
WHERE
  t.league_id = 'league1'
    AND
  m.team_id IS NULL;

The t.league_id = 'league1' should definately be in the where clause as it has nothing to do with any joins. Reorganising the order you join different tables sometimes seems to make mysql cooperate... it doesn't do nested joins in the properly although you can nearly always acheive the same thing just by writing the joins one after the other, making sure to specify the keys it joins on (as this doesn't always seem to work with natural joins or 'using'). Dontcha just love mysql? *curses*.

Matt
Wednesday, June 09, 2004

Thanks Dennis. :-)

Rhys Weekley
Wednesday, June 09, 2004

Matt: that didn't work in mysql but the logic seems sound--it'll probably work in a real db.

Anyway, in case people are still reading this thread, here is how I will solve the problem:

To avoid the nested join I'll add another foreign key to team_members called league_id. This way I won't need to join on teams at all.

It's not as elegant as before, but not that bad either.

Then the query is very simple:

SELECT p.* FROM players p
    LEFT JOIN team_members m
    ON m.league_id = 'league1'
        AND m.player_id = p.player_id
    WHERE m.team_member_id IS NULL;

I don't think mysql is that bad with the features it has implemented. But unfortunately it's kind of incomplete.

o.s.
Wednesday, June 09, 2004

Depending on your performance requirements, you might want to consider creating a View joining teams to team_members (to pull team_members.* + teams.league_id), then join the players to the new view in your query (assuming MySQL can actually do that *grin*).  Not quite as performant, but not bad either, and it might be worth it to avoid the double data entry.

Joe
Wednesday, June 09, 2004

I really think a simple variation on what I suggested should work... I'm sure I've done queries like that myself in MySQL and got them to work. Like I said it can be a bit weird about joins but typically you can tweak things until it does what you want...

Matt
Thursday, June 10, 2004

Ah I've just spotted something. Try making the second join into a LEFT JOIN too. Not sure if that'd do quite what you wanted but I have the feeling its more likely to work.

SELECT
  p.*
FROM
  players p
    LEFT JOIN
      team_members m ON p.player_id = m.player_id
        LEFT JOIN
          teams t ON t.team_id = m.team_id
WHERE
  t.league_id = 'league1'
    AND
  m.team_id IS NULL;

If that doesnt work I give up.

Matt
Thursday, June 10, 2004

Matt, I think you do need something like:

SELECT
    players.*
FROM
    players
LEFT JOIN
    (
        teams
    INNER JOIN
        team_members ON
        team_members.team_id = teams.team_id AND
        teams.league_id = 'league1'
    ) ON
    team_members.player_id = players.player_id
WHERE
    teams.team_id IS NULL

Specifying the league outside the join will not work because the player is by definition of the problem, not in a leauge.  You first must join the list of teams and team_members for the specified league, then compare that list to the list of players, selecting only players that aren't in the list.

The query above works perfectly for me in SQL Server and probably most other modern databases.

GuyIncognito
Thursday, June 10, 2004

GuyIncognito explained it perfectly. 

The workaround I mentioned on my previous post works very well once appropriate indeces are created.

Thanks everyone.

o.s.
Thursday, June 10, 2004

Ah yeah I see what you mean. I'd just moved that bit into the where clause without thinking.

But yeah, it'd be fine if nested joins were allowed. I'd probably write it with subqueries as that makes it a bit clearer, but don't think it supports those either yet.

Matt
Friday, June 11, 2004

*  Recent Topics

*  Fog Creek Home