Fog Creek Software
g
Discussion Board




sql query question

I have a table with a field say "cust_num" which is a char field (to accomadate values like 1, 11, 3 and also abc12, xyz456 etc.). I have need to contruct queries with "order by cust_num". Since cust_num is a char field I get the output like
1
11
3
abc12
xyz456

But I would like to get the output like

1
3
11
abc12
xyz456

I want to be able to sort this char field numerically instead of alphabetically. Is this possible?
I have been trying different things like casting etc., but no luck yet. Any thoughts?
Thanks for the help in advance.

Anon
Monday, December 22, 2003

Forgot to mention, I am using SQL Server 2000.

Anon
Monday, December 22, 2003

You might try changing the collation for the column to one with a different sort order such as binary.  This will behave differently based on the locale your machine is set to, the datatype of the field you are sorting on and the collation you choose.

Dave B.
Monday, December 22, 2003

Anon,

You can right justify the expressions within a  string using a function like this:

right(replicate(' ', 12) + CustomerId, 12)

You would want to replace both 12s with an appropriate width for the field.

This will sort the numbers correctly.

You don't have to display the field in this format, just use this expression in the 'order by' clause.

Ged Byrne
Monday, December 22, 2003

Ged,
Thanks, thats a neat trick. But that doesn't exactly solve my problem. It works if I have values like {1, 2, 3, 10, abc12, xyz34}. I get the desired output like,
1
2
3
10
abc12
xyz34.

But if I have values like {1, 2, 3, 10101010, abc12, xyz34} then I get the output like
1
2
3
abc12
xyz34
10101010

instead of
1
2
3
10101010
abc12
xyz34


Dave, I am looking into chaning the collation for the column.

Thanks for the help so far.

Anon
Monday, December 22, 2003

Anon,

101010101 shouldn't cause any problems unless it is bigger than the value your padding to.

Did you leave the value at 12, or did you change it to a value less than 8?

Ged Byrne
Monday, December 22, 2003

I am sorry Ged. I thought I was supposed to pad it with field width. So I used 15. But I guess I didn't understand the solution properly. And now I am totally confused :)
What's the significance of that number there? Will you be kind enough to explain? Thanks.

Anon
Monday, December 22, 2003

Anon,

The field length is probably the best values to choose.  Here is how it all works.

The field in question is a string.  String sorting is done as follows.

Check the first character.  If it is equal, check the next character, keep going to you find one that is different and compare using that.

So, if you have ANT and BAT, ANT comes first based on the first letter, A.

It is more difficult for ANT and ANIMAL.  A and N are equal, so ANIMAL come first because I is lower than T.  The fact that ANIMAL is longer means nothing.  The comparison just moves from left to right until a difference is found.

This is different from numbers, because numbers read from the right.  1 is less than 10.  10 is less than a 100.  The number of digits is significant.

The problem is that in your text field everything is treated as text.  When 10 is compared to 2 it compares the first character, and since they differ the sort is based upon that.  2 is greater than 1 so 10 comes first.

The easiest solution to this is to add preceeding zeros.

02 compares to 10 correctly because 0 comes before 1.  If we have 100, 010, 020 and 001 then it will all sort correctly.  If we add 3 to the list, however, it would all go wrong.  3 comes after 1!

Rather than using zeros, we are using spaces so it doesn't look so messy for the string values.  The effect is exactly the same.  Space-1 comes before 20 because space comes before 2.

What I cannot understand is whey 10101010 is failing to sort correctly if you are padding to 15.  You should have 7 spaces in front so that it sorts corrently.  Are you sure none of those are Is or Os?

Ged Byrne
Monday, December 22, 2003

Ged,
Thank you very much for explaining this. I think I get it.
But here is the problem I have.
I have the values for cust_num with field length 15 as {1, 2, abc1, dsfd, 3, 10101010}. They are all 1's and 0's.

if I use
right(replicate(' ', n) + woundnumber, n)
where n = 2 to 4
then I get the output as

1
2
3
10101010
abc1
dsfd

which is what I want.

But if I use
right(replicate(' ', n) + woundnumber, n)
where n = 5 to 15
then I get the output as

1
2
3
abc1
dsfd
10101010

This is bit confusing to me. Any thoughts?

Anon
Monday, December 22, 2003

The output [using right(...) with n=15] as:
1
2
3
abc1
dsfd
10101010

IS correct. The value '10101010' is correctly sorted after '    abc1' (with leading space characters). This also seems logically correct to me - why would you want '10101010' to be ordered before 'abc1'?

The reason that the order changes using n=4 is that this truncates the values to ('...1', '...2', '...3', '1010', 'abc1', 'dsfd') [where '.' represents a space character] and '1010' comes before 'abc1' in text ordering.

Philip Dickerson
Monday, December 22, 2003

Oops,
my mistake. Should have looked into it bit more before posting the previous post.
I have to use n=15(field length), if i use any other value for n it's going to truncate the field values depending on thier length. So that part is clear to me now.
I want to have all the numbers sorted first and listed together and then the alphabets. Hence the desired output is
1
2
3
10101010
abc1
dsfd

Once again, thanks for all the help.

Anon
Monday, December 22, 2003

If you want all the numeric values first, you can use a UNION with 2 separate queries, similar to this:

SELECT myval,
' ' + RIGHT(REPLICATE(' ', 15) + myval, 15) AS X
FROM MYTABLE
WHERE ISNUMERIC(myval) = 1
UNION ALL
SELECT myval,
'Z' + RIGHT(REPLICATE(' ', 15) + myval, 15) AS X
FROM MYTABLE
WHERE ISNUMERIC(myval) = 0
ORDER BY X

Note that the expression returned as 'X' is used only for ordering of the entire result set; also, the expression 'X' has either a space character or a 'Z' character prepended depending on whether it's numeric or non-numeric.

Philip Dickerson
Monday, December 22, 2003

Doh.  I knew that.

We are also padding the words, so they are no longer being sorted based on there first letter.

We only want to pad the numbers, and leave the text alone.

select
  case isnumeric(userid)
    when 1 then right(replicate('-', 15)+ customer_num, 15)
    when 0 then customer_num
  end
from table

Check books online for more details about case and isnumeric.

Another solution exploits the fact that isnumeric returns 1 or 0.

right(replicate(' ', 15 * isnumeric(Customer_num) + Customer_num, 15)

When it is non numeric 15 * 0 = 0 so there will be no padding.

Ged Byrne
Monday, December 22, 2003

Note that this may not give you the order that you want for the non-numeric entries, so you may want to change the query to (not padding the text entries with leading spaces):

SELECT myval,
' ' + RIGHT(REPLICATE(' ', 15) + myval, 15) AS X
FROM MYTABLE
WHERE ISNUMERIC(myval) = 1
UNION ALL
SELECT myval,
'Z' + myval AS X
FROM MYTABLE
WHERE ISNUMERIC(myval) = 0
ORDER BY X

Philip Dickerson
Monday, December 22, 2003

Leap frog!

Ged Byrne
Monday, December 22, 2003

The proposed SQL queries are a band-aid solution to more fundamental problems of data design.

Firstly, the fact that characters are valid input for the customer means that "customer number" is not an appropriate attribute name.

Secondly, what possible business value is achieved by producing a list/report/query in ascending sequence of customer ID? Look further at this requirement ... it may be that producing the list in alphabetic sequence of customer NAME would be a much better solution.

Finally, rather than butchering the collating sequence for the DBMS, or hacking awful SQL queries, if you truly need to produce the list in alphabetic sequence of customer ID, then consider transforming the code into an appropriate number at time of entry.

Fix the problem PROPERLY and future developers / maintainers will love you for it.  Clutter your database and code with SQL hacks like those shown above, and you'll be cursed for eternity.

HeWhoMustBeConfused
Monday, December 22, 2003

Ged, Philip Thanks. They do work. I was trying with unions  but was still having some problems. Philip, your solution helped me on that.
HeWhoMustBeConfused, I inherited this database design and I have to work with this. And this table(which is not exactly a table of customers) already has a "customerID" and this "customer_num" is entered by the users and it can not be only numeric.
Now I have to figure one more thing as far as this queries are concerned. I just realized the value set is more like
{1, 1a, 2, 10101010, abc1, dsfd} and these queries so far give me the output of
1
2
10101010
1a
abc1
dsfd

I need to get the output as
1
1a
2
10101010
abc1
dsfd

So I have to look more into that. But thank you very very much for your help. All you posts helped me  a lot in moving in the right direction. Thank you once again.

Anon
Monday, December 22, 2003

I understand your problem of maintaining an older application.

You have a problem of logic. I can't think of a collating sequence or algorithm that will sort 1->1a->3->11->30a->300.

Why is the data required in this sequence? Are these realistic data values, or are we actually talking about a semi-structured code that requires some interpretation?

HeWhoMustBeConfused
Monday, December 22, 2003

These values are entered by the users and it can not be limted to numbers only. And I have to sort the entire dataset based on this field.
I have the same doubt about sorting a character field such that I get the output 1, 1a, 2, 10101010, abc1, dsfd (if it is possible or not). I will have to look more into that.
Thanks.

Anon
Monday, December 22, 2003

Anon, the question I am asking, perhaps not too well, is WHY do you have to sort on this field?

I'm perfectly clear that you can't control the data entry, given the limitations of the system design. Even without that limitation, there is a data design principle that an identifier (CUST_NUM in your case) should be totally free of meaning, with only uniqueness (it must not be the same as any other row) and perpetuity (it must never change) as its attributes.

If these are realistic data values, then there seems to be no intrinsic meaning in the data. It doesn't give you the sequence in which they were entered, it doesn't give you a broad grouping, and it doesn't appear related to any other value.

What is the purpose of this sort? Is it for reporting? If so, that report may be practically useless in this sequence, and may be more suitable in another format. Is it for a match/merge? If so, an alternate merge/update design might be more appropriate.

If you tell us more about the REAL problem we might be able to give better advice.

HeWhoMustBeConfused
Tuesday, December 23, 2003

Anon,

You seem to be 90% of the way there.  The positioning of 1a seems to be somewhat arbitary.  I'm suprised it isn't 'good enough,' which is what you usually have to accept in these circumstances.

Why is the order required so strict?

Ged Byrne
Tuesday, December 23, 2003

HeWhoMustBeConfused and Ged,
Let me give you a better analogy than this "customer_num", so that I can explain you why this sorting has to be done this way.
I have this set of patients with many cases per patient. I have the "tbl_Patient" with PatientID and all the other data. Then I have this tbl_case with "caseID" which has a"case_num" field. CaseID is unique, generated by database and is numeric only. But "case_num" is entered by the user(nurses, physicians etc) which is alphanumeric.
I have to sort on this "case_num" because this is the unique field which our users refer to. And different users have different format of "case_num".
This sorting is for both reporting as well as just the presentation of data.
I hope I have been able to explain the situation better with this. Thanks for all your help again.

Anon
Tuesday, December 23, 2003

It sounds like the union takes care of ordering the second half of the list (those entries that begin with alphas) perfectly.

But I think you're going to need something more procedural for the numeric values. In addition to sorting 1a before 2, I'm assuming you want 10a to sort after 10 (not before 2 as it would if you treated these values as alphas).

In that case, it feels like you'll have to "peel off" the opening numerics and order those first. For instance, {1, 2, 10, 1a, 10a, 10b} becomes {1, 2, 10, 1, 10, 10). Sort those as numerics to {1, 1, 2, 10, 10, 10}, then add on the suffixes in alpha order to get {1, 1a, 2, 10, 10a, 10b}.

And I have no idea how you could do that in SQL.

Zahid
Tuesday, December 23, 2003

Zahid is right, I can't see any simple way to do this  in SQL.

Ged Byrne
Tuesday, December 23, 2003

Could something like

SELECT case_num
FROM tbl_case
ORDER BY CASE
WHEN PATINDEX('%[^0-9]%', case_num) = 1
    THEN '0000000000' + case_num
WHEN PATINDEX('%[^0-9]%', case_num) = 0
    THEN RIGHT('000000000' + case_num, 10)
ELSE RIGHT('000000000' + LEFT(case_num, PATINDEX('%[^0-9]%', case_num) - 1), 10) +
          SUBSTRING(case_num, PATINDEX('%[^0-9]%', case_num), LEN(case_num))
END

work? You want to replace the number of zeros and the 10's in this example to suit the maximum lenght of your numbers. The general idea is the numbers at the start are 'normalized' to lenght 10 by padding zeros onto them as much as nescessary. First case is for 'starts with a letter' second is for 'all numbers' and third is 'number followed by letters.
I don't have a SQL Server here to test at the moment, so could be all bollocks.

Just me (Sir to you)
Tuesday, December 23, 2003

Come on, this is easy:

select theField  from junk
order by case when isnumeric(theField)=1  then 0 else 1 end,
        case when isnumeric(theField)=1  then convert(int, theField) else 999999 end,
        theField

Tom Hathaway
Tuesday, December 23, 2003

Tom,

wouldn't that still put 1a after 2?

Just me (Sir to you)
Tuesday, December 23, 2003

I prefer Tom's method, but it doesn't produce the requested sort order. If it were me, and Tom's method produced a correct answer, even if not the preferred answer, I'd go with that.

If the sort order produced by Tom's code isn't acceptable, this script produces the requested order via a couple functions. It's probably not right, and there's probably an off-by-one in it, but I have work to do so this is probably as much effort as I'm going to put in to it:

create table #ids (i varchar(15))
insert #ids values('2')
insert #ids values('1')
insert #ids values('10101010')
insert #ids values('1a')
insert #ids values('abc1')
insert #ids values('dsfd')
insert #ids values('abc2')



SELECT  i AS i
      , dbo.left_of_chars(i) AS left_of_chars
      , dbo.right_of_chars(i) AS right_of_chars
FROM #ids
ORDER BY dbo.left_of_chars(i), dbo.right_of_chars(i)

ALTER FUNCTION left_of_chars (@string_in VARCHAR(15) )
RETURNS BIGINT
AS
BEGIN
  DECLARE @curpos    INT
  DECLARE @string_out VARCHAR(15)
  DECLARE @curchar    CHAR(1)

  SELECT @curpos = 0

  WHILE @curpos IS NOT NULL BEGIN
      SELECT @curpos = @curpos + 1
      IF @curpos > DATALENGTH( @string_in) BREAK

      SELECT @curchar = SUBSTRING( @string_in, @curpos, 1)
      IF ( ISNUMERIC( @curchar) = 1) BEGIN
        SELECT @string_out = COALESCE( @string_out, '') + @curchar
      END
      ELSE BEGIN
        BREAK
      END
  END
  IF @string_out IS NULL SET @string_out = '999999999999999'
  RETURN CONVERT(bigint, @string_out)
end

ALTER FUNCTION right_of_chars (@string_in VARCHAR(15) )
RETURNS BIGINT
AS
BEGIN
  DECLARE @curpos    INT
  DECLARE @string_out VARCHAR(15)
  DECLARE @curchar    CHAR(1)

  SELECT @curpos = DATALENGTH( @string_in) + 1

  WHILE @curpos IS NOT NULL BEGIN
      SELECT @curpos = @curpos - 1
      IF @curpos < 1 BREAK

      SELECT @curchar = SUBSTRING( @string_in, @curpos, 1)
      IF ( ISNUMERIC( @curchar) = 1) BEGIN
        SELECT @string_out = COALESCE( @string_out, '') + @curchar
      END
      ELSE BEGIN
        BREAK
      END
  END
  IF @string_out IS NULL SET @string_out = '999999999999999'
  RETURN @string_out
end

Troy King
Tuesday, December 23, 2003

If trying my stab at it, change the ALTER FUNCTIONs to CREATE FUNCTIONs. It would suck to overwrite real functions with those names.

Troy King
Tuesday, December 23, 2003

Tom's method (a few entries above) may not correctly handle the mixed number-alpha entries (such as '1a'). Just me's method (a very good approach) should work if you change the first case to NOT prefix it with '0000000000' (to allow the entries that start with alpha characters to sort in the correct position).

Here's another possible approach (that idnetifies the leading numeric portion of the field and sorts by that first):

SELECT case_num,
    CASE
    WHEN ISNUMERIC(SUBSTRING(case_num, 1, 15)) = 1
    THEN CAST(SUBSTRING(case_num, 1, 15) AS money)
    WHEN ISNUMERIC(SUBSTRING(case_num, 1, 14)) = 1
    THEN CAST(SUBSTRING(case_num, 1, 14) AS money)
[.....]
    WHEN ISNUMERIC(SUBSTRING(case_num, 1, 1)) = 1
    THEN CAST(SUBSTRING(case_num, 1, 1) AS money)
    ELSE CAST(900999999999999 AS money)
    END AS NUM_PART
FROM tbl_case
ORDER BY NUM_PART, case_num

Note there are some restrictions because of the maximum value of a money data type (+922,337,203,685,477) - this won't work if the field length is greater than 15, and will sort incorrectly any actual numeric entries that are greater than 900999999999999.

Philip Dickerson
Tuesday, December 23, 2003

Just an aside as to why the hell you'd ever want to do this (and agreed something like Customer ID probably isn't a great example).

If you have a system which allows alphanumeric document numbers, invoices, order numbers etc, etc and you ship that software to different customers with different requirements, some will want to use purely numeric numbers and others will want some combination you can't forsee.  In this case you need the padding tricks just to get sane output for everyone.

Even within a single organisation's database they'll have supplier documents with varying numbering systems, its useful to see them in supplier's order as well as your own sometimes, especially when trying to track down missing documentation.

Simon Lucy
Tuesday, December 23, 2003

Thank you for all your responses.
Just Me(Sir to you),
Your solution works. I still needs to test with more cases. Only thing I had to change was to pad with 9's instead of 0's for the first case.
Tom, like Just me(Sir to you) mentioned it doesn't take care of the "1a" situation. I will look more into it.
Troy, I am yet to try what you suggested.
Thank you again.

Anon
Tuesday, December 23, 2003

Oops,

Philiph 's right of course. Leave out the "00000..' in the first case and everything should be OK.

The check will be in the mail ;-)

Just me (Sir to you)
Tuesday, December 23, 2003

If I get my christmas bonus, why not :)
In all seriousness, you guys have been really helpful. This board is great!!
Thank you.

Anon
Tuesday, December 23, 2003

Thank Joel (or 'the elves'),

if he hadn't 'bolded' the topic I probably wouldn't even have looked at it.

Just me (Sir to you)
Tuesday, December 23, 2003

What Philip's said, though that's what I would do though, I do admir those who grok collation enough to recommend it.

Li-fan Chen
Tuesday, December 23, 2003

I believe SQL Server 2000 lets you specify collation on the collum level, and even should you not be able to do this (if your app does not own the data definition) on the SELECT level.

Just me (Sir to you)
Friday, December 26, 2003

*  Recent Topics

*  Fog Creek Home