Fog Creek Software
g
Discussion Board




Bubble sort reversed

Bubble sort is known to have a predictable N square iterations before the array is fully sorted. Here, N refers to the number of elements in the array.

The code from http://linux.wku.edu/~lamonml/algor/sort/bubble.html looks like this, as it would from anywhere else, as long as Bubble sort was the topic.


void bubbleSort(int numbers[], int array_size)
{
  int i, j, temp;

  for (i = (array_size - 1); i >= 0; i--)
  {
    for (j = 1; j <= i; j++)
    {
      if (numbers[j-1] > numbers[j])
      {
        temp = numbers[j-1];
        numbers[j-1] = numbers[j];
        numbers[j] = temp;
      }
    }
  }
}


We can see the outer loop traverses the array from back to the front N number of times and its function is not only to factor the inner loop N times but also to go backwards, so the array is sorted properly.

If I were to reverse the bubble sort, could I reverse the outerloop by forcing it to traverse forwards instead of backwards? Coupled with this, I would reverse the swap algorithm in the inner loop to check for a smaller value and push it leftward. Would that still work?

I am beginning to imagine voices, "Sathyaish, you're damn lazy. Try it yourself." But I've already written so many words here. It will a sin to erase them. I'm all set (to hit the button). Push, aeyee....

Sathyaish Chakravarthy
Saturday, February 28, 2004

Well, you've answered the question yourself.  Try it and see.

If you're doing much with sorting, check out more advanced algorithms like shell sort or insertion sort. Insertion sort in particular is a very efficient algorithm. If you really want speed and have a very large data set, take a look at quicksort.  Make sure you understand the shell or insertion algorithms first though, because quicksort tends to optimize itself by sorting small partitions via one of those.

Clay Dowling
Saturday, February 28, 2004

Yup! Thanks for the reply. I am aware of Quick sort, selection sort and insertion sort too. I've been using Quick sort in my programs actually. This was just Saturday time-pass.

Sathyaish Chakravarthy
Saturday, February 28, 2004

*It will a sin to erase them*

be be be be be

Sathyaish Chakravarthy
Saturday, February 28, 2004

Okay, I tried it straight first and it works.

[CODE]
Option Explicit

Public Sub Main()

Dim IntArr(0 To 9) As Integer

Call FillArray(IntArr)
Call BubbleSort(IntArr)
Call DisplayArray(IntArr)

End Sub

Public Sub FillArray(ByRef IntArr() As Integer)

Dim LngCounter As Long

Randomize (100)
For LngCounter = LBound(IntArr) To UBound(IntArr)
    IntArr(LngCounter) = Rnd * 100
Next LngCounter

End Sub

Public Sub BubbleSort(ByRef IntArr() As Integer)

Dim LngCounterOut As Long
Dim LngCounterIn As Long
Dim IntTemp As Integer

For LngCounterOut = UBound(IntArr) To LBound(IntArr) Step -1
    For LngCounterIn = LBound(IntArr) + 1 To UBound(IntArr) Step 1
        If IntArr(LngCounterIn - 1) > IntArr(LngCounterIn) Then
            IntTemp = IntArr(LngCounterIn)
            IntArr(LngCounterIn) = IntArr(LngCounterIn - 1)
            IntArr(LngCounterIn - 1) = IntTemp
        End If
Next LngCounterIn, LngCounterOut

End Sub

Public Sub DisplayArray(ByRef IntArr() As Integer)

Dim LngCounter As Long
Dim StrDisplayText As String

For LngCounter = LBound(IntArr) To UBound(IntArr)
    StrDisplayText = StrDisplayText & Str(IntArr(LngCounter)) & vbNewLine
Next LngCounter

MsgBox StrDisplayText

End Sub
[/CODE]

Now I'm gonna try it backwards and see.

Sathyaish Chakravarthy
Saturday, February 28, 2004

Eureka! This one works too.

[CODE]
Public Sub ReverseBubbleSort(ByRef IntArr() As Integer)

Dim LngCounterOut As Long
Dim LngCounterIn As Long
Dim IntTemp As Integer

For LngCounterOut = LBound(IntArr) To UBound(IntArr)
    For LngCounterIn = UBound(IntArr) - 1 To LBound(IntArr) Step -1
        If IntArr(LngCounterIn + 1) < IntArr(LngCounterIn) Then
            IntTemp = IntArr(LngCounterIn)
            IntArr(LngCounterIn) = IntArr(LngCounterIn + 1)
            IntArr(LngCounterIn + 1) = IntTemp
        End If
Next LngCounterIn, LngCounterOut

End Sub
[/CODE]

Thanks, Sathyaish! :)

Sathyaish Chakravarthy
Saturday, February 28, 2004

Let me guess... a US company outsourced some cheap coding to you - and that's exactly what you're providing...

x
Saturday, February 28, 2004

Hey, X guy, you are such a dumb ass to be assuming that a US company would outsource the reversal of a sorting algorithm.

Sathyaish Chakravarthy
Saturday, February 28, 2004

Just a quick note: Quicksort isn't too quick. The choice of name has done wonders for quicksort's reputation; I think it wouldn't have been half as a popular if it was called "recursive partition sort", which it should be.

Heap sort is simpler to code and generally faster with guaranteed O(n log n), unless you do your quicksort extremely well. Merge sort is often simpler to code, and requires some additional storage (but provides sort stability in return - a very useful feature).

Bubble / Insertion / Shell / Comb  sort might be interesting for 200 element lists sorted infrequently. For all other stuff, spend a little time and use heap or mergesort. Your users will thank you eventually.

Ori Berger
Saturday, February 28, 2004

"Quicksort isn't too quick."

Can you elaborate Ori?  Everything I've read on the subject indicates that Quicksort is the fastest of the fast, in the general case.  Admittedly, I never bothered to check these assertions, but I've seen data to back this claim up.  For example:

"The quick sort is by far the fastest of the common sorting algorithms. It's possible to write a special-purpose sorting algorithm that can beat the quick sort for some data sets, but for general-case sorting there isn't anything faster. " [1]

The same reference also states that, "The heap sort is the slowest of the O(n log n) sorting algorithms..." which seems contrary to what you recommend.  Maybe there are some new developments that I'm not aware of?

[1] http://linux.wku.edu/~lamonml/algor/sort/quick.html

Anyone remember Ultrasort??
Saturday, February 28, 2004

Sathyaish,

You might be interested in a great book, "Ready to Run Visual Basic Algorithms" by Rod Stephens.  It's a very readable introduction to complexity analysis, data structures and algorithms, with code snippets in VB6.  It has a chapter on the major sorting algorithms, and compares their various advantages and disadvanges.

Robert Jacobson
Saturday, February 28, 2004

Ori,

I'm going to suggest that your position is not backed by well-implemented code.  When I was teaching the subject, we implemented several common sorting algorithms and compared times on data sets of varying size.  Quicksort lived up to its name once we crossed into the thousands or tens of thousands.  Before that insertion sort was our best algorithm.

Merge sort is only applicable to certain situations. It's great when you want to combine a couple of sorted data sets, but not terribly appropriate otherwise.

Clay Dowling
Saturday, February 28, 2004

There's also the fact that QuickSort's worst case is still O(N^2).

Kevin
Saturday, February 28, 2004

Why on earth does anyone care at all about Bubblesort?

This is sort of like arguing which Hyundai we'll race in when
we have an unused Porsche and Lambourghini available.

I have a modified version of heapsort that I mostly use that
has two nice properties: 2*(n log n) speed and
it's an in-place sort - no additional memory required.  If
speed is an overriding factor and I have lots of memory
available (I work in embedded systems, so this is rarely
true),  I use a recursive implementation of mergesort:
it's n log n.  Quicksort has bad pessimal performance so I
don't like it - embedded systems have to have predictable performance, so heapsort is usually best.

x
Saturday, February 28, 2004

My experiments with 'simple' heap sort, 'simple' merge sort, and 'simple' quicksort were not too favorable to quicksort. If your data is randomly ordered, quicksort and merge sort are generally equal, with mergesort requiring more memory (~50%, definitely NOT negligible), and a little more time - often unnoticable on machines with caches, as mergesort  has perfectly sequential read and write access.

The problem with 'quicksort' on real world data is that it is often NOT random. Many quicksort algorithms, especially those in textbooks, reduce to a horrible O(n^2) when all values are equal (more generally, when the range of data is significantly smaller than the length  of the data). And ... guess what? That's often the case in real life. Also, most textbook quicksorts reduce to O(n^2) on sorted input. That _also_ happens quite often in practice. You can go horribly wrong with Quicksort; You can't go horribly wrong with either mergesort or heapsort - you're at O(n log n) at worst.

My results may stem from subjecting quicksort to what you might consider "non random" data, but it is (in my experience) more representative of the real world.

Furthermore, I've only met one (1) quicksort implementation that guaranteed never to become O(n^2); And it was horribly complex - because, among other things, it needs to select something close to the median in O(n) time. How many quicksort implementations have you used that do that?

Clay, I did my testing years ago; A non optimized heapsort was comparable to quicksort - not much faster, not much slower for most cases; Much slower for some cases. But if you can point me to a quicksort algorithm whose implementation you find worthy, I might find the time to give you a comparable heapsort (no promise - not enough free time these days).

As for mergesort - a recent variation developed by Tim Peters for the Python library seems to beat _every_ other comparison sort implementation hands down, both in theory and in practice. And it's stable to boot. The memory cost is 2*n bytes, where n is the list length. On most of today's system, this is very little to pay (For a 10 million element list, this means 20Megs of RAM  = your elements, unless they are integers, will probably take ten times as much).

Ori Berger
Sunday, February 29, 2004

When comparisons are expensive relative to item moves and overhead, merge sort tends to be a big winner. When they're cheap, a good quicksort can generally do about as well as anything. For Python's sort function, comparisons are expensive (in the worst case each comparison may have to call a function written in Python) and a sophisticated mergesort is clearly the way to go. Tim Peters's implementation is a very good one. (That last sentence is true pretty much whenever its referent actually exists.)

A naive quicksort can perform very badly on some plausible datasets. A well written quicksort will do well, in practice, on just about anything you throw at it, even though the worst case is embarrassing. There are three reasons why a good quicksort can heavily outperform a bad one: better pivoting (so that you more consistently choose a pivot near the middle of the data), tighter inner loops (so that the code in which quicksort spends most of its time is efficient), and falling back to something like insertion sort for short subarrays (so that overhead doesn't kill you).

The BSD quicksort implementation (described in a nice article in "Software Practice and Experience", called "Engineering a sort function" and written by Jon Bentley and Doug McIlroy) is a typical good one.

Those of you who are worried by quicksort's worst case might want to take a look at a paper called "A killer adversary for quicksort" -- I forget who wrote it, but I think Bentley might be one author -- which describes how to make just about any quicksort implementation take quadratic time. :-)

My limited experience is that a good quicksort outperforms a good heapsort by about a factor of 2. But I haven't done any testing of this stuff for a long time, and perhaps I'm out of date.

One nitpick about Python's (or any other) mergesort; the extra memory cost isn't 2n *bytes*, it's the same as the size of the input. (If each input item were large -- which isn't the case for Python, where they're all pointers -- then you could make the cost an extra n pointers instead, at the cost of a little extra complexity.)

Gareth McCaughan
Monday, March 1, 2004

Quicksort is a typical example of good marketing. As one other poster mentioned, I'm pretty sure that it would be far less popular if it didn't have Quick in its name.

Frederik Slijkerman
Monday, March 1, 2004

Oooh! I love these new topics like "Which sort is best!!!"  I don't think this has EVER been discussed before.


Jeez, just go get Knuth's "Searching and Sorting" and he'll break ya off ALL the knowledge you need to discuss this subject and NEVER post about it again!

Comparison-based sorts: O(n log n)
Radix-sort O (c * n) where c is the number of buckets

As far as an educational excercise, I was blown away at radix sort's simplicity, elegance and of course, speed.  Some smart folks come up with that stuff.

Russ
Monday, March 1, 2004

Gareth: Tim's merge sort needs exactly n/2 additional pointers (it's half out-of-place, half in-place). On a 32 bit system, that's 2*n bytes; On a 64 bit system, that would be 4*n bytes.

Russ: If only it were that simple, most of humanity as we know it would be out of work.

You _are_ aware that radix sort is only useful if your data has a representation that sorts lexicographically in a way compatible with the original ordering? And that the said 'c' constant may be very large?

Ori Berger
Monday, March 1, 2004

OrI: yes, that's why I carefully clarified my statement with "as far as an educational excercise" tag :-) Radix seems a good fit for sort things with fixed-sized sort keys (e.g. #'s) and in mutable lists.

"Very large" c?  I've only dealt with it where c was some multiple of the key size (<= 32)? is that "very large"?  In practice, that seems smaller than "log n" for a dataset where you even care about how long it takes to sort.

and for sure, just like quicksort and the like are not the one-size-fit-all sort, neither is radix, it's just something else to have in the toolbag.

Russ
Monday, March 1, 2004

D'oh. Ori, you're right: n/2 pointers, not n pointers. Mea culpa. Russ: no, *not* everything is in Knuth vol 2, wonderful though that is. For instance, the cunning wrinkles in Python's sophisticated merge sort aren't.

Gareth McCaughan
Tuesday, March 2, 2004

*  Recent Topics

*  Fog Creek Home