Fog Creek Software
Discussion Board

Linked List

(I never should have started reading these)

Q: How does one find a loop in a singly linked list in O(n) time using constant memory? you cannot modify the list in any way (and constant memory means the amount of memory required for the solution cannot be a function of n.)

A: One way to detect a loop is to iterate over the list with 2 pointers at the same time where one is iterating at double speed. If the 2 pointers are ever equal after they iterate once and before they both reach an end, there's a loop.

The question asks to "find" a loop.  The given answer will "detect" a loop.  This isn't the same thing in my mind.  Sure you know that a loop exists.  In fact, by the question I assumed it's given that a loop exists before I even started the problem.  The 'solution' does not actually FIND the loop!  By "finding" I mean to identify the first element to appear twice in the sequence.

William Frantz
Tuesday, July 23, 2002

Seems to me, you can easily find the starting point of a loop once you've determined that you exist within one. 

At the point you determine that you are within the loop, you have two (#1, #2) pointers within that loop pointing at the same item, correct?  Given that you can freeze one pointer as a static reference to a point on the loop, you can iterate through the one and only loop in your linked list by incrementing the 2nd pointer until it once again has the same value as the first.

As a finite memory solution to find the start of a loop within the original list, add a third pointer to the start of your original linked list, and increment that pointer #3 once per iteration of sweeping through the loop with pointer #2.

If #3 = #2 at any time, you've found the "start" (end?) of your loop... the item within the original linked list that is pointed to by the "last" item in the list.

Maintaining your focus on #3, you can have pointer #1 trail one link prior to #2 and iterate through the loop again.  When #3 = #2 again, #1 is the tail end of the linked list that was looped back (the last item of the list).

Strange ideas involving binary trees looping back are being spawned in my head, but I'm sure it's just the voices in my head being unruly and looking for something to do. :)

Stephen Hoffman
Wednesday, July 24, 2002

Yes, I thought about that, but it's an O(n^2) algorithm.  If the loop has n items in it, you will have to do n-1 comparisons and then n-2 comparisons and then n-3 comparisons...

Not efficient and I think the problem statement asked for an O(n) solution.

I don't think there is an O(n) solution for finding the start of the loop.  The given solution is an O(n) algorithm for detecting a loop, but there isn't an O(n) solution for actually finding the loop.

Somebody prove me wrong.

William Frantz
Friday, July 26, 2002

Its too early in the morning for thinking but I believe this algorithm returns the first node on the loop.

Let x = the head of the list
Let y = some point on the loop (e.g. the place we detected the loop)

pointers t, a=x, b=next(y), c=y

while (1) {
    t=midpoint (a,c)
    if find (b,t,c)
        then c=t
        else a=next(t)
    t= midpoint (b,c)
    if find (a,t,b)
        then c=t
        else b=next(t)

    if (a==b) return a;
    if (a==c) or (b==c) return c;

midpoint (e,f):
returns the pointer to the middle element (round toward front of list) --- we can implement this with a pointer+double speed pointer walk

find (e,f,g):
returns true if we encounter f in a walk from e to g, otherwise false.

Constant memory, O(n) time.
The key to seeing that we meet the time constraint is to note that after each iteration of the while loop both of our testing zones (a,c) and (b,c) are at least cut in half. This means that in each successive iteration of the while loop our calls to midpoint() and find() cost about half as much.

Alex Harris
Sunday, July 28, 2002

Sigh... knew it was too early. Sorry about the lack of formatting. Also there is a typo: the line that reads "if find (a,t,b)" should read "if find (a,t,c)"

Alex Harris
Sunday, July 28, 2002

"midpoint (e,f):
returns the pointer to the middle element (round
toward front of list) --- we can implement this with
a pointer+double speed pointer walk"

That's an O(n) algorithm.  If n is the number of elements from e to f then you are making n comparisons to find your midpoint as you walk from e to f.

You split the list in half, but you call midpoint() twice inside another O(n) loop.  So, you have an O(n) algorithm inside an O(n) loop resulting in O(n^2) overall performance.

I don't think there is anyway around this.  You are essentially attempting to find two matching elements in an unsorted list.  That requires the sum of 1..n comparisons.  Given that the list is not sorted, there's no efficient way to do it.  Every element you add to the list is going to almost double the number of required comparisons.

William Frantz
Tuesday, July 30, 2002

No. Each time we iterate the while loop,  the costs of the list walks (midpoint and find) are cut in half. The first iteration they are c * n. Next time they are c/2 * N, ..... In total thats just 2c *N over the course of the whole algorithm.

Alex Harris
Tuesday, July 30, 2002

OK, I get you.  I guess I need to review the definition of O(n) before I start criticizing.  It's been too long.  Anyway, re-writing your algorithm might look like this:

Let x = an element outside the loop
Let y = an element inside the loop

findloophead(x, y) {
  a = x
  b = y
  while (next(a) != b) {  /* while b is not loop head */
    t = midpoint(a, b)
    if (find(next(y), t, b)) {  /* if t is in the loop */
        b = t
        a = t

midpoint() and find() are O(n) themselves, but you keep reducing n by half each time.  Very nice.

Worst case scenario is that next(y) == next(x) which means next(x) is the loop head.  If the distance between x and y is n, then you'll have an n walk for the midpoint(), followed by a n/2 walk for the find(), then it repeats with a n/2 walk for the midpoint() and a n/4 walk for the find() and so on.

Note that this algorithm insists that x be an element outside the loop.  Of course you must have at least one element outside the loop in order for a "head of the loop" to be well defined.

It's also worth noting that midpoint() and detectloop() are very similar.  I could imagine a midpoint() function that returned an error if a loop was detected since midpoint() could work under that condition.  You might then kick the whole thing off with something like t = midpoint(head, NULL).

William Frantz
Wednesday, July 31, 2002

There is a problem with your rewritten version of my program. The interval you're doing the find over isn't decreasing geometrically the way it should be to maintain the O(n) running time.

N: total number of elements in list
K : total number of elements from head to y
h: the first element of the loop.

Every time you do "find(next(y),t,b)" that call is going to walk over the entire  section from next(y) to h (plus some more afterward of course) as we head toward next(y) and h are fixed locations and don't vary over the course of the algorithm and they are N-K nodes apart. In general N-K will be on the order of N,  and we'll execute this once for every pass, of which there will be O(log n).

Overall this makes the running time of that algorithm O(n log n).

I used the duel searching method for exactly this reason; we need to simultaneously reduce not only the "test range" but also the time it takes us to do each test, because each starts at O(n).

Alex Harris
Thursday, August 01, 2002

*  Recent Topics

*  Fog Creek Home