Fog Creek Software
Discussion Board




AVL Trees vs. Red-Black Trees?

Hi, I worked through the chapter in Introduction to Algorithms by Cormen et al on red-black trees. One of the problems at the end discusses AVL trees.

Which, overall, is better to use and why? I didn't see much comparison between the two, and a Google search didn't yield anything satisfying.

If I'm missing something very obvious, I apologize.

Warren Henning
Tuesday, December 17, 2002

I'm not sure that it'll be much info for you but trees in Java Collections are implemented using red-black trees. Which probably doesn't mean a lot.

Evgeny Goldin
Tuesday, December 17, 2002

So are std::map and set::set in STL.

Ivan-Assen Ivanov
Tuesday, December 17, 2002

Creating a tree which balances nodes will tend to improve the search time for any random term within the tree.

So the red-black tree is meant to provide a O(log n) time.  This is true so long as it remains balanced, and is true only from the root of the tree.

Any insertions, unless also balanced will tilt the tree, or sub tree.  To re-balance the tree is supposed to also take O(log n) time, which I find dubious or at any rate counter intuitive.

Regardless of that, a red-black tree's performance will degrade depending upon the degree of insertion and rotation about a node takes place so I would not use it for a dynamic index.  In situations such as a view of fixed data, or a search to fixed data it would be the optimum layout.

Simon Lucy
Tuesday, December 17, 2002

RB-Trees are, as well as AVL trees, self-balancing. Both of them provide O(log n) lookup and insertion performance.
The difference is that RB-Trees guarantee O(1) rotations per insert operation. That is what actually costs performance in real implementations.
Simplified, RB-Trees gain this advantage from conceptually being 2-3 trees without carrying around the overhead of dynamic node structures. Physically RB-Trees are implemented as binary trees, the red/black-flags simulate 2-3 behaviour.

Alex
Tuesday, December 17, 2002

It's been a long time since any data structure lectures, but my understanding is that in an AVL tree the difference between the shortest and longest path to from the root to any leaf is at most one.  In a red-black tree the difference can be a factor of 2.

Both of these give O(log n) for look up, but balancing an AVL tree can require O(log n) rotations, whilst a red black tree will take at most two rotations to bring it into balance (though it may have to examine O(log n) nodes to decide where the rotations are necessary).  The rotations themselves are O(1) operations since you are just moving pointers around.

You might also want to look up 2-3-4 trees, or more generally B trees for more balanced tree data structures.

Rob Walker
Tuesday, December 17, 2002

Actually, AVL trees only guarantee that, for each node in the tree, the heights of its subtrees differ by at most one. Since the height is defined by the longest path to a leaf, it makes no guarantees about the ratio between the shortest path to a leaf and the longest. In fact, it is possible to generate a tree with the AVL property whose whose shortest/longest path ratio is arbitrarily bad.

In the worst case, the constant factors for search can be much worse for an AVL tree than a comparably sized RB Tree. On the other hand, AVL trees thend to be simpler to implement.

Devil's Advocate
Tuesday, December 17, 2002

"Since the height is defined by the longest path to a leaf, it makes no guarantees about the ratio between the shortest path to a leaf and the longest. In fact, it is possible to generate a tree with the AVL property whose whose shortest/longest path ratio is arbitrarily bad."

Actually, I think I am wrong about this part. At the very least, I want to think about it more before I feel comfortable asserting it.

Devil's Advocate
Tuesday, December 17, 2002

If you require a balanced tree type data structure I recommend taking a look at Splay trees.  Splay trees are cool because they are relatively easy to implement, and they don't require storing any additional information at each node.

They offer O(logN) amortised performance.  This means that a single action (insert/update/delete) may require more than O(logN) operations, but in the long run the average number of operations per action is O(logN).

You could say that Splay trees adapt to the input data, which I think is pretty neat.

Peter McKenzie
Tuesday, December 17, 2002

In fact, we need not compare btw. the longest and the shortest path of a tree to measure its performance. What we need, instead, is the tree's longest path, or height, relative to the number of its elements, say N.

We all know that a perfectly balanced tree has (log N) height. It may be optimal to search, but is however too rigid to use if we wanted to insert or delete an item.

A Red-Black tree has 2(log N) height, considering the fact that its longest path can be at most twice as its shortest path.

An AVL has 1.44(log N) height. This can be proved by successively comparing two subtrees up to root, thus getting the relation btw. its element count and its height.

As we can see, AVL tends to be more balanced, therefore more rigid, than Red-Black: during the worst case of deletion, we might encounter O(log N) rotations in AVL but only 3 in Red-Black. Searching for an item, or inserting an item, has the same complexity, O(log N).

Of course, the is trade-offs: AVL is much simpler to understand and/or to implement.

Junghyun Chae
Sunday, March 07, 2004

I forgot to add: by definition, every AVL is therefore subsets  of Red-Black. One should be able to color any AVL tree, without restructuring or rotation, to transform it into a Red-Black tree.

Junghyun Chae
Sunday, March 07, 2004

"Searching for an item, or inserting an item, has the same complexity, O(log N)."

Please forgive me for this mistake. Only searching has a complexity of O(log N). Inserting for both trees has a constant complexity, say O(1).

Junghyun Chae
Sunday, March 07, 2004

What is the difference between AVL and RB-Trees? It seems as though AVLs use a balance factor and RB-Tree uses color to tell if the child is out of ballance? I don know.

Shack Man
Wednesday, March 24, 2004

Search time is logarithmic for both.  The only possible advantage is that with a red-black tree, there will be a slightly lower amount of rotations needed to maintain balance. 

Greg Holl
Wednesday, March 24, 2004

Just making it clear:

AVL trees have average height 1.44 * log(N+2) - 0.33
Red Black trees have worst case height of 2.00* log(N+1)

Red Black & AVL trees both have O(log N) search, insert and delete, min, max, previous, next in terms of nodes.

Somebody said insert was O(1). It is not, as it still take O(log N) steps to find the correct place to insert. What is O(1) for Red Black insert, is any rebalancing necessary to restore the trees properties. At most  a double left or right rotation is needed. For a AVL tree, rebalancing is O(log N). This is where a RB tree gains over AVL. There is less rebalancing of nodes after an insert.

It is swings and roundabouts. RB trees have faster inserts than AVL trees but at the expense of slower searches.

Stephen Howe
Monday, June 21, 2004

The reason it matters that red-black trees require O(1) rebalancing operations (vs. O(log n) for AVL) comes up in the application to persistent data structures (Sleator, Tarjan, et al.) A persistent data structure makes it possible to roll back any algorithm to a previous step. To do so, it must keep a trail of updates. Any persistent data structure based on an AVL tree is going to have to store O(log n) times as many updates as one based on a red-black tree.

For any given application, an AVL tree might be the way to go, but red-black trees were necessary to solve a theoretical problem. BTW, persistent data structures are themselves used to solve other algorithmic problems (e.g. planar point location, Sleator and Tarjan) so ultimately, the constant-rebalancing is necessary to improve the speed of certain algorithms.

The textbook treatment of red-black trees often hides the actual reason that many theorists prefer them, and make it seem like they're just trendier than AVL trees. But the application to persistence is the main theoretical justification for using them.

Paul Callahan
Wednesday, August 18, 2004

*  Recent Topics

*  Fog Creek Home