Fog Creek Software
Discussion Board

Joel on Software

Regex Question

I have text like the following:

begin 1
begin 2
end 2
begin 3
end 3
end 1

I want to generate HTML DIVs that obey the nesting, so I need to match the correct end tag. I guess what I'm looking for is the ability to use $1 in the match, like:

text = Regex.Replace(text, @"begin (\d{1,5})(.*?)end $1", "<div>$2</div>");

The idea is that the match on the "end" part has knowledge about the match on the "begin" part.
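For what it's worth, a .NET pattern can refer back to an earlier group, but the in-pattern syntax is `\1` (or `\k<name>`). `$1` only works in the replacement string; inside a pattern, `$` is an end anchor, so `end $1` can never match. The sketch below is a hedged illustration under one loud assumption: every begin/end number is unique, so the lazy `(.*?)` stops at the matching `end N`. It repeats the replacement until nothing changes, so nested blocks are converted on later passes. `ToDivs` is a hypothetical helper name, and the code uses C# 9 top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

Console.WriteLine(ToDivs("begin 1\nbegin 2\nend 2\nbegin 3\nend 3\nend 1"));

// Hypothetical helper: rewrites "begin N ... end N" blocks as <div> elements.
// Assumes each N is unique, so the lazy (.*?) stops at the matching "end N".
static string ToDivs(string text)
{
    // \1 is the in-pattern backreference; $2 is used only in the replacement.
    // Singleline lets '.' cross the newlines between "begin" and "end".
    // \b stops "end 1" from matching the start of "end 12".
    var block = new Regex(@"begin (\d{1,5})\b(.*?)end \1\b", RegexOptions.Singleline);
    string previous;
    do
    {
        previous = text;
        // Each pass converts the outermost remaining blocks;
        // nested ones are picked up on the next pass.
        text = block.Replace(text, "<div>$2</div>");
    } while (text != previous);
    return text;
}
```

For the sample text this should print six lines: `<div>`, `<div>`, `</div>`, `<div>`, `</div>`, `</div>`. If the numbers can repeat (for example, two sibling blocks both numbered 2 inside another block numbered 2), this shortcut breaks and you are back to needing real nesting logic.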

Thursday, March 18, 2004

As far as I know, you cannot do that with a regex.

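In the end, a regular expression describes a regular language, a strictly smaller class than the languages described by context-free grammars. It has no memory of the context in which its matches are made, and arbitrarily deep nesting like this needs at least a context-free grammar.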

I may be wrong, but if so, please, someone enlighten me, so I can go back to school and sue my Languages and Automata teacher.

I think you will have to build a small parser that uses a stack to keep track of the level of nested constructs it has found.
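A minimal sketch of that stack-based approach, assuming one `begin N` or `end N` tag per line as in the question's sample. `ToDivsWithStack` is a hypothetical name. Each `begin` pushes its number, and each `end` must pop the same number, which is exactly the context a plain regular expression cannot remember. Uses C# 9 top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

Console.WriteLine(ToDivsWithStack("begin 1\nbegin 2\nend 2\nbegin 3\nend 3\nend 1"));

// Hypothetical helper: assumes one "begin N" / "end N" tag per line.
static string ToDivsWithStack(string text)
{
    var open = new Stack<string>();      // numbers of the blocks still open
    var html = new StringBuilder();
    foreach (string raw in text.Split('\n'))
    {
        string line = raw.Trim();        // also drops a trailing '\r'
        if (line.StartsWith("begin ", StringComparison.Ordinal))
        {
            open.Push(line.Substring("begin ".Length));
            html.Append("<div>");
        }
        else if (line.StartsWith("end ", StringComparison.Ordinal))
        {
            // The closing number must match the most recently opened block.
            string id = line.Substring("end ".Length);
            if (open.Count == 0 || open.Pop() != id)
                throw new FormatException($"Unexpected '{line}'");
            html.Append("</div>");
        }
        else if (line.Length > 0)
        {
            html.Append(line);           // other text is copied through (trimmed)
        }
    }
    if (open.Count > 0)
        throw new FormatException($"Unclosed 'begin {open.Peek()}'");
    return html.ToString();
}
```

For the sample this should print `<div><div></div><div></div></div>`, and crossed tags such as `begin 1`, `begin 2`, `end 1` throw a `FormatException` instead of producing wrongly nested DIVs.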

.NET Developer
Friday, March 19, 2004

This is similar to the classic problem of matching balanced parentheses -- such as matching the parenthetical groups in "( () ( () ) )."

.NET Developer is right that you normally can't do this with regexes. However, the .NET Framework regex library adds support for it through a feature it calls balancing groups. This should allow you to match something like

You could also capture the trailing numbers for each group.
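A minimal sketch of a balancing-group pattern for the question's format, written from the .NET balancing-group syntax rather than copied from any reference. `(?<ids>\d+)` pushes each `begin` number onto the `ids` capture stack, `\k<ids>` requires an `end` to repeat the number currently on top of that stack, `(?<-ids>)` pops it, and `(?(ids)(?!))` fails the match if any block is left open. It only validates the nesting; `IsBalanced` is a hypothetical helper name, and the code uses C# 9 top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

Console.WriteLine(IsBalanced("begin 1\nbegin 2\nend 2\nbegin 3\nend 3\nend 1")); // True
Console.WriteLine(IsBalanced("begin 1\nbegin 2\nend 1\nend 2"));                 // False

// Hypothetical helper: true when every "end N" closes the most recent open "begin N".
static bool IsBalanced(string text) =>
    Regex.IsMatch(text,
        @"^(?:\s*begin (?<ids>\d+)\b" +        // push N onto the "ids" stack
        @"|\s*end \k<ids>\b(?<-ids>))*" +      // N must equal the top of the stack; then pop it
        @"\s*(?(ids)(?!))$");                  // fail if any "begin" is still open
```

A stray `end` with nothing open also fails, because a .NET backreference to a group with no remaining captures does not match (outside `RegexOptions.ECMAScript`). Turning validated text into DIVs still takes a second step, such as the stack-based parser suggested earlier in the thread.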

The MSDN documentation on this feature is pretty sketchy, but it's covered in Dan Appleman's ebook "Regular Expressions with .Net" (pp. 44-50).

Robert Jacobson
Sunday, March 21, 2004

Luckily, the answer to your question is in the sample chapter of O'Reilly's regex book:

Duncan Smart
Monday, March 22, 2004
