Fog Creek Software
Discussion Board




Joel on Software

Getting HTML output from ASP.NET Page...

Anybody know of a way that I can get the HTML that is sent to the browser after an ASP.NET page has processed everything?  I've been looking at the Response object to see if there's anything on it that can output the buffer before it's flushed but haven't found anything yet.

FYI...it's for a utility I'm going to write against a home-rolled CMS so that I can spider a whole site and write out static .html pages.

Thanks in advance.

smallbiz
Monday, October 18, 2004

Found it...

StreamReader reader = new StreamReader(System.Net.WebRequest.Create(page.Path).GetResponse().GetResponseStream());
string html = reader.ReadToEnd();

smallbiz
Monday, October 18, 2004

Hmm. It didn't even occur to me that such a thing would be possible. In the past I've solved that problem by either writing the capability into the CMS from the beginning, usually using a custom templating system that spits out a chunk of HTML which can be either sent to the browser or written to disk, or by using wget.

comp.lang.c refugee
Tuesday, October 19, 2004

I also use a template-driven system, but it behaves a little differently than what you're describing.  A developer basically builds .aspx templates and then I provide a few different webcontrols (html content, links, documents, etc.) that they can drag/drop onto their template.  At runtime, a page request (http://www.example.com/page.html) is intercepted and translated to a request for the page's template (url rewrite to http://www.example.com/Templates/MyTemplate.aspx?pageGuid=XXX) and all the CMS controls fill themselves according to the pageGuid.  Since the developer can have other dynamic things going on inside the template (like dynamic menu building, etc.) I need to actually have asp.net process the page so I can see what it'll actually render to the browser.  Once I have that I can just dump it to an .html file.

Wrote the utility to do this yesterday and it's working really well.  I have a test site of about two hundred pages and it dumped everything to .html files in about five seconds.  It adds a nice feature to my product: you can deploy your sites to Apache/Unix if you want to.

smallbiz
Tuesday, October 19, 2004
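
For later readers, here is a minimal sketch of what such an export utility could look like, built on the same WebRequest approach posted above. It is not the poster's actual code: StaticExporter, ToLocalPath, Fetch and Export are made-up names, and it assumes the CMS can supply the list of page URLs and that outputDir is a non-empty directory path.

```csharp
using System;
using System.IO;
using System.Net;

// Hypothetical helper, not from the thread: fetches rendered pages
// over HTTP and writes each one to a static file.
public static class StaticExporter
{
    // Maps a page URL to a file under outputDir.
    // A URL whose path is empty or ends in "/" is saved as index.html.
    public static string ToLocalPath(string outputDir, Uri pageUrl)
    {
        string relative = pageUrl.AbsolutePath.TrimStart('/');
        if (relative.Length == 0 || relative.EndsWith("/"))
        {
            relative += "index.html";
        }
        return Path.Combine(outputDir, relative.Replace('/', Path.DirectorySeparatorChar));
    }

    // Requests the page the way a browser would and returns the HTML
    // that ASP.NET rendered for it.
    public static string Fetch(Uri pageUrl)
    {
        WebRequest request = WebRequest.Create(pageUrl);
        using (WebResponse response = request.GetResponse())
        using (StreamReader reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }

    // Assumption: the caller already has the list of page URLs
    // (for example from the CMS database).
    public static void Export(Uri[] pageUrls, string outputDir)
    {
        foreach (Uri url in pageUrls)
        {
            string path = ToLocalPath(outputDir, url);
            Directory.CreateDirectory(Path.GetDirectoryName(path));
            File.WriteAllText(path, Fetch(url));
        }
    }
}
```

StreamReader and File.WriteAllText both default to UTF-8 here, so a site served in another encoding would need the encoding passed explicitly.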

There are essentially two ways to do this. The first is the way you describe - use WebRequest and act like a browser.

The second would be to write a custom HttpModule that sits at the end of the response pipeline and captures the output there. This is a bit harder to do; for what you want (spidering a site) I'd recommend the first method.

Chris Tavares
Tuesday, October 19, 2004

Chris, would you mind describing how I can capture the output in an HttpModule?  I already have one built to intercept incoming requests but I couldn't figure out how to get the HTML sent back to the browser in one of the HttpModule's event handlers (like EndRequest).

I am doing it the first way you mentioned and it works great but I'm just curious as to how you'd get the second way to work as I couldn't figure out how to do that.

smallbiz
Wednesday, October 20, 2004

Look into Response.Filter - you can attach your own Stream here, through which all the output goes.

Duncan Smart
Monday, October 25, 2004
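
A rough sketch of the Response.Filter idea, for anyone who wants to try it. CaptureStream and GetCapturedText are made-up names, not part of ASP.NET; the class is just a pass-through Stream that forwards every write to the stream it wraps and keeps a copy in memory.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical pass-through stream: everything written to it goes on to
// the wrapped stream unchanged, and a copy is kept in memory.
public class CaptureStream : Stream
{
    private readonly Stream inner;
    private readonly MemoryStream copy = new MemoryStream();

    public CaptureStream(Stream inner)
    {
        this.inner = inner;
    }

    public override void Write(byte[] buffer, int offset, int count)
    {
        copy.Write(buffer, offset, count);   // keep our own copy
        inner.Write(buffer, offset, count);  // pass the bytes through
    }

    // Decodes everything captured so far. The caller must pass the
    // encoding the response was written in.
    public string GetCapturedText(Encoding encoding)
    {
        return encoding.GetString(copy.ToArray());
    }

    public override void Flush() { inner.Flush(); }

    // Write-only stream: reading and seeking are not supported.
    public override bool CanRead { get { return false; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return true; } }
    public override long Length { get { throw new NotSupportedException(); } }
    public override long Position
    {
        get { throw new NotSupportedException(); }
        set { throw new NotSupportedException(); }
    }
    public override int Read(byte[] buffer, int offset, int count)
    {
        throw new NotSupportedException();
    }
    public override long Seek(long offset, SeekOrigin origin)
    {
        throw new NotSupportedException();
    }
    public override void SetLength(long value)
    {
        throw new NotSupportedException();
    }
}
```

In an HttpModule this would presumably be installed early in the request, for example in a BeginRequest handler with response.Filter = new CaptureStream(response.Filter), with the captured text read once the page has finished rendering. Exactly when ASP.NET pushes the output through the filter is worth checking against the documentation before relying on a particular event such as EndRequest.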
