Fog Creek Software
Discussion Board




Validating HTML... in bulk

So I just converted my entire site from broken HTML into XHTML 1.0 Strict. On my test pages everything was fine. However, once the site was automatically regenerated by integrating the original web site content into the new templates, all sorts of evil invalid HTML started creeping in.

I've looked around the web for HTML validation tools, and there are plenty of ways to validate one page. But I'd like to do a clean sweep of my site, so I can mop up any invalid stragglers that are still around. I can't find any way of "bulk" validating a site.

The only solution I'm looking at right now is writing a program that will use the Tidy DLL to run through a local copy of my website.

Question is, is there any service or tool out there I might have missed?

Joel Goodwin
Saturday, March 13, 2004

You didn't say what operating system (I'm assuming windows from the .dll reference), but under Linux you can use this:

http://www.htmlhelp.com/tools/validator/

It has a command-line version that you should be able to script pretty easily.

I don't know if it can be compiled and made to run under windows. Perhaps through cygwin?

Sum Dum Gai
Saturday, March 13, 2004

http://arealvalidator.com/features.html

http://www.htmlvalidator.com/htmlval/whycseisbetter.html

http://www.flfsoft.com/html/html_validators.html

http://www.alliedtesting.com/Services/Link_checkers.htm

and more at

http://www.google.co.in/search?hl=en&ie=UTF-8&oe=UTF-8&q=website+html+validation+windows&btnG=Google+Search&meta=

KayJay
Saturday, March 13, 2004

Thanks for the comments. I'm using Windows (I have nothing against Linux, I just don't use it) and the site is being generated via City Desk.

It looks like there's no tool out there which will do bulk validation for free =) I'm only looking for correctness versus the DTD, one of my goals at the beginning of the Great Overhaul of 2004. A lot of these tools seem over-engineered for that purpose.

Ah, well, I'll see what I can do.

Joel Goodwin
Saturday, March 13, 2004

cygwin ships with HTML tidy nowadays.  Does exactly what you're after.

Koz
Saturday, March 13, 2004

I've got the Windows build of HTML Tidy, and I was thinking of building a program around it to sweep through my site... however, I have discovered it is not an HTML validator in the sense of checking a document against the DTD.

For example, the uppercase tags <P> and </P>, which are invalid under XHTML Strict (element names must be lowercase), are not picked up by HTML Tidy.

A Real Validator looks like the only option which does offline document correctness validation. (Another option is the online site validator at HTMLHelp.com, but it is limited to 50 pages.)

Joel Goodwin
Saturday, March 13, 2004

http://www.arealvalidator.com/

The free trial version lets you do bulk HTML file validation (you just do a search on a folder for *.html in Explorer and drag them all into the project window, then validate).

John C
Saturday, March 13, 2004

Since you're trying to validate XHTML, you could just run your pages through an XML validator, to check against the XHTML 1.0 Strict DTD.
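
A rough sketch of what such a bulk sweep might look like, assuming Python 3.7 or later and the xmllint command-line tool from libxml2 on the PATH. Neither is mentioned in this thread, and find_pages, xmllint_check and sweep are made-up names for illustration. xmllint --valid checks each file against the DTD named in its DOCTYPE, so this only works if the pages declare the XHTML 1.0 Strict DOCTYPE and the DTD can be fetched (over the network or through a local XML catalog).

```python
import os
import subprocess


def find_pages(root, extensions=(".html", ".htm", ".xhtml")):
    """Yield the path of every page under root, recursing into subfolders."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.lower().endswith(extensions):
                yield os.path.join(dirpath, name)


def xmllint_check(path):
    """Validate one file against the DTD named in its DOCTYPE.

    Assumption: xmllint (libxml2) is installed and on PATH.
    Returns (ok, messages), where messages is xmllint's error output.
    """
    result = subprocess.run(
        ["xmllint", "--noout", "--valid", path],
        capture_output=True,
        text=True,
    )
    # xmllint exits with a non-zero status when parsing or validation fails.
    return result.returncode == 0, result.stderr.strip()


def sweep(root, check=xmllint_check):
    """Run check on every page under root and return the failures."""
    failures = []
    for path in find_pages(root):
        ok, messages = check(path)
        if not ok:
            failures.append((path, messages))
    return failures
```

Calling sweep() on the folder that holds the local copy of the site returns a list of (path, messages) pairs for the pages that failed. The check argument is pluggable, so a different command-line validator could be wrapped the same way.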

David M. Cooke
Sunday, March 14, 2004

Thanks for all the comments, I'm using A Real Validator on trial for now but, yes, I should probably concentrate on an XML validator in future.

(that's all I was really interested in anyway, don't know why I didn't think of that before)

Joel Goodwin
Monday, March 15, 2004
