Fog Creek Software
Discussion Board




Larry Wall half-wrong, all-wrong or right?

apropos:

"Internet Explorer [takes imperfect input and makes a best guess at rendering it correctly], proving, I think, the point that Larry Wall's quote about "be strict in what you emit and liberal in what you accept" is quite frankly not a good engineering principle."

http://www.joelonsoftware.com/articles/Unicode.html

As I understand it, Joel posits that a too-liberal policy can cause your program to do the wrong thing (e.g. mistakenly guessing that a Bulgarian page is Korean) and implies that cutting others too much slack (e.g. rendering an HTML page that doesn't have a content-type meta tag) encourages people to be careless with the data they give you.
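
To make the Bulgarian/Korean mix-up concrete, here is a minimal Python sketch. It assumes the page was saved as windows-1251 and that the receiver guesses EUC-KR; the sample text and the helper names decode_declared and decode_guessed are made up for illustration and are not from Joel's article.

```python
# Bulgarian sample text, stored the way an old Cyrillic page might be.
TEXT = "Здравей, свят"
raw = TEXT.encode("cp1251")

def decode_declared(data, charset):
    """Strict: use only the declared charset and fail loudly if it does not fit."""
    return data.decode(charset, errors="strict")

def decode_guessed(data, guess):
    """Liberal: never fail; undecodable bytes become U+FFFD replacement characters."""
    return data.decode(guess, errors="replace")

if __name__ == "__main__":
    print(decode_declared(raw, "cp1251"))  # the original Bulgarian text
    print(decode_guessed(raw, "euc_kr"))   # garbage: same bytes, wrong guess
```

The liberal path never raises, which is exactly why the wrong guess goes unnoticed.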

But is he right? Namely, in The Real World, you might^H^H^H will receive messy input. Isn't "good engineering" about doing what works best -- i.e. pragmatism and not idealism? Maybe Joel meant "is quite frankly not a good *theoretical* principle"?

Joe
http://www.joegrossberg.com

Joe Grossberg
Monday, October 13, 2003

I think that it's just not black and white - be liberal about what you accept but don't do it quietly if there's something wrong.  As an example: in the IE case they could put an icon on the taskbar to say that something's not quite right which would give more details if the user clicked on it.

R1ch
Monday, October 13, 2003

On the other hand, if broken pages didn't render at all, people would be forced to write well-formed HTML, and they'd know when they managed to do so.
As it is now, pages can look a lot different in different browsers simply because the browsers are guessing differently.

Eric DeBois
Monday, October 13, 2003

I think XML vs. HTML is the ultimate proof of this. Almost all XML parsers are really, really, pedantic, mostly refusing to do anything if there's any mistake at all in the XML. This strictness means that about 99.9% of XML documents are correct, while about 0.1% of HTML documents are correct. As a result parsing XML is trivial while parsing arbitrary HTML is practically impossible (not only do you have to be "liberal" to parse HTML but you have to be liberal in exactly the same way as the popular web browsers, which themselves are not consistently liberal).
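
A minimal sketch of that contrast, using only Python's standard library (an illustration, not how any particular browser works). The helper names parse_strict and parse_liberal and the sample markup are assumptions for the example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Tag soup: nothing is ever closed.
MALFORMED = "<p>unclosed <b>bold<p>next"

def parse_strict(markup):
    """XML-style: return True only if the markup is well-formed."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

class TagCollector(HTMLParser):
    """HTML-style: record every start tag seen and never reject the input."""
    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

def parse_liberal(markup):
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags

if __name__ == "__main__":
    print(parse_strict(MALFORMED))   # False: the XML parser refuses outright
    print(parse_liberal(MALFORMED))  # ['p', 'b', 'p']: the HTML parser carries on
```

Note that the liberal parser only reports tags; deciding what tree the author meant is the part every browser guesses differently.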

Joel Spolsky
Monday, October 13, 2003

Joel is not being original here when he says that the Larry Wall principle applied to web design has led to a load of horrors.

The problem is of course that of the penalty of being the early adopter. If everybody stuck to strict standards in web design it would be a lot easier for everybody, but the guy who produces a browser that doesn't show loads of malformed pages is going to find nobody wants it.

Stephen Jones
Monday, October 13, 2003

Pedantic note:

Isn't this Postel's Law, not Larry Wall's?

Portabella
Monday, October 13, 2003

Yes, it's a Postel quote from the IETF.

Brad Wilson (dotnetguy.techieswithcats.com)
Monday, October 13, 2003

"My webpage isn't rendering right in XXX browser!" -- well, that's your fault then; you're not being strict in what you emit.

Arguably, rendering a page half-correct is superior to not rendering it at all.

Alyosha`
Monday, October 13, 2003

I think you've missed the point. the problem is that defective web pages render perfectly in browsers because those browsers are tolerant, and then fail miserably when the browser version is updated.

Worst of all, as Joel says, you can never count on an HTML web page as having anything it should do.

Stephen Jones
Monday, October 13, 2003

Actually, it *is* Larry Wall's quote not Jon Postel's.  However, the idea in this form probably originated with Postel.

Postel said in RFC 791 "In general, an implementation must be conservative in its sending behavior, and liberal in its receiving behavior."  http://www.ietf.org/rfc/rfc0791.txt

Larry Wall said in his 1998 2nd annual State of the Onion speech, "People understand instinctively that the best way for computer programs to communicate with each other is for each of them to be strict in what they emit, and liberal in what they accept."  http://www.perl.com/pace/pub/perldocs/1998/08/show/onion.html

Jonathan Scott Duff
Monday, October 13, 2003

Which is a perfectly valid engineering virtue.

JPEG is (ok: was) a standard for images. If some error in the source chunks was encountered, most JPEG viewers rendered a portion of the image, either blanking the rest out or making a guess. I don't know why people are nitpicking about browsers here.

Johnny Bravo
Monday, October 13, 2003

"[T]he problem is that defective web pages render perfectly in browsers because those browsers are tolerant, and then fail miserably when the browser version is updated."

And so the problem is ... ?

"Worst of all, as Joel says, you can never count on an HTML web page as having anything it should do."

Writing robust code is tough.  *shrug*  Cope.

Alyosha`
Monday, October 13, 2003

Anything to extremes is bad. If browsers give too much leeway, then that is the problem. That doesn't mean that the principle doesn't apply ever, only that perhaps browsers are not the right place to apply it.

Anything to an extreme is too much, all generalizations are false, and every rule has an exception (except for this one).

Mike Swieton
Monday, October 13, 2003


As far as general engineering goes Wall and Postel are right.

Imagine if you had a bridge and it fell apart when somebody with an electric car drove across - well that bridge was only designed to accept gas-powered traffic!

Attempting to display broken HTML is a better solution for most users than to not display at all.

I'm sure most XML is better than a lot of the HTML out there, but you've got bugs in the various XML parsers/writers that mean you might not be getting well-formed XML either.

NathanJ
Monday, October 13, 2003

It might just reflect the politics.

For HTML, Netscape's goal was to have as many HTML documents as quickly as possible. I think the business reasons for this decision are entirely obvious, and now we all live with the consequences.

Business is *always* like this, as far as I can see.

Portabella
Monday, October 13, 2003

Mike: and even moderation should not be practiced to excess ...

Alyosha`
Monday, October 13, 2003

HTML -> Web Browser is not, in most cases, computers talking to computers.

It's human-written, unvalidated markup being passed to a computer program (in most cases).

Why we tolerate HTML-building software that generates invalid code, well that's another story.

However, it would be good if web browsers had their own internal validators and would complain (big "This page sucks" icon) on bad HTML.  Give the web authors incentive to clean up their act.
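
A rough sketch of what such an internal validator could look like: accept everything, but keep a list of complaints to show the user. This is a simplification written in Python with the standard html.parser module. WarningParser, check and VOID_TAGS are hypothetical names, and the sketch flags every unclosed non-void element, which is stricter than HTML 4 itself (it allows some end tags to be omitted).

```python
from html.parser import HTMLParser

# Elements that never take an end tag (partial list, for illustration).
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}

class WarningParser(HTMLParser):
    """Accepts any markup but records a warning for each tag mismatch."""
    def __init__(self):
        super().__init__()
        self.open_tags = []
        self.warnings = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if tag not in self.open_tags:
            self.warnings.append(f"</{tag}> has no matching start tag")
            return
        # Pop back to the matching start tag, complaining about anything
        # that was opened after it and never closed.
        while self.open_tags:
            top = self.open_tags.pop()
            if top == tag:
                break
            self.warnings.append(f"<{top}> was never closed")

def check(markup):
    """Return the list of warnings for a piece of markup (empty means clean)."""
    parser = WarningParser()
    parser.feed(markup)
    parser.close()
    for tag in reversed(parser.open_tags):
        parser.warnings.append(f"<{tag}> was never closed")
    return parser.warnings

if __name__ == "__main__":
    for warning in check("<b><i>oops</b>"):
        print(warning)
```

The page still gets processed either way; the warnings list is what would drive the "This page sucks" icon.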

Richard Ponton
Monday, October 13, 2003

The problem, Alyosha, is twofold. Firstly, a lot of the web is buggy, and so people like Joel trying to write applications to automate data collection from it run into problems.

Secondly, think: which would you rather do? Debug once to a strict standard, or debug up to a dozen times because you have to check whether the code's faults are accepted or not accepted by each browser?

(And another thing that has always irritated me is how even books on how to write for the web ignore best practices or standards. I remember writing all my tags in capitals and attributes in small letters because that is what my "How to HTML" book did (indeed most did). I was not amused at having to reduce them all to lower case when I found out that was the XHTML standard!)

Stephen Jones
Monday, October 13, 2003

Larry's usually half right about most things. I can never figure out which half ;-)

If you want to watch one of the best HTML renderer developers as he wrestles Safari into compliance and non-compliance, check out Dave Hyatt's blog:

http://weblogs.mozillazine.org/hyatt/

fool for python
Monday, October 13, 2003

Funny you mention it. My web browser identifies badly formed web pages. In fact, I'm getting that indication right now. 29 errors on this page.

Here are some:

The attribute "LEFTMARGIN" is not allowed for the tag <BODY>
In tag <TEXTAREA> the value "soft" is not valid for attribute "WRAP".
In the tag <IMG> the value of the attribute "ALT" must be enclosed in quotes.

Joel's complaints along these lines are the pot calling the kettle black!!

In general, the problem is not that IE is too well written and able to correct errors in human-created input. The problem is that other browsers don't do that.

I think it's great that people can write their own web pages and publish them. And even if their results are not perfect, IE does its best. That makes web publishing less a boring niche of the anal-compulsive elite nitpickers and more a colorful bazaar open to the masses, expressing themselves in any way they can.

Dennis Atkins
Monday, October 13, 2003

Some points:

The CSS standards people I've talked to agree that it's better to not implement a CSS property than to implement it badly, and CSS parsers are forbidden to "fix up" CSS they don't understand. Now, the "rule-based" structure of CSS is more amenable to this than HTML, but it's food for thought all the same.
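
As a sketch of that "ignore, don't fix up" rule, here is a toy declaration parser in Python. KNOWN_PROPERTIES and parse_declarations are hypothetical names, and splitting on ";" is a deliberate simplification (real CSS tokenizing has to cope with strings and url() values).

```python
# Hypothetical set of properties this toy parser "implements".
KNOWN_PROPERTIES = {"color", "margin", "font-size"}

def parse_declarations(block):
    """Split a declaration block into accepted properties and ignored text.

    Anything malformed or unknown is dropped as-is; there is no attempt
    to guess what the author meant.
    """
    accepted = {}
    ignored = []
    for raw in block.split(";"):
        raw = raw.strip()
        if not raw:
            continue
        name, sep, value = raw.partition(":")
        name = name.strip().lower()
        value = value.strip()
        if not sep or not value or name not in KNOWN_PROPERTIES:
            ignored.append(raw)
            continue
        accepted[name] = value
    return accepted, ignored

if __name__ == "__main__":
    ok, dropped = parse_declarations("color: red; colour: blue; margin 0; font-size: 12px")
    print(ok)       # {'color': 'red', 'font-size': '12px'}
    print(dropped)  # ['colour: blue', 'margin 0']
```

Because "colour" is dropped rather than corrected to "color", every parser following the same rule ends up with the same result, which is the property HTML error fixup lacks.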

Joel is right on about HTML parsers. There's no standard for error fixup in HTML, and each browser has built up empirical rules for doing it. In Mozilla, which I've seen, the result is a Big Ball of Mud that's pretty much undocumentable; I don't believe the situation is any different for other browsers. The reason IE appears to have better error handling is simply a function of how people tend to author web pages: throw HTML against the wall, browse in overwhelmingly-dominant browser. If it doesn't look right, tinker with it again; otherwise, go home. If a different browser had that kind of marketshare, people would avoid making webpages that it couldn't handle, and it would appear to have "better error handling".

The "HTML had to be broken before the Teeming Millions could accept it" argument comes up regularly, although this discussion page is a particularly ill-chosen example; we have a proprietary attribute in the textarea, and all the other errors seem to be violations of syntax constraints no more or less arbitrary than having to enclose one's tags in "<" and ">", which the creative types seem to be pretty good at, these days.  The latter part of http://groups.google.com/groups?selm=ja57usc28roobfkcc0ekinav1l50jg845t%404ax.com
is an interesting exploration of that argument. Whether the lower barrier of entry to "tag-soup" justified the damage it did to the WWW as an information system is, I think, still an open question.

Chris Hoess
Tuesday, October 14, 2003

Part of the problem as it relates to HTML is because of the browser wars.  Each browser needed to render as many pages as it could, and if that meant accepting bad HTML, then so be it.  And this lax attitude found its way into the books of the time: O'Reilly's "HTML The Definitive Guide" (3rd ed) says that </td> and </th> can be omitted.  I didn't even know there was a <tbody> tag until I started using the DOM, and by that time, how many non-compliant tables had I created?  I couldn't even guess.
The liberal/strict policy is fine, but it can break down when the liberal acceptance becomes the de facto standard for output.
Compilers should give warnings.  SMTP servers that warn you when you're doing something non-standard are great.  And I wish IE would tell people their pages suck.

Brian
Tuesday, October 14, 2003

> HTML had to be broken before the Teeming Millions could accept it

Well, I for one am not suggesting that, only pointing out that for Netscape widespread acceptance of HTML was miles ahead of *any* technical considerations.

Maybe it could have been just as successful, or even *more* successful if it was less broken. Realistically, we'll never know.

Portabella
Tuesday, October 14, 2003
