Fog Creek Software
Discussion Board




Putting your whole HTML page on one line

Joel mentions the book "Speed Up Your Site", which advises that putting your whole HTML page on one line will speed it up.

Isn't this an example of over-optimization? You trade maintainability for a small increase in performance.

I had the same reaction to Joel's search for a patch utility. If you offer a patch-upgrade, you now have to make it recognize all previous versions of the binary which you have shipped. As the product grows you have more and more binaries and previous versions that you have to patch correctly. All this to save a couple of minutes twice a year for those folks who don't have broadband.

Complexity comes in small doses and builds up quickly.

Nathan Silva
Wednesday, April 23, 2003

Maintainability and what's presented to end-user browsers should be two completely different things: i.e. code should run through an optimizer en route to your testing and then production servers. It's horribly, horribly wasteful how most HTML is tabbed, whitespaced, and often even commented, and is then downloaded millions of times where such things are of absolutely zero relevance.

Dennis Forbes
Wednesday, April 23, 2003
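
As a rough illustration of that optimize-on-the-way-out step, here is a minimal sketch in Python. It is deliberately naive: a real optimizer must leave <pre>, <textarea> and <script> blocks alone, which this one doesn't attempt, and the stdin/stdout plumbing is just one way to wire it into a publish pipeline.

```python
import re
import sys

def strip_html_whitespace(html: str) -> str:
    """Drop inter-tag whitespace, collapse runs of spaces/tabs, cut comments."""
    html = re.sub(r">\s+<", "><", html)             # whitespace between tags
    html = re.sub(r"[ \t]{2,}", " ", html)          # runs of spaces and tabs
    html = re.sub(r"<!--(?!\[if).*?-->", "", html,  # comments, sparing IE
                  flags=re.S)                       # conditional comments
    return html

if __name__ == "__main__":
    sys.stdout.write(strip_html_whitespace(sys.stdin.read()))
```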

If browsers could download a compressed HTML file and then uncompress it on the fly, that would be the better idea: you have less to download and more to view.
Most of the tweaking of HTML has been done to suit the browser's needs, and browsers are tweaked to provide an enhanced user experience; much less has been done about how the browser gets its HTML data. We already have good success with streaming and graphics (e.g. svgz), so if optimization is the concern, why not HTML?

artist
Wednesday, April 23, 2003

Note that Joel does not put all his code on one line. In fact, despite the minimal graphics, the forum takes up loads of bandwidth.

We would certainly lose a teaching tool if web pages were all put on one line. I'm sure most of us learnt the tricks of the trade from reading the source code.

Incidentally, who on earth would comment HTML? A couple of lines before each complicated layout table, maybe?

Now obviously, if you are going to put it all on one line, you simply run a program over properly set-out code. If you could then offer some kind of tool to restore the line spacing for others to see, that would be public-spirited in the extreme.

Anybody have much idea how much bandwidth you'd actually save? Are you sure that the line spacing wouldn't be compressed for transmission anyway?

Stephen Jones
Wednesday, April 23, 2003

Of course one should always use gzip on one's web pages - it makes a very noticeable difference to the amount of bandwidth one eats.

After you start using gzip, and extract your CSS and JavaScript from individual pages into their own cacheable files, you'll find removing whitespace to be a relatively small concern.

Our pages are so ridiculously small now for most of our readers that it's not worth the time to consider other optimizations at this point.

Lou
Wednesday, April 23, 2003

Even if you have a utility to convert your HTML into one-line pages you still have added a layer of complexity.

You're just putting the complexity elsewhere: probably in a web server plugin which itself has to be maintained. It would be easy for such a utility to have bugs (not correctly unwrapping a line of HTML, say) which corrupt your output.

On most pages graphics load much more slowly than text; you are optimizing the wrong thing. And if you use gzip compression, an existing standard mentioned by a previous poster, stripping whitespace by hand may even be counter-productive.

But this is just one example. There are many ways in which over-optimization leads to unnecessary complexity.

Nathan Silva
Wednesday, April 23, 2003

Firstly, the majority of websites don't use gzip, for a variety of reasons. Usually they have dynamic content generated on the fly for each request (even though the generated content is effectively static, because the underlying data seldom changes) and don't want the computational load of gzipping on the server, or their infrastructure won't handle it; most often, though, it's just laziness on the part of the administrator.

Secondly, removing redundant information doesn't somehow make your page compress less well. As a ratio, sure, but the end page will not somehow come out smaller because you packed it full of unnecessary information first.

Thirdly, images and other content can't load until the base page has loaded: the base page is a critical section blocking the rest of the page. Loading the base HTML in 100ms versus 300ms is on the critical path, a dependency of every other element of the web page (whereas compressing an image affects just that image and is more of a convenience factor).

Fourthly (phew), the claim that HTML files are "tiny" and irrelevant isn't borne out by the facts; it's just a classic case of proof by repetition. Checking some major websites shows them coming in at 40KB to 100KB just for the base HTML file, and most carry 30-50% of information that the browser simply discards. With a high-speed connection it isn't that big a deal, but it's that much quicker that a Slashdotting saturates your pipe.

Fifthly, who writes straight HTML anymore? Most HTML is generated from somewhere else and is just OUTPUT; it isn't the actual code. For some odd reason many programmers put in completely unnecessary code (which means maintenance and reliability problems) to produce indented, "nice"-looking HTML source, despite it being nothing but waste.

Dennis Forbes
Wednesday, April 23, 2003

I agree that mechanical page generation is the right answer. Use #ifdefs so that debug builds give you pretty code and release builds emit no unnecessary whitespace.

Brad Wilson (dotnetguy.techieswithcats.com)
Wednesday, April 23, 2003

While I agree with the general line of thinking regarding automation of the code generation, I think Dennis might have misunderstood (or perhaps wasn't referring to) my comment on removing duplicated code (CSS and JavaScript) from each page and making it into separately loaded files. The advantage here is obvious: these files are cached by most browsers (barring user preferences set otherwise), so the user downloads less for each page after the first.

And as for web pages being tiny - mine are. The corporate site I just redesigned has an average page weight of approximately 6KB. Of course the graphics are largely reused and referenced from the same images folder, making them likely candidates for being cached by the browser as well.

Clearly this isn't an image-heavy site, but it has a relatively complex design aided by CSS. When your files are this small, and you're dealing with static pages so you can gzip them, any other optimization is likely unnecessary, though possible.

Though if you have a dynamic site and you're serving dynamic images or a wide variety of images, you probably can't or shouldn't use gzip (if the CPU is already under load). But you should look to strip out your CSS and JavaScript so those can be cached, you should look into optimizing your images better, and you should really check whether your files are truly dynamic or just occasionally updated - in which case you can use a script to create an updated static file to be served, freeing the CPU and hard disk and making it possible to gzip.

There are many ways to optimize web sites, and one current trend is stripping out 90% of meta tags (keywords and the like), since they are largely ignored by search engines. It all depends on the site, the style of serving the content, and the goals. It's a question of return versus investment: building a tool to strip out whitespace might not be the best use of your time (we're really fixating on this one). Look for the other opportunities and consider them as well.

Lou
Wednesday, April 23, 2003
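
Lou's last suggestion, pre-rendering occasionally updated content to a static (and pre-gzipped) file, can be a small script run from cron. A sketch under stated assumptions: the paths and the render() stand-in are hypothetical, and a real version would query your actual data store.

```python
import gzip
import os

SOURCE = "articles.db"    # hypothetical data file the page is built from
TARGET = "articles.html"  # the static page the web server actually serves

def render(db_path: str) -> str:
    # Stand-in for querying the data and building the page.
    return "<html><body>rebuilt from %s</body></html>" % db_path

stale = (not os.path.exists(TARGET)
         or os.path.getmtime(SOURCE) > os.path.getmtime(TARGET))
if stale:
    html = render(SOURCE)
    with open(TARGET, "w") as f:
        f.write(html)
    # Keep a pre-compressed copy alongside, so gzip-capable browsers can
    # be handed the .gz file without compressing on every request.
    with gzip.open(TARGET + ".gz", "wt") as f:
        f.write(html)
```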

Forgive my ignorance, but I am curious about this "gzip" approach being mentioned.  From the name, I assume it is some method of compression, but beyond that I don't understand.  Is the idea to compress each static page?  If so, do browsers expand these files on the client in order to recover the original HTML?  Or is it somehow possible to compress entire sub-webs into a single gzip file so that all the files in the archive can be pulled from the cache?  Again, sorry for being uninformed, but I've never heard of this.  Can anyone provide a link to more information?  Thanks.

gzip newbie
Wednesday, April 23, 2003

gzip newbie,

Check out this article on 15 seconds:

- http://www.15seconds.com/issue/020314.htm -

Dave B.
Wednesday, April 23, 2003
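
In short: yes, it is per-response compression. The browser sends "Accept-Encoding: gzip" with its request, a suitably configured server compresses the page and replies with "Content-Encoding: gzip", and the browser inflates it before rendering, so nothing changes on the client side. A quick way to get a feel for the savings (any local HTML file will do; index.html is just an example):

```python
import gzip

with open("index.html", "rb") as f:  # any local page
    raw = f.read()

packed = gzip.compress(raw)
print("original: %7d bytes" % len(raw))
print("gzipped:  %7d bytes (%.0f%% of original)"
      % (len(packed), 100.0 * len(packed) / len(raw)))
```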

2 things:
1) everything on one line? easy enough. just write your PHP with no '\n's. :-)
2) >We would certainly lose a teaching tool if the web pages
>were put online. I'm sure most of us learnt the tricks of the
>trade from reading the source code.
Find & replace: '><' with '>\n<'

anomolous coward
Wednesday, April 23, 2003
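
That find-and-replace as a two-line script, for the curious (oneline.html is a hypothetical input file):

```python
with open("oneline.html") as f:            # the one-line page
    print(f.read().replace("><", ">\n<"))  # break between adjacent tags
```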

This is a TOOL problem!

All HTML tools should optimize for deployment and format for editing. Anything less is a major bug.

Having said that, HTML would have been more slowly adopted if it weren't nicely formatted in "View Source", which again is a tool problem.

fool for python
Thursday, April 24, 2003

Perhaps CityDesk could be made to produce HTML files on one line automatically during the Publish command? That would be very useful.

Angus Glashier
Thursday, April 24, 2003

Jeeze, why bother.

This is like arguing over how many angels can meet new angels on the head of a pin.

It's entirely without point.

Simon Lucy
Thursday, April 24, 2003

Angus,

CityDesk already does produce HTML on one line: just go in and switch from HTML view to "Browser" view and all your formatting gets blown away.

But I think Joel has explained that before as a bug in IE 5 or 6.

Me!
Thursday, April 24, 2003

And another point.

These .NET ASPX pages now come with a *HUGE* "VIEWSTATE" control in them.

As an example, some MSDN pages are 35KB, and 15KB of that is just the VIEWSTATE of the ASP.NET page. All that bandwidth just so the server doesn't have to keep individual session state.

And the VIEWSTATE is fairly random, so there is no guarantee that compression will help at all.

Me!
Thursday, April 24, 2003

I'm not sure anyone, including the author, was saying it was a good idea. The source for the book's web site http://www.websiteoptimization.com/ is terse but quite readable.

The site also has a free tool that tells you various things about the weight of a page, including estimated download times at various connection speeds.

Ken McKinney
Thursday, April 24, 2003

I did a quick survey of MSN, Yahoo, Amazon, and JOS to see what effect stripping newlines would have.

MSN already had most newlines stripped.

JOS would see a 0.9% size reduction.
Yahoo a 1.3% reduction.
Amazon a 1.4% reduction.

Matt Christensen
Thursday, April 24, 2003
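
Matt's survey is easy to reproduce. A sketch that fetches a page and reports what fraction of it is newlines plus the indentation around them (the URL is just an example):

```python
import urllib.request

def newline_savings(url: str) -> float:
    """Fraction of the page that is newlines and surrounding indentation."""
    with urllib.request.urlopen(url) as resp:
        html = resp.read()
    stripped = b"".join(line.strip() for line in html.splitlines())
    return 1 - len(stripped) / len(html)

print("%.1f%%" % (100 * newline_savings("http://www.joelonsoftware.com/")))
```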

Yeah, stripping whitespace and newlines is the dumbest thing I've heard recently. "It's horribly, horribly wasteful how most HTML is tabbed, whitespaced, and often even commented." Puh-leaze. As Matt points out, the savings are very near zero.

This is even more annoying considering how image-heavy most web pages are. If you want a simple way to speed up downloading and display, reduce the images.

pb
Thursday, April 24, 2003

I'm going to ignore the hyperbole of a particularly spiteful little monkey in the forum, and instead state that what we're talking about here is mechanically generated pages (which is what the majority of pages are nowadays): putting formatting in the generated HTML is not a maintenance issue whatsoever, and is nothing but ill-informed, ignorant waste. Because of this waste, many sites serve GBs of information daily that is simply discarded by the browser (indeed, that SLOWS the browser, as it invariably has to filter it out anyway). Boy, what a win-win: it not only slows the transfer, increases the computational time if gzip is used (and still always yields a larger resultant file), and increases the site's bandwidth, but it slows the rendering too. Oooh, where can I sign up for this new-age technique! It'd be "the dumbest thing I've heard" to go against something so clearly logical!

Regarding the metrics given, I grabbed a common optimization program (Advanced HTML Optimizer) and ran copies of common sites through it. Note that many major sites are enlightened and aren't guilty of misunderstanding HTML (as many amateurs apparently are); here's the before and after, in bytes, for a few sites.

JOS - 40,227 - 37,054 - 7.8% savings
JOS Forum article list - 47,463 - 39,174 - 17.5% savings
News.com main page - 59,817 - 47,962 - 19.8% savings
Sun.com main page - 12,229 - 9,711 - 20.6% savings

Multiply each of these unnecessarily bloated (and COUNTER-productive) pages by tens or hundreds of thousands of hits. Yeah, no reason to improve those.

Dennis Forbes
Thursday, April 24, 2003

On the other hand, you can really get some savings by squeezing your JavaScript. I was recently working on a very JS-heavy web app, and part of the build process was running the JavaScript through an obfuscator. One of the benefits was about a 60% size saving, and we could easily have done better with more work. The output is truly marvelous: almost 150K of JavaScript with zero whitespace or comments and short, random variable names.

On the third hand, there are other, much bigger factors affecting the speed of the app. The reduced size of the JS was a minor side benefit; the obfuscation was the goal.

Brian
Thursday, April 24, 2003
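
The whitespace-and-comments half of that squeeze can be approximated crudely; the renaming cannot, and a real obfuscator parses the source, since regexes like these will happily mangle comment-lookalikes inside string literals. A rough sketch:

```python
import re

def squeeze_js(src: str) -> str:
    """Naive JavaScript shrinker: strips comments, indentation, blank lines."""
    src = re.sub(r"/\*.*?\*/", "", src, flags=re.S)  # block comments
    src = re.sub(r"^\s*//.*$", "", src, flags=re.M)  # whole-line // comments
    src = re.sub(r"^[ \t]+", "", src, flags=re.M)    # leading indentation
    src = re.sub(r"\n{2,}", "\n", src)               # collapse blank lines
    return src
```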

There's no doubt the forum article list is wasteful with whitespace. There are also other things they could be doing:

- External CSS file

- Stop using nbsp's for spacing, when a simple CSS style on the listHeadline style would add the appropriate padding

- Stop using tables for layout

- Remove inline CSS styles (or, worse, things like bgColor and leftMargin)

I bet the savings would be even more substantial.

Brad Wilson (dotnetguy.techieswithcats.com)
Thursday, April 24, 2003

"External CSS file" -- but wouldn't this require an extra HTTP file request on every page? Even if the CSS isn't loaded the browser would have to check if the cached copy is current.  Unless the CSS is truly gigantic this sounds like it would take longer and create more traffic than just embedding the CSS.

"Don't use tables for formatting" -- not sure what you're referring to here. How would you format HTML text in tabular columns of any kind without using tables?

Chris Nahr
Friday, April 25, 2003

"JOS - 40,227 - 37,054 - 7.8% savings
JOS Forum article list - 47,463 - 39,174 - 17.5% savings
News.com main page - 59,817 - 47,962 - 19.8% savings
Sun.com main page - 12,229 - 9,711 - 20.6% savings"

If these savings are measured in single kilobytes - i.e. less than about 3K in the case of JOS - then the percentages make no useful measurement. Sun.com's saving is also around 3K, actually less.

On the majority of sites the images (which might be on an entirely different server) are likely, as has been said, to be the heaviest use of resources. A saving of 3K, even on a page which isn't cached, is generated from some database, and accumulates thousands of hits, isn't going to make a jot of difference. What would make a difference is a server engine that caches requests and so doesn't cause redundant delivery.

Of course there is less reason for pointless formatting of generated pages, although even generated pages need to have their output inspected by human eyes at times. 

I don't particularly care about generated pages not having such formatting.  I am against stripping or so called optimising filters which remove formatting though, simply because that which was written is not that which is delivered.

Simon Lucy
Friday, April 25, 2003

The way browsers are configured by default, the external CSS would be hit (either downloaded or cache-checked) once, until you shut the browser down.

A tabular set of data would look like, say, a spreadsheet. I don't see anything that looks like a spreadsheet here. Replicating this layout without tables is extremely trivial.

Brad Wilson (dotnetguy.techieswithcats.com)
Friday, April 25, 2003

3k x 25,000 hits/day x 365 days/year = ~ 27 megabytes saved

Now do the math for every website on the planet and tell me that's not significant. Tell me it's not significant to the guy who has to pay by the kilobyte to download, and every page here costs him 3 pointless kilobytes. Not everybody has "unmetered" access.

Brad Wilson (dotnetguy.techieswithcats.com)
Friday, April 25, 2003

Sorry, that was 27 gigabytes, not 27 megabytes. My bad fingers. :)

Brad Wilson (dotnetguy.techieswithcats.com)
Friday, April 25, 2003
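
The corrected arithmetic, spelled out (decimal gigabytes assumed):

```python
per_hit_kb   = 3         # kilobytes of whitespace wasted per page hit
hits_per_day = 25000
days         = 365

saved_kb = per_hit_kb * hits_per_day * days
print("{:,} KB = about {:.0f} GB per year".format(saved_kb, saved_kb / 1e6))
# 27,375,000 KB = about 27 GB per year
```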

Browser compatibility problems arise when all your formatting is done with CSS.

Also, can you use relative sizing in CSS? As Nielsen has pointed out, the use of absolute font sizes on web pages is one of the most irritating facets of the web at present.

If you specify the table width in percentages, does it really take a lot longer to render?

Stephen Jones
Friday, April 25, 2003

You do the world a huge favor when your site looks plain and undecorated in the 4.x browsers. These browsers are more than half a decade old. You keep browser stats on your site, right? 4.x is dead in all the stats I've seen.

If we don't support Windows 95 any more, why should we support 4.x browsers any more?

Brad Wilson (dotnetguy.techieswithcats.com)
Friday, April 25, 2003

Oh, and you can't appeal to authority with me by using Jakob's name. His site looks like it was made in 1993. That's what you want the web to be?

Brad Wilson (dotnetguy.techieswithcats.com)
Friday, April 25, 2003

Dear Brad,
                I don't care how old-fashioned the web page looks, but I do care about being able to read the print. And if you are using IE at a fairly high resolution, such as 1280 x 1024, you will find the print on many websites quite unreadable.

Stephen Jones
Friday, April 25, 2003

How could you possibly conclude that someone's choice of a small typeface means CSS is evil? *boggle*

Get a browser with font size adjustments.

Brad Wilson (dotnetguy.techieswithcats.com)
Friday, April 25, 2003

Dear Brad,
                I am not saying that CSS is evil. I am saying that absolute formatting is.

                I went over to Netscape precisely because you can't adjust the font sizes of most CSS-formatted pages in IE. I stick with it because of the tabbed browsing, but there are plenty of people who can't be bothered to download it, or prefer IE for other reasons, or haven't got install rights at work, or don't know that Netscape allows you to change the font size where IE doesn't.

                You don't know what the user's display settings are (nor his preferences, eyesight, or distance from the screen), so you shouldn't be deciding on the font size for him.

Stephen Jones
Friday, April 25, 2003

So, I must presume you think people who print magazines are evil for picking a font size when they print articles and decide on how much of the page certain things will take, right?

If not, then why not? It's an information presentation system much like HTML is, but one would argue even less flexible, since it isn't a trivial thing to increase or decrease the size of the text.

Some people have problems seeing things (I would argue they shouldn't be running 1600x1200, but hey, whatever makes you happy). That's why I use em-sizes for fonts instead of points, in recognition of IE's broken font behavior.

There doesn't seem to be much point in arguing this any further. Your opinion is your opinion, and mine is mine. You will not convince me to dump CSS, so stop trying. *shrug*

Brad Wilson (dotnetguy.techieswithcats.com)
Friday, April 25, 2003

Oh, and for the record, I do use 1280x1024 on my primary display (and 1024x768 on my secondary; given their difference in sizes, that's approximately equal pixel size). I use Mozilla with its default text size (100%), and I _rarely_ run across a site that's unreadable.

Brad Wilson (dotnetguy.techieswithcats.com)
Friday, April 25, 2003

When you print something on paper you know how big the paper is going to be.

With a screen you have NO IDEA HOW BIG THE OUTPUT IS GOING TO BE.

Mozilla lets you adjust the font size even if it's specified in points, so pages will look OK once you set up the default (as I said, that is why I switched to Netscape).

The problem comes with IE: its "change font size" won't work for pages with absolute formatting. You can set it to override the site's settings, but that wrecks all the well-set-up sites on the web.

Now if you don't use exact sizes you are avoiding what I mentioned, and well done. I'm not saying CSS is bad: I use it myself.

As for ignoring 4.x browsers, remember this site was set up in 2001. At that time the 4.x browsers accounted for around 20% of all users. Indeed, IE3 was still showing up on the radar screen.

Stephen Jones
Friday, April 25, 2003

C'mon. Most pages have lots of images, and the time saved by taking away the whitespace will be insignificant.

Even if the HTML is automatically generated, suppose three years later the "source" has disappeared, and you only have the actual HTML.

Btw, nice pun:

"This is like arguing over how many angels can meet new angels on the head of a pin.

It's entirely without point."

Dimitri.
Friday, April 25, 2003

"Most pages have lots of images, and the time saved by taking away the whitespaces will be insignificant. "

What a defeatist attitude. That reminds me of a particularly incompetent individual at one former workplace who assured us that we might as well not bother building our system to be secure because "real hackers can get in anyways". Alrighty...

I gave a specific example in the Joel on Software forum list where 8.3KB per hit is _ABSOLUTE_WASTE_ (so every time someone hits F5 to see if there are replies, there's 8.3KB down the drain; at only 1000 hits [I'm probably horribly underestimating] that's over 8MB wasted, and that ignores the inline CSS, another several KB that should be linked in and loaded only once). How could you POSSIBLY justify that? There is no justification for it, apart from baseless defensive posturing. Images have an aesthetic reason for existing, and they are usually set with long expirations and so are downloaded once per user (versus dynamic pages, which are generally set to expire immediately); in any case they're a red herring: of course you should optimize every aspect of your page, minimally using appropriately compressed images, linked CSS sheets, etc. Throwing your hands up is ridiculous.

"Even if automatically generated, suppose three years later, the "source" has dissapeared, and you only have the actual html. "

This is ridiculous. HTML is, with small variations, an XML document: use a bloody XML viewer and it'll display the document however you like, with all the whitespace you want.

Dennis Forbes
Saturday, April 26, 2003

The page on JOS where you would get the big saving is the thread list, though most people are unlikely to be permanently refreshing it. There is also the factor that the page is generated automatically.

I can't find any justification for not linking the style sheet instead of embedding it (which doesn't mean there isn't one). It appears to be the same style sheet for all pages. The saving is small, 2.43KB or less, but applies to every page.

Incidentally, Joel does use my pet bane of absolute font sizes. At least the default is intelligent.

Stephen Jones
Saturday, April 26, 2003

It may appear to be a defeatist attitude, but actually it's largely realistic. HTML is the least of the issues in wasted bandwidth.

Mindless servers that serve the same content without caching
Images that are inefficient, or are on different servers, domains
Rollover code, which is just vile (OK, it's not bandwidth, but I just hate them)
Flash introductions for the sake of it.
Custom fonts
Broken CSS
Popup Windows
WebBugs
Queries performed inline instead of stored procs
Validation performed at posting of form instead of using business rules


Yes, it's true: on the whole there is no point stuffing indentation, to whatever level, into HTML content which is generated from some back end. But to argue that leaving out line feeds is somehow going to make the piece of crap which is the interweb work any better is to tilt at the wrong windmill.

At the expense of keeping it human-readable (which was a major point of HTML in the first place) you'll save, maybe, 5% of the bandwidth. It's not like all the whitespace is shuffled to the end of the stream and can just be chopped off at whatever modulus of buffer size; it's spread through the stream. If you end up with only a partial buffer, perhaps just one character, then you'll have maybe 1023 bytes wasted, and so on.

Simon Lucy
Saturday, April 26, 2003

Taking away the excess white space (not the line breaks) using HTML Compressor Lite AND NOTHING ELSE actually saves about 18% on the forum home page.

And the source code is, if anything, more readable, except for the style sheets, where linking has been suggested anyway.

Savings on the forum pages are smaller, from 36KB to 34KB on this thread.

So simply stripping away unnecessary white space and linking to the style sheet instead of embedding it would save a user who loads the home page and reads five threads of 20KB each around 22KB out of a total download of 150KB. That's a saving of about 15% of the total bandwidth of the forum.

In fact it is the non-image-intensive sites, such as Yahoo or JOS, which would gain most from this.

Stephen Jones
Saturday, April 26, 2003
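
The session arithmetic above, roughly reconstructed; the post's figures are approximate (the home page size isn't stated, so 50KB is inferred from the 150KB session total):

```python
home_page    = 50.0              # KB, inferred from the 150KB session total
thread_pages = 5 * 20.0          # five threads of 20KB each
session      = home_page + thread_pages

home_saved   = 0.18 * home_page  # 18% whitespace saving on the home page
thread_saved = 5 * (36 - 34)     # about 2KB saved per thread page
css_saved    = 2.43              # linking the style sheet instead of embedding

saved = home_saved + thread_saved + css_saved
print("%.0fKB of %.0fKB = %.0f%%" % (saved, session, 100 * saved / session))
# ~21KB of 150KB = ~14%, in line with the ~15% claimed
```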
