Fog Creek Software
Discussion Board




interesting little tidbit on HTTP

I've been doing some (OK, a lot) of HTTP protocol work recently. I am implementing HTTP for a server I've been working on, and also doing typical web apps in Java.

Well, the other day I wanted to reference the HTTP/1.1 spec on POST parameter body encoding. It turns out that the commonly used way of encoding parameters in the HTTP body is not in the spec at all; it is a lightly documented de facto standard.

For instance:

a=1&b=2&c=3

This format is so widely used that I suspect most web developers assume it is standard HTTP. I even went to the Java Servlet spec, and it is unclear in that document where the format is specified.

This encoding is specified for URLs, but not for POST parameters. In fact, it is possible to have parameters in both the URL and the POST body, and the spec says nothing about how to interpret this.
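To make the two cases concrete, here is a minimal Java sketch (assuming Java 11 or later and UTF-8 percent-encoding) that decodes a=1&b=2&c=3-style pairs and collects parameters from both the URL's query string and the POST body. It is not any servlet container's actual code: the class name FormParams is hypothetical, and because HTTP leaves the combination undefined, the policy used here (query-string values first, then body values, duplicates kept) is an assumption.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: decodes application/x-www-form-urlencoded pairs such as
// "a=1&b=2&c=3", assuming UTF-8 percent-encoding.
class FormParams {
    static void addPairs(String encoded, Map<String, List<String>> into) {
        if (encoded == null || encoded.isEmpty()) {
            return;
        }
        for (String pair : encoded.split("&")) {
            if (pair.isEmpty()) {
                continue; // tolerate "a=1&&b=2"
            }
            int eq = pair.indexOf('=');
            String rawName = eq >= 0 ? pair.substring(0, eq) : pair;
            String rawValue = eq >= 0 ? pair.substring(eq + 1) : "";
            // URLDecoder maps '+' to a space and decodes %XX escapes.
            String name = URLDecoder.decode(rawName, StandardCharsets.UTF_8);
            String value = URLDecoder.decode(rawValue, StandardCharsets.UTF_8);
            into.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        }
    }

    // Assumed policy: query-string values first, then body values.
    static Map<String, List<String>> collect(String requestTarget, String body) {
        Map<String, List<String>> params = new LinkedHashMap<>();
        // getRawQuery() keeps the percent-encoding, so each value is decoded exactly once.
        addPairs(URI.create(requestTarget).getRawQuery(), params);
        addPairs(body, params);
        return params;
    }

    public static void main(String[] args) {
        System.out.println(collect("/form", "a=1&b=2&c=3"));
        System.out.println(collect("/form?a=hello", "a=goodbye&a=world+wide"));
    }
}
```

Run as-is, this prints {a=[1], b=[2], c=[3]} and then {a=[hello, goodbye, world wide]}. Another server could just as reasonably let the body override the URL, which is exactly the ambiguity described above.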

My guess is that this encoding has been around since the Mosaic days, and it has just continued because that's how Mosaic did it.

Anyway, I thought you all might find that as interesting as I did.

christopher baus (www.baus.net)
Thursday, June 10, 2004

Right, I wouldn't expect the HTTP spec to define HTML either. Both application/x-www-form-urlencoded and text/html are data formats, not wire formats, and HTTP can carry either of them. You could also use application/xml for both the POST request and the response.

This tidbit is the only spec on form encoding I could find in 5 minutes:
http://www.w3.org/MarkUp/html-spec/html-spec_8.html#SEC8.2.1

This is a related RFC; it's what your browser uses if you have a file attachment in a form:
http://rfc.net/rfc1867.html
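For comparison, a rough Java sketch of the multipart/form-data body that RFC 1867 describes: each field gets its own part, delimited by a boundary line, with a Content-Disposition header naming the field (and, for a file, its filename). The boundary, field names, and file contents are made-up illustration values, MultipartSketch is a hypothetical name, and a real client must pick a boundary that does not occur inside any part.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds a two-part multipart/form-data body
// (one plain text field, one small text file) using CRLF line endings.
class MultipartSketch {
    static String build(String boundary, String fieldName, String fieldValue,
                        String fileField, String fileName, String fileText) {
        String crlf = "\r\n";
        StringBuilder sb = new StringBuilder();
        // Part 1: an ordinary form field.
        sb.append("--").append(boundary).append(crlf)
          .append("Content-Disposition: form-data; name=\"").append(fieldName).append('"').append(crlf)
          .append(crlf)
          .append(fieldValue).append(crlf);
        // Part 2: a file field, which also carries the original filename and a content type.
        sb.append("--").append(boundary).append(crlf)
          .append("Content-Disposition: form-data; name=\"").append(fileField)
          .append("\"; filename=\"").append(fileName).append('"').append(crlf)
          .append("Content-Type: text/plain").append(crlf)
          .append(crlf)
          .append(fileText).append(crlf);
        // The closing delimiter is the boundary followed by two dashes.
        sb.append("--").append(boundary).append("--").append(crlf);
        return sb.toString();
    }

    public static void main(String[] args) {
        String boundary = "AaB03x";
        String body = build(boundary, "submit-name", "Larry", "files", "file1.txt", "hello");
        // The boundary is announced in the request's Content-Type header.
        System.out.println("Content-Type: multipart/form-data; boundary=" + boundary);
        System.out.println("Content-Length: " + body.getBytes(StandardCharsets.UTF_8).length);
        System.out.print(body);
    }
}
```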

SOAP, XML-RPC, etc. tend to use XML content types such as text/xml or application/xml.

Other systems may use other data formats.

mb
Thursday, June 10, 2004

Thanks for that second link. For some strange reason I didn't think about looking in the HTML spec for HTTP encodings. But actually that makes perfect sense to me now. 

I was specifying a programmatic HTTP interface (one that had nothing to do with HTML) and wanted to link to the spec, but couldn't find it anywhere. This spec points back to the URI spec, which I guess is the correct place to reference.

Now if only someone could explain the hell of HTTP header continuation lines ; )
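For anyone else fighting them: in HTTP/1.1 as specified by RFC 2616 (section 2.2), a header line that begins with a space or horizontal tab continues the previous header's value, and a recipient may replace the fold with a single space. A minimal Java sketch of unfolding, assuming Java 11 or later and a header block that has already been split on CRLF into lines; HeaderUnfolder is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: unfolds HTTP/1.1 header continuation lines (RFC 2616, section 2.2).
// Input: header lines with CRLF already removed; processing stops at the blank line.
class HeaderUnfolder {
    static List<String> unfold(List<String> rawLines) {
        List<String> headers = new ArrayList<>();
        for (String line : rawLines) {
            if (line.isEmpty()) {
                break; // the blank line ends the header block
            }
            char first = line.charAt(0);
            if (first == ' ' || first == '\t') {
                // Continuation line: it belongs to the previous header's value.
                if (headers.isEmpty()) {
                    throw new IllegalArgumentException("continuation line before first header");
                }
                int last = headers.size() - 1;
                // Replace the fold (and whitespace around it) with a single SP.
                headers.set(last, headers.get(last).stripTrailing() + " " + line.strip());
            } else {
                headers.add(line);
            }
        }
        return headers;
    }

    public static void main(String[] args) {
        List<String> raw = List.of(
            "Host: example.com",
            "X-Note: first part,",
            "\tsecond part",
            "   third part",
            "",
            "body bytes, not headers");
        unfold(raw).forEach(System.out::println);
    }
}
```

This prints "Host: example.com" and "X-Note: first part, second part third part". Later HTTP/1.1 revisions (RFC 7230) deprecate this line folding, but a server or proxy still has to either reject it or unfold it when older clients send it.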

christopher baus (www.baus.net)
Thursday, June 10, 2004

Why is "Referrer" misspelled as "Referer" in the spec?

TJ Haeser
Friday, June 11, 2004

http://dictionary.reference.com/search?q=referer

:-D

Janonymous
Friday, June 11, 2004

The mistake slipped past in early drafts. Most programmers can't spell worth a damn, so we ended up with shipping products that use an incorrectly spelled word somewhere most users will never see it. And the rest, as they say, is backwards compatibility.

The userspace USB library in Linux insists on using the word busses (which is an unusual alternative plural but could also be interpreted to mean kisses on the cheek), and every time I write code for it I have to convert between my functions, structures, etc., which call them buses, and those of the library, which say busses. This causes more typos per line than any other API I've used, including those with ridiculous "Hungarian" notation.

Nick Lamb
Friday, June 11, 2004

This happens a lot. FITS images (the standard for astronomical images and data) use the keyword EPOCH where they mean EQUINOX, leaving you with the small problem of what to call the equinox.

Never mind all the graphics libraries that misspell colour ;-)

Martin Beckett
Friday, June 11, 2004

If you're creating your own HTTP interface, you might also want to look into XML-RPC. Libraries already exist for it, etc. SOAP works too, but it is hard to use without a proxy library.

http://www.xmlrpc.com/

Note that, due to the author, the spec is probably fuzzy, liable to change, and missing features, but it is conceptually good.
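To give a feel for the format, here is a rough Java sketch that builds an XML-RPC methodCall body of the kind described at xmlrpc.com; the spec has the client POST it with Content-Type: text/xml and a Content-Length. The method name examples.getStateName and the arguments are illustrative values, XmlRpcRequest is a hypothetical name, and only the i4 and string value types are covered.

```java
// Hypothetical helper: builds an XML-RPC <methodCall> request body.
// Only <i4> (Integer) and <string> (String) parameters are supported in this sketch.
class XmlRpcRequest {
    // '&' is escaped first so the entities added afterwards are not escaped twice.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    static String methodCall(String methodName, Object... args) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\"?>\n")
          .append("<methodCall>\n")
          .append("  <methodName>").append(escape(methodName)).append("</methodName>\n")
          .append("  <params>\n");
        for (Object arg : args) {
            String value;
            if (arg instanceof Integer) {
                value = "<i4>" + arg + "</i4>";
            } else if (arg instanceof String) {
                value = "<string>" + escape((String) arg) + "</string>";
            } else {
                throw new IllegalArgumentException("unsupported parameter: " + arg);
            }
            sb.append("    <param><value>").append(value).append("</value></param>\n");
        }
        sb.append("  </params>\n")
          .append("</methodCall>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // This string would be the body of an HTTP POST with Content-Type: text/xml.
        System.out.print(methodCall("examples.getStateName", 41));
    }
}
```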

mb
Friday, June 11, 2004

HTTP does most of what I want it to do. I've used it for "web services", etc. for years. If you ever try to implement a proxy for SOAP or any other XML-based protocol, you will quickly see XML's limitations. The biggest one is that XML requires stateful parsing to validate, because of the end tags. XML is very difficult (impossible?) to validate without reading the entire message into memory. On the client and in most trivial applications that might be fine, but if you want to write a network layer that validates 1,000 or more concurrent connections, it becomes a problem.
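A minimal Java sketch of the state that end tags force on a streaming checker: it consumes the message in arbitrary network-sized chunks and keeps a stack of open element names, so the per-connection state grows with nesting depth, and it can only report the tags as balanced once the final end tag has arrived. This is a deliberately simplified illustration under stated assumptions, not a conforming XML parser: comments, CDATA sections, DOCTYPE declarations, entities, and '>' inside attribute values are not handled, and TagTracker is a hypothetical name.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical, simplified tag-balance tracker for one connection.
// Assumes no comments, CDATA, DOCTYPE, or '>' inside attribute values.
class TagTracker {
    private final Deque<String> open = new ArrayDeque<>(); // names of unclosed elements
    private final StringBuilder tag = new StringBuilder();  // a tag that may span chunks
    private boolean inTag = false;
    private boolean sawRoot = false;
    private boolean failed = false;

    // Feed the next chunk as it arrives; chunks may split a tag anywhere.
    void feed(CharSequence chunk) {
        for (int i = 0; i < chunk.length() && !failed; i++) {
            char c = chunk.charAt(i);
            if (!inTag) {
                if (c == '<') {
                    inTag = true;
                    tag.setLength(0);
                }
            } else if (c == '>') {
                inTag = false;
                handleTag(tag.toString());
            } else {
                tag.append(c);
            }
        }
    }

    private void handleTag(String t) {
        if (t.startsWith("?")) {
            return; // XML declaration or processing instruction: skipped
        }
        if (t.startsWith("/")) {
            // End tag: must match the most recently opened element.
            String name = t.substring(1).strip();
            if (open.isEmpty() || !open.pop().equals(name)) {
                failed = true;
            }
            return;
        }
        boolean selfClosing = t.endsWith("/");
        String body = selfClosing ? t.substring(0, t.length() - 1) : t;
        String name = body.strip().split("\\s+", 2)[0];
        // Reject empty tags and a second root element.
        if (name.isEmpty() || (sawRoot && open.isEmpty())) {
            failed = true;
            return;
        }
        sawRoot = true;
        if (!selfClosing) {
            open.push(name);
        }
    }

    // Only true after the last end tag has arrived.
    boolean isBalanced() { return !failed && !inTag && sawRoot && open.isEmpty(); }
    boolean hasFailed() { return failed; }
    int depth() { return open.size(); }
}
```

Feeding it "<?xml version=\"1.0\"?><methodCall><method" leaves depth() at 1 and isBalanced() false; only after "Name>x</methodName></methodCall>" arrives does isBalanced() become true.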

I am working on an HTTP proxy, and have been in my free time for about a year.  The reason I wrote it from scratch is that I wanted the following characteristics:
 

1) use only one thread

2) don't allocate memory while accepting and processing connections.

I am of the firm belief that memory errors are the single largest source of security problems in servers. For that reason I size all my buffers and allocate them at startup. Under low load that might seem wasteful, but I follow the tenet: memory is cheap, failures are expensive.

On the server there is no reason to play nice the way a desktop application should. If the memory is there, you should take it and use it.
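A rough Java NIO sketch of the "size all buffers and allocate them at startup" rule: a fixed pool of direct ByteBuffers is created once, and acquire/release never create new buffers, so under overload the caller gets null and can refuse the connection instead of growing memory. The pool size, buffer size, and the BufferPool name are hypothetical, and the class is intentionally not thread-safe, matching a one-thread server design.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;

// Hypothetical fixed-size pool: every buffer is allocated once, at startup.
// Not thread-safe by design (assumes a single-threaded event loop).
final class BufferPool {
    private final ArrayDeque<ByteBuffer> free;
    private final int count;

    BufferPool(int count, int bufferSize) {
        this.count = count;
        this.free = new ArrayDeque<>(count); // sized up front for all buffers
        for (int i = 0; i < count; i++) {
            free.push(ByteBuffer.allocateDirect(bufferSize));
        }
    }

    // Returns null when the pool is exhausted instead of allocating more;
    // the caller can then refuse or delay the new connection.
    ByteBuffer acquire() {
        ByteBuffer buf = free.poll();
        if (buf != null) {
            buf.clear(); // reset position/limit left over from the previous connection
        }
        return buf;
    }

    void release(ByteBuffer buf) {
        if (free.size() >= count) {
            throw new IllegalStateException("more releases than acquires");
        }
        free.push(buf);
    }

    int available() {
        return free.size();
    }

    public static void main(String[] args) {
        BufferPool pool = new BufferPool(2, 16 * 1024);
        ByteBuffer first = pool.acquire();
        ByteBuffer second = pool.acquire();
        System.out.println(pool.acquire() == null); // pool exhausted: prints true
        pool.release(first);
        pool.release(second);
        System.out.println(pool.available());       // prints 2
    }
}
```

Returning null rather than growing the pool is the design choice that keeps memory use fixed; the failure becomes an explicit, handled refusal instead of an allocation error deep inside request processing.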

christopher baus (www.baus.net)
Friday, June 11, 2004
