Fog Creek Software
Discussion Board




What am I missing..?

What is the big deal with XML?

XML seems to be the answer to any programming problem lately.

What is XML to begin with? Isn't XML just a language that defines the syntactic rules for writing structured documents that conform to some defined document type?

Where does the power of XML come from? The answer for me is simple: standardization. Without standardization, XML would be just another language for representing data. With everyone using XML for describing and exchanging data, you get several benefits. First, you don't need to learn a new syntax each time you have to deal with new documents. Second, you don't need to write a parser each time you want to extract the data from someone else's format, since several parsers are already written for XML. Or you can write your own parser; the important thing is that you only have to write it once, and it will work on any document.
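A minimal sketch of the "write it once, use it on any document" point, assuming Python and its standard `xml.etree.ElementTree` parser. The two documents and their tag names are made up for illustration; the same generic walker handles both without knowing either vocabulary.

```python
import xml.etree.ElementTree as ET

# Two unrelated, hypothetical documents with different vocabularies.
ORDER = "<order id='7'><item qty='2'>widget</item><item qty='1'>gear</item></order>"
CONFIG = "<config><server port='8080'>localhost</server></config>"

def outline(xml_text):
    """Return (depth, tag, attributes, text) for every element, in document order."""
    rows = []

    def visit(elem, depth):
        rows.append((depth, elem.tag, dict(elem.attrib), (elem.text or "").strip()))
        for child in elem:
            visit(child, depth + 1)

    visit(ET.fromstring(xml_text), 0)
    return rows

# The same code path prints the structure of both documents.
for doc in (ORDER, CONFIG):
    for depth, tag, attrs, text in outline(doc):
        print("  " * depth + tag, attrs, text)
```

Note that the parser only recovers structure; what an `item` or a `server` means is still up to the application.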

I think that's it.

Another "benefit" of XML is that it is legible to humans, so you can read it without trouble and edit it with any regular editor. That is indeed nice, but it renders XML unacceptable when you are exchanging large amounts of data and good performance is a must. The problem is that the general agreement nowadays is that you shouldn't care; after all, "XML is the language of the internet" and that is all that matters.
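One rough way to see the size cost of readable tags is to serialize the same records both ways and compare byte counts. This is only a sketch in Python: the record layout and the element names (`records`, `record`, `id`, `name`, `amount`) are invented for the comparison, and it measures size only, not parse time.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical data set: 1000 small records.
rows = [(i, f"customer{i}", i * 1.5) for i in range(1000)]

# Plain delimited text.
buf = io.StringIO()
csv.writer(buf).writerows(rows)
csv_bytes = buf.getvalue().encode("utf-8")

# The same records with one descriptive tag per field.
root = ET.Element("records")
for ident, name, amount in rows:
    rec = ET.SubElement(root, "record")
    ET.SubElement(rec, "id").text = str(ident)
    ET.SubElement(rec, "name").text = name
    ET.SubElement(rec, "amount").text = str(amount)
xml_bytes = ET.tostring(root, encoding="utf-8")

print("csv bytes:", len(csv_bytes))
print("xml bytes:", len(xml_bytes))
```

Every field carries its opening and closing tag on every record, so the XML form comes out several times larger for records this small. Compression narrows the gap on the wire, but the parser still has to read all of it.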

Any decent programmer can read the documentation for the several APIs that exist nowadays for manipulating XML documents, be they DOM, SAX, or whatever you want. So why do we have a myriad of books like "XML and C++ super-bible", "Master the internet with java and XML", etc.? For me, the explanation is simple: marketing.

Marketing makes people believe they need something, and spend money on it. Marketing can make you believe XML is good for everything and will fix all your programming problems, but it won't. The naked truth is that XML won't fix any technical problem you had without XML.

So marketing has made everybody, and especially management and the hordes of mediocre programmers, believe that XML must be used everywhere, because that is the right thing to do. Well, at least it is the cool thing, no doubt. But using XML for things like storing huge amounts of data when you need performance is just brain damage.

We now have remote procedure calls using XML: there is SOAP and XML-RPC, and maybe others I don't know. The problem is that we already had a nice standard protocol in CORBA, a more efficient one by the way, since it wasn't text based.

The argument for using XML-RPC is that it is simpler than CORBA. So they show you an example where they simply open an HTTP connection, pack a call into XML and unpack the answer. Then they compare it with CORBA, and convince you that all those mysterious BOA adapters are there to make your life unhappy, that generating stubs and skeletons is unnecessary, etc. In practice, if you look at any platform for doing distributed programming based on XML, you'll see a lot of those features being incorporated, because they happen to be needed to make the RPC fit into the object model. Worse yet, they are not following any standard to incorporate them! Didn't CORBA have that?

Excuse me, I need to jump out the window now.

the lonely programmer
Friday, August 23, 2002

I forgot to say English is not my native language, so please excuse my pathetic grammar.

the lonely programmer
Friday, August 23, 2002

At my old job we referred to this as the "big o box of xml". It's like someone bought a big box of xml at a warehouse discount store like Costco and now they have to find uses for it. Some of our developers got so caught up in using xml that they didn't stop to ask themselves what problem they were solving with it.

But I do think xml has good uses. My current project uses xml to store configuration information, and also to occasionally save information to the local machine when users have it running off the network on a laptop; this is later synced up with a database. It was easier to parse an xml file using a parser than to come up with our own text or binary format for saving local information. But to be honest, in hindsight I wish I'd chosen a small local database like Access or MSDE; xml does not handle some character sets or binary data as well as a database would.

I found it's also good for interfacing between systems where other RPC solutions such as CORBA or DCOM just aren't appropriate. I have yet to use SOAP in a production environment so that jury is still out.

Ian Stallings
Friday, August 23, 2002

>>>That is indeed nice, but it renders XML unacceptable when you are exchanging large amounts of data and good performance is a must. <<<

I have mostly used EDI at my current job, and my knowledge of XML is very limited. I am trying to understand how XML is unacceptable for large data and critical performance, say as compared to EDI. Is it because it uses descriptive tags to define data? Or is it something else?

Thanks for sharing the knowledge.

edi programmer
Friday, August 23, 2002

XML solves "What format can I store this little bit of data in that will be easy to parse later?" problems: config files, over-the-web RPC, some data interchange. People talking about using XML for long-term storage totally mystify me, though.

It definitely has become a Golden Hammer. How many of you have had this conversation:

Your project manager: "Where can we fit XML in this project?"
You: "WTF are you talking about?"

I know I have.

Lonely programmer, your grammar is better than most native speakers, at least in the US.

Luke

Luke Duff
Friday, August 23, 2002

Amen. Reading this post made my day. XML has been the most overhyped technology I have ever come across. There is nothing you can do with XML that you couldn't do with ASCII text. It has not solved one problem that couldn't be solved before. The IT industry has become little more than marketing, vaporware, and hype.

Bella
Friday, August 23, 2002

You're wrong, bella, it's _rehashed_ "marketing, vaporware, and hype".

semi
Friday, August 23, 2002

extra plus of xml - you can send it over http.

The human readable thing is a negative imo. I've done programming that uses xml (xul, svg, and now winamp) and because it is 'human readable' your written code and the compiled code are one and the same. The problem is it is a very shitty syntax to program in - if someone invented a language with a syntax like that (and there was no xml) they would be kicked in the arse.  So instead of having human happy code and computer happy binary, both end up having a bad day.

You are spot on with the lack of definitions - a lot of people say "we are using xml, so we already have a standard". Ha ha ha ha ha ha ha ha ha ha. I had to do a simple tree component using xml last year, so I looked around for the standard names to use (should I call the tag 'folder' or 'node' or 'container'...) - there may be one somewhere, but I didn't find it. The idea was 'use whatever you want, and people can map to it'. Yeesh.

Xml has its place, interop on the net is one, though I think for some of that a standard binary format would be better. There is now DIME, binary in xml. That makes more sense for a lot of tasks... Standard protocols are great news, just a lot of people are thinking because they use a standard transport, they don't need standard data structures. Hmm, well tcp-ip is a standard transport mechanism too when you think about it, so who needs html.

Robin Debreuil
Friday, August 23, 2002

Of course there is overhype around XML, but it's still a step forward.

The big news in XML is, IMHO, that it is a very simple standard for describing hierarchical information which is *agreed and shared* by the whole universe ;-). In particular, you get with it (and you couldn't get all at once before):

- standard libraries for all languages to parse/generate/transform this data (so you don't have to reinvent your parser/generator/transformer each time). Even the APIs are standardized (DOM...)

- standards and tools to check the syntax of this data for correctness (DTD, schema, validator...), so you don't have to write them yourself...

That's why I suppose you see lots of noise everywhere.

All these factors make XML a perfect medium for exchanging information, but to do more is indeed overhype.

Robert Chevallier
Friday, August 23, 2002

Amen. Reading this post made my day. ASCII has been the most overhyped technology I have ever come across. There is nothing you can do with ASCII that you couldn't do with bits. It has not solved one problem that couldn't be solved before. The IT industry has become little more than marketing, vaporware, and hype.

Just me (Sir to you)
Friday, August 23, 2002

Amen. Reading this post made my day. Binary has been the most overhyped technology I have ever come across. There is nothing you can do with Binary that you couldn't do with my 12 Gauge Mossberg 500 Mariner. It has not solved one problem that couldn't be solved before. The IT industry has become little more than marketing, vaporware, and hype.

trollbooth
Friday, August 23, 2002

< 0100. 0101101 0101 0110 1010 10 101. 1010011 101 1001 110 1101 1010110001 101011101 1 0101 00010 1010 111011.  />

Robin Debreuil
Friday, August 23, 2002

Aside from RPC type stuff, the big promise of XML is sharing of data between applications and companies.

The problem is that the actual wire format is like 1% of the total problem here.  Companies and apps that really want to share data haven't been sitting around for years saying, "gee, our engineers are too dense to figure out comma separated ASCII, but now that this XML thing is here, we're set."

Standard XML schemas may help, but they really only validate that you're getting fields of the type and quantity expected. These kinds of things work if you're VISA and you can say "you want to process a charge, send it in this format or we don't do business". Records where you have 10 agreed upon fields may work, but industry specific stuff will have a hard time.

I think XML is hyped as a panacea for problems that are really communication, business, and data mapping issues.

I can "get with" web services, basically CORBA on a diet.  Hype?  Yes, but it could solve some real problems in vertical situations. 

XML databases, and the idea that every system is going to integrate with every other system running on any device just because of XML.  Gag me.

Bill Carlson
Friday, August 23, 2002

> extra plus of xml - you can send it over http.

You can send ANY ASCII text over HTTP.  You can send Word docs, PDF files, flatfiles, XML files.  What's your point?

Bella
Friday, August 23, 2002

The point is, bella, that EVERYONE ELSE uses XML. Its own unearned success is what makes it useful. When I see config files that are XML, it makes my life that much easier, or if another company takes XML web requests, things are that much simpler. You're right, it's nothing we couldn't have done with plain, delimited files, but most people are familiar with XML, and it's become a standard. That means Microsoft and Sun are in agreement to use it.

Vincent Marquez
Friday, August 23, 2002

The point is, bella donna, that you can't easily send binary files over http as plain text. Not every post is a response to your nuggets of wisdom here : ).

Robin Debreuil
Friday, August 23, 2002

Robin Debreuil, in general I agree with what you've said about XML, but we send binary over HTTP all the time, no harder or easier than plain text. The main app my company produces sends binary over HTTP, and that's how we get it to the client apps.

Troy King
Friday, August 23, 2002

But you use mime types or encoding, right? Mime types (as you know) may be blocked by a firewall, and encoding can get pretty verbose. Or are you doing something else?

Afaik dime sends binary over http within packets, but my feeling with that is eventually firewalls will be able to sniff it out and block it too, no? I would imagine that eventually any binary format has a chance of being stopped at a firewall, being that there is always more risk that comes with it. But I am no security guru...

How are you sending the binary data?

Robin Debreuil
Friday, August 23, 2002

As for ascii, I think if you follow the idea that ascii can represent anything (which it can, and it is a good idea), and then take the next step and create a standard ascii format to represent 'anything', it might go like this:

- ascii can't display mandarin etc. naturally, so some unicode support.

- how shall we indicate structures? Hmm, either invent a new way, or base it on some existing standard, hmm sgml looks interesting.

- how shall we indicate what type of document this is...

- how shall we verify that our doc has no errors as per our definition...

- etc.

...and presto, you have something that resembles xml. Xml is pretty much what resulted from that very idea. The hype is excessive, but maybe it isn't possible to get everyone on the same page without the hype, so in a way it has a purpose to serve too.

The new task is to define standard data structures, now that we've agreed on a syntax. That is where soap etc are breaking ground. Eventually a lot of these will probably have a binary shadow format, derived in a standard way, which is fine too. Whatever else there is to say, real standards are a good thing, I think so anyway.

Robin Debreuil
Friday, August 23, 2002

Re: encoding binary files

The HTTP protocol is 8-bit clean. As long as you have the mime type set right, you can ship pure binary data without encoding it. How do you think all those gif files get sent?
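For a sense of what "encoding can get pretty verbose" costs compared with shipping the raw bytes, here is a small sketch in Python. The payload is arbitrary made-up binary data; base64 is used as the example text encoding.

```python
import base64

# Hypothetical binary payload: every possible byte value, repeated.
payload = bytes(range(256)) * 12           # 3072 bytes of raw binary

# Text-safe form, as you would need inside a text-only container.
encoded = base64.b64encode(payload)

print("raw bytes:    ", len(payload))
print("base64 bytes: ", len(encoded))

# The encoding is lossless; it only costs size and a decode step.
assert base64.b64decode(encoded) == payload
```

Base64 emits 4 output characters for every 3 input bytes, so the encoded form is about a third larger. Sent as a raw body over an 8-bit clean transport, the payload stays at its original size.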

Chris Tavares
Friday, August 23, 2002

Right, but that involves setting the mime type on your server (which is something many people can't do), and then that mime type not being blocked by a firewall (which it often will be if it is a custom format), and not being blocked by the user (like ActiveX). If you want to be sure to get something through, and have something that can be universally used, I think some type of text format like xml is your best option on the net... Everything after that (mime, script, non-http) has a pretty high chance of failure going to a secure network, without tinkering. Thus the appeal of xml in business.

That's all I meant (I did say 'easily' - you can send binary data as plain text too, just it usually isn't so practical).

Robin Debreuil
Friday, August 23, 2002

HTTP, as earlier noted, is 8-bit clean. Which means you can pump binary data through it without problem.  No need for XML here.

The overall best way to support Unicode in transport is utf-8, which, incidentally, works for any 8-bit clean transport whether it is unicode aware or not, but can still be considered ASCII for 99% of the uses. No need for XML here.
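The "can still be considered ASCII" property can be checked directly. A short Python sketch, using made-up `key=value` strings as the sample data:

```python
# Pure ASCII text is byte-for-byte identical in UTF-8.
ascii_text = "config=1"
assert ascii_text.encode("utf-8") == ascii_text.encode("ascii")

# Non-ASCII characters become multi-byte sequences in which every byte
# has the high bit set, so they can never be mistaken for ASCII delimiters.
mixed = "name=中文"
data = mixed.encode("utf-8")
assert all(b >= 0x80 for b in "中文".encode("utf-8"))

# Byte-oriented code that only knows ASCII can still split on '='.
key, _, value = data.partition(b"=")
print(key)                      # the ASCII part, untouched
print(value.decode("utf-8"))    # the non-ASCII part, recovered intact
```

This is why an ASCII-only tool that passes 8-bit data through unchanged tends to keep working when the payload switches to UTF-8.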

You know the old saying about C, that it "combines the speed of assembly language with the flexibility of assembly language"? Well, XML is better - it combines the efficiency of textual formats with the unreadability of binary formats (credit for this statement goes to Oren Tirosh).

XML is about as standard as ASCII. It has a little more semantics - it expresses hierarchy in a standard way, and it has a way to specify validity in an extremely limited, yet standard, way. But just like ASCII, it has absolutely nothing to say about semantics. A DTD or Schema specifies an XML language about as much as a Yacc/Lex grammar specifies ASCII text, only weaker. As Erik Naggum noted (can't recall where I saw this), walking skeletons without flesh would usually not pass as humans and would actually raise a lot of discomfort; so-called XML specifications are usually not much more than a skeleton, but are accepted as standards for some strange reason.

XML has nothing innovative to offer. Lisp has used the same concepts since ... 1958 (yes, *that* old), even with a similar syntax - <xml><innovation>Here!</innovation></xml> would be written (xml (innovation "Here!")) in Lisp, which is more efficient to parse, and about as readable/writable by humans. And at the same time (again, 1958), Lisp already had a standard scripting/transformation language and element reference specification (a-la XLink/XPointer/XSLT and friends). 1985 saw the introduction of EA-IFF, a hierarchical binary format still in widespread use that is not as flexible as Lisp is, but is comparable to XML (sans standardized DTDs/schemas) and is significantly easier and more efficient than XML to both parse and generate. XML wasn't really needed as a hierarchical format either.
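The correspondence between the two notations can be sketched mechanically. The converter below is a minimal illustration in Python (not Lisp), and `to_sexpr` is a hypothetical helper: it handles only element names, leading text and nesting, and ignores attributes and any text that follows a child element.

```python
import xml.etree.ElementTree as ET

def to_sexpr(elem):
    """Render an element as (tag "text" child child ...)."""
    parts = [elem.tag]
    text = (elem.text or "").strip()
    if text:
        # Quote the text, escaping embedded double quotes.
        parts.append('"' + text.replace('"', '\\"') + '"')
    parts.extend(to_sexpr(child) for child in elem)
    return "(" + " ".join(parts) + ")"

doc = ET.fromstring("<xml><innovation>Here!</innovation></xml>")
print(to_sexpr(doc))   # (xml (innovation "Here!"))
```

The closing tag name disappears into a single right paren, which is where most of the size difference between the two forms comes from.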

XML is a dumbed-down version of SGML, of which HTML is an application. HTML was hugely successful for logical page markup, and I believe it is this success that prompted XML's hyping as the "next silver bullet". Had Tim Berners-Lee used Lisp expressions rather than SGML as the basis for the hypertext markup, XML would probably have never come to be and the Lisp community would probably have grown by an order of magnitude.

The IT market today is fueled almost entirely by hype and money making opportunities, and seldom by merit.  Get used to it, and don't try to dig explanations for why things are inherently useful when they clearly offer nothing.

As far as I can tell, XML's only advantage is that everybody is using it - but that's not much of an advantage; as I noted above, despite the common belief, saying that someone is using XML doesn't convey much more information than saying that they are using ASCII.

Ori Berger
Friday, August 23, 2002

Have a look at [ http://xmlsucks.org/but_you_have_to_use_it_anyway/ ]. (Please disregard the domain name - the information contained within is well put). In there you'll find Erik Naggum's original quote (which I really messed up), along with another gem of his - "XML is a giant step in no direction at all".

Ori Berger
Friday, August 23, 2002

XML is in itself a wonderful format for a self-defining way to present data.

What I hate, is not strictly XML... but the direction to make it more "complex" than it really is.  XSL and all of the deviants that attempt to support it.

Several years ago... XML was utter simplicity... a marked up version of a CSV type of file.  Even though it may still have that quality (in and of itself), the range of third party products that are thought "essential" to use it... well, it just makes me sick that something so elegant in concept could be corrupted so easily.

Joe AA
Friday, August 23, 2002

Does anyone have a good argument on why s-expressions don't suffice for XML's needs?  Paul Prescod's explanation is not convincing.
http://www.prescod.net/xml/sexprs.html

I did a google search on "xml s-expressions" and the pro-sexp side seems more convincing.

Greg Neumann
Friday, August 23, 2002

Robin D, you asked how we're doing it... we're just returning binary as the body of an HTTP packet. Web browsers don't hit our data site, so no mime type is necessary. We also write the client app, which knows how to interpret the results. We (now) have over 350,000 users behind every imaginable firewall and access type you can imagine, and none of them have problems receiving the data.

Troy King
Saturday, August 24, 2002

Well, I've looked around, and I hate to say it, but I'm wrong. I've seen the whole xml vs binary argument from svg vs swf, and that to me is a great example where a binary format makes more sense, though the differences aren't that great either way. In the back of my head I've always figured an 'ml format is somehow able to get parsed and rendered by the browser at a more natural level, where binary requires more effort. But after being set straight here, I see that I've never thought that through - a browser needs a plugin or helper app for anything it doesn't recognize regardless of storage format, duh. I guess the argument has always gone that svg will eventually be native in the browser where swf won't (which is fishy at best), and the implication was that it was because a text format makes that easier. But gifs are native in the browser and svg needs a plugin, so what does that say...

I have read that the text format makes life easier, but really that makes no sense with http. Ok, well, not sure how I got myself on the side of defending xml anyway, I'm no giant fan of it. There seem to be two threads here: one, that you don't need a text-based format, and the other, that it is the wrong text format. Are there advantages to a text-based format vs binary, then?

Well I'll shut up and go back to studying - thanks for all the info : )...

Robin'donna

Robin Debreuil
Saturday, August 24, 2002


For those that think the "simplicity" of XML-RPC is overrated, check out MIME-RPC:

http://www.mime-rpc.com

Zwarm Monkey
Saturday, August 24, 2002


I use XML because of the DOM. I use Microsoft's XML Core Services (formerly MSXML) and, let me tell you, I can have complex configuration files without having to write a parser. I can read those config files over the net. I can walk the tree (XML is a tree, ok?) without effort, I can create new versions of my data files, and the old and newer versions of my software can use them without screwing up the format. I can get syndicated news from my desktop app without writing a single Winsock call, and that's nice, too. Oh, let's not omit XSLT and XPath.
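A rough sketch of the versioning point, using Python's standard `xml.dom.minidom` as a stand-in for MSXML (the DOM method names are the standardized ones, but this is not the MSXML API itself). The config layout, element names and `window_size` helper are all hypothetical.

```python
from xml.dom import minidom

# Version 1 of a made-up config file.
CONFIG_V1 = """<config version="1">
  <window width="640" height="480"/>
</config>"""

# Version 2 adds an element that older readers know nothing about.
CONFIG_V2 = """<config version="2">
  <window width="800" height="600"/>
  <proxy host="proxy.example.com"/>
</config>"""

def window_size(xml_text):
    """An 'old' reader: looks up only the element it understands."""
    doc = minidom.parseString(xml_text)
    window = doc.getElementsByTagName("window")[0]
    return int(window.getAttribute("width")), int(window.getAttribute("height"))

# The old reader handles both versions; the unknown <proxy> element is ignored.
print(window_size(CONFIG_V1))
print(window_size(CONFIG_V2))
```

Because the reader finds elements by name rather than by position, adding new elements does not disturb it. Renaming or removing `window` would still break it, so the compatibility depends on how the format evolves.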

Basically, the only thing I really like about XML is... MSXML, but what the heck, if I were a Java developer, I would have loved it, too.

Now that I think about it more, I love XML because it makes my life easier. I still don't get the idea of an XML database, though.

Leonardo Herrera
Saturday, August 24, 2002

Maybe if you had to write the parser, you wouldn't think complex configuration files were a virtue.

Joe AA
Sunday, August 25, 2002

Paul Prescod's argument against S-expressions (http://www.prescod.net/xml/sexprs.html) is very weak. He basically makes two points:

1. XML tags are superior because someone handwriting a file using S-expressions might get confused by which right-paren ) is closing which left-paren. XML tags help you by forcing you to include the closing tag's name. Later on the page, he contradicts himself when he says he was a "strong proponent" that "it might be better if the tagnames in the end-tag could be omitted".

2. The central idea of the XML family of standards is to separate code from data. The central idea of Lisp is that code and data are the same and should be represented the same. He says using different formats for XML data and "code" is good because you won't confuse the two and you don't want a language or format that offers more power than you currently need. For Lisp fanatics, I imagine this seems completely backwards.

Maybe S-expression fanatics should create their own "SXML" standard and write the conversion programs. :-)

Zwarm Monkey
Sunday, August 25, 2002

I have no problem with a nicely defined tag markup, but the XML media overhype was just preposterous. 

I am still waiting for XML to replace Windows, Java, HTML, Visual Basic, Unix, COBOL, HTTP and every other piece of technology ever made.

Bella
Sunday, August 25, 2002

Oh, BTW, since when does "everyone" use XML?  That's a laugh. 

Bella
Sunday, August 25, 2002

XML has its place, but it isn't a "silver bullet". We use it too much in places and I am trying to get "them" to pull it back to just network boundaries. Sure hope the world learns its lesson fast. The XML Journal, now there is a mag built on hype!!

Regs,

James Ladd
Monday, August 26, 2002

I use it. Other people I deal with use it. Having XML, DTD, XSD, DOM, SAX, XSLT, XPATH on the resume is valuable. Whether it's "good" or not? It's good for me.

Not everyone is using it but, unless you're a stasist, what's more important is the growth rate.

Reminds me of Shirky's "half the world":
http://shirky.com/writings/half_the_world.html

Dan Sickles
Monday, August 26, 2002

I have to say I had a similar attitude to XML until yesterday. I was about to write yet another parser for a text file for a new program I was writing and I thought, "hmmm, maybe if I use XML I can get this parsed for me. I think I'll go check that out".

So, I found this site

http://www.w3schools.com/xml/xml_whatis.asp

and after reading that and this one

http://www.w3schools.com/schema/default.asp

I was convinced XML is a good thing.

*) I no longer have to write a parser.

*) with XML Schemas I also no longer have to validate much of the input

*) As XML is a standard there are now products to visually design your XML formats

http://www.xmlspy.com/products.html

*) Both major browsers have XML format built in meaning you can have the browser load the file and access it through the DOM.  Even without much code you can get it to do some pretty cool stuff.

http://www.w3schools.com/xml/tryit.asp?filename=cd_navigate

*) Since this is a standard, in that same vein there probably already are, or soon will be, tools for making XML content easily editable. Instead of having to custom code an interface for my data I could just use a custom form, no code, and one of those tools or libraries.

One other comment, although binary formats have their place, when possible I prefer text editor readable formats.  This lets me easily check the contents of the file, generate the files and modify them (like search and replace "d:\my folder\.." with "../otherfolder/.." etc).  Things that would be harder if I was using a non text based format.

Gregg Tavares
Monday, August 26, 2002
