Fog Creek Software
Discussion Board




YAML: YAML Ain't Markup Language.

Has anybody else tried YAML yet?

It's a lightweight alternative to XML that really makes a difference when editing those quick config files.

Rather than requiring a complex DOM or perplexing XSLT, it loads straight into Perl, Python or Ruby's existing array and hash structures.
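To make "loads straight into arrays and hashes" concrete, here is a rough Python sketch. It is a toy loader for a tiny YAML-like subset (top-level `key: value` pairs, plus a key followed by indented `- item` lines), written with the standard library only. It is not a real YAML parser, and the function name `load_tiny_yaml` and the sample keys are made up for illustration; for real files you would use one of the existing YAML libraries.

```python
def load_tiny_yaml(text):
    """Toy loader for a tiny YAML-like subset (NOT a real YAML parser).

    Supports only top-level 'key: value' pairs and a key followed by
    indented '- item' lines. All values are kept as strings.
    """
    data = {}
    current_list = None
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        indent = line[: len(line) - len(line.lstrip())]
        if "\t" in indent:
            raise ValueError("tabs are not allowed in indentation")
        stripped = line.strip()
        if stripped.startswith("- "):
            if current_list is None:
                raise ValueError("list item without a parent key")
            current_list.append(stripped[2:].strip())
        else:
            key, _, value = stripped.partition(":")
            value = value.strip()
            if value:
                data[key.strip()] = value
                current_list = None
            else:
                # a key with no value starts a list of '- item' lines
                current_list = data[key.strip()] = []
    return data


# Hypothetical sample config; the keys are invented for this sketch.
SAMPLE = """\
channel: joelonsoftware
commands:
  - newUri
  - deleteUri
  - randomUri
"""

config = load_tiny_yaml(SAMPLE)
print(config["channel"])       # a plain str
print(config["commands"][0])   # a plain list of str
```

The point is only that the result is an ordinary dict and list, so the usual language tools apply with no DOM in between.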

http://www.yaml.org

Any thoughts on this one?

Ged Byrne
Saturday, October 4, 2003

I think it's great, and was surprised when I actually saw a .yml file in use (it was in the Apache::Gallery project).

That said, XML's ubiquity makes it the tool of choice for my professional work, in much the same way that Java and C# are preferred over Python and Ruby. It's not the merit of the tool -- the world is full of better mousetraps -- but the fact that it's squarely middle of the road: good enough and popular enough.

I also like STX over XSLT (see http://www.xml.com/pub/a/2003/02/26/stx.html ), but again it'll have to gain more traction before I use it professionally.

Portabella
Saturday, October 4, 2003

I'm a YAML fan too and have been ever since XML got meme'd.

* It's far more readable than XML, which I think is one of the points of having a text-based format. (That and portability.)

* It's far more writable than XML. XML should simply not be edited by hand: there's so much that can go wrong, because the linear text stream representation does not match the hierarchical model of the data being encoded. In YAML, the format can't get all messed up, since it is line-based rather than stream-based, allowing for a more native hierarchical presentation.

To summarize, XML sucks. YAML Rules.

Dennis Atkins
Saturday, October 4, 2003

---
-
    - PRIVMSG
    - newUri
    - '^http://.*'
-
    - PRIVMSG
    - deleteUri
    - ^delete.*
-
    - PRIVMSG
    - randomUri
    - ^random.*


?

Is there any provision in YAML for descriptors? The thing that's awesome about XML is that it's self-describing; in other words, you don't have to dig through documentation or the code to figure out what each tag is.
The other great thing about XML is the XML/XSL/XSD family, where with the proper tools (they're out there) it's trivially easy to manage, manipulate, and create data. With an XSD, you can either buy or write a parser to create a database structure, parser, and input form.

I suspect YAML is a solution in search of a problem...

Philo

Philo
Saturday, October 4, 2003

BTW, I keep reading "YAML" as "Yet Another Markup Language," which is my feeling about it. :-)

Philo

Philo
Saturday, October 4, 2003

I believe that the project started off as Yet Another ... but they decided to go for Ain't Markup.

I prefer it because lots of >s and <s are murder for the touch typist.  The little finger just isn't up to it :)

Ged Byrne
Saturday, October 4, 2003

Hey Philo,

Just check out the examples in the spec for the answer to your question:

http://www.yaml.org/spec/

As far as the problem it solves, the problem is to create a portable, human-readable, human-writable standard for representing data in text files. For that, YAML fits the bill.

Dennis Atkins
Saturday, October 4, 2003

So does XML, and it adds "established" and "accepted".

Philo

Philo
Saturday, October 4, 2003

Philo, do you find writing an XSL script simpler than writing an equivalent process in your programming language of choice? If you do, it reflects more on your language-of-choice than on XSL, in my opinion.

I find Python does that a lot better than the XSL monstrosity, but I'll make my side of the argument simpler by arguing for Scheme instead:

The official XSL specification is, last time I looked, literally ten times longer than the complete R5RS Scheme specification, even though the latter is considerably more powerful. Scheme implementations are more ubiquitous than XSL tools, yet transformation scripts are of comparable length.

Furthermore, there is a striking equivalence between Lisp S-expressions and well-formed XML documents, to the extent that natural one-to-one mappings leave them almost unchanged. IMHO, the Lisp version is easier to edit with a text editor, the raison d'être for XML being a textual format. Oh, and S-expressions are nearly 50 years old.
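For what it's worth, the XML-to-S-expression mapping is short enough to sketch. The Python below uses the standard `xml.etree.ElementTree` module; the function name `to_sexpr`, the `(@name "value")` convention for attributes, and the sample document are all assumptions made up for this illustration, and text that follows a child element (mixed content) is simply ignored to keep it small.

```python
import xml.etree.ElementTree as ET


def to_sexpr(elem):
    """Render an XML element as an S-expression string (simplified sketch)."""
    parts = [elem.tag]
    # Attributes become (@name "value") pairs; this notation is just one choice.
    for name, value in elem.attrib.items():
        parts.append(f'(@{name} "{value}")')
    # Leading text content becomes a quoted string.
    text = (elem.text or "").strip()
    if text:
        parts.append(f'"{text}"')
    # Child elements are rendered recursively.
    for child in elem:
        parts.append(to_sexpr(child))
    return "(" + " ".join(parts) + ")"


# Made-up sample document for the sketch.
doc = ET.fromstring('<server name="irc"><port>6667</port></server>')
print(to_sexpr(doc))  # prints: (server (@name "irc") (port "6667"))
```

The tree shape carries over one-to-one; only the delimiters change.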

The only advantage I see for XML compared to competing alternatives (EA-IFF, S-expr, YAML) is that it is an accepted syntax standard (for all the wrong reasons, but still -- a standard).

Ori Berger
Saturday, October 4, 2003

Ah gee Philo, YAML's been around about as long as XML so it's just as established. And as far as accepted, well, lots of people use it. I guess you could make the same sort of argument for Fortran or C++ over Perl.

Is there something in particular you don't like about YAML, or do you just feel people shouldn't use it because it's not obtuse and error-prone like XML, necessitating expensive consulting fees to get things straightened out every time some hapless employee forgets to match an end tag and messes up an obese XML file?

Dennis Atkins
Saturday, October 4, 2003

> I suspect YAML is a solution in search of a problem...

Nope. It's a new species trying to compete against a rival that's already occupying a large part of the ecological niche.

I think the point about schemas is also wrong-headed. Much of the markup in the world just doesn't need schemas: config files, log files, test data files, miscellaneous application data: none of this *needs* to be XML-ed, it's just that XML is the only popular choice for structured data.

I'd rather have it in XML than endless ad-hoc data formats, but I think it would be even better to have it in a genuinely human-readable and human-editable format -- like YAML.

Portabella
Saturday, October 4, 2003

Whoa, look at the defensive FUD fly!

"YAML's been around about as long as XML so it's just as established."
?
YAML's first implementation was 5/01. The XML draft working doc was published in 11/96. That's the entire cycle of the dotcom era, with the added impetus of XML growing into a vacuum.

Is there something I don't like about YAML? Yep - I hate with a passion people trying to fix something that isn't broken because they have a personal issue with it. From what I can see, your complaint with XML is "it's got angle brackets and I don't like angle brackets"

"necessitating expensive consulting fees to get things straightened out ever time some hapless employee forgets to match an end tag and messes up an obese XML file"

As opposed to putting a Tab in a YAML file:
http://www.yaml.org/faq.html

"none of this *needs* to be XML-ed."

Agreed. But then neither does it need to be YAML'd.

"I think it would even better to have it in a genuinely human-readable and human-editable format -- like YAML."

Oh please. You honestly think it's easier to get your average keypunch operator to understand and deal with this:
http://www.yaml.org/start.html
?

I've been working with XML for almost two years now. Yeah, schemas and XSL were strange to deal with at first, but they're not THAT hard to get used to. The best part about XML is that people find it so very easy to work with. I've got *truck drivers* working with it. No, in general they don't write it, but they find it very easy to read.

And as for teaching non-geeks to write markup, I personally think matching tags vs. spaces and dashes to be six of one, half a dozen of the other.

My overall point again - there is a vast infrastructure of support and development tools for XML. I think trying to create a new markup is muddying the waters, and YAML doesn't give anything that XML doesn't have (apart from a chance for some people to champion the underdog against the evil oppressor).

My $.02. YMMV.

Philo

Philo
Saturday, October 4, 2003

I hate it when people make up stupid acronyms like this.  I can't quite put my finger on why I don't like it, but it just annoys me.  Like GNU (GNU's Not Unix). 

And then there's this beauty of an answer for a (frequently asked??) question:

Tabs have been outlawed since they are treated differently by different editors and tools. And since indentation is so critical to proper interpretation of YAML, this issue is just too tricky to even attempt. Indeed Guido van Rossum of Python has acknowledged that allowing TABs in Python source is a headache for many people and that were he to design Python again, he would forbid them.

Ooookay, so you had to do some extra coding to work around Tabs.  Naah, let's just not do that work and just make people forget about that nasty Tab key.

While we're at it, I hate it when I have to parse line feeds and carriage returns, so let's just make people write the stuff in one big line so I don't have to program that part!

I don't think I'll be using YAML any time soon.

Wayne
Sunday, October 5, 2003

"While we're at it, I hate it when I have to parse line feeds and carriage returns, so let's just make people write the stuff in one big line so I don't have to program that part!"

That would be an attempt to work around Goldfarb's First Law. "... if a text processing system has bugs, at least one of them will have to do with the handling of input line endings."

(The demise of SGML may prove instructive; SGML actually has very nice facilities for writing marked-up documents with a minimum of explicit markup, but the intellectual barrier of implementing these facilities in a parser and designing DTDs to make use of them ultimately proved too much of a challenge for adopters. I think this has a lot to do with the slavish adherence to "KISS" that shows up a lot in today's markup communities.)

Chris Hoess
Sunday, October 5, 2003

> Yep - I hate with a passion people trying to fix something that isn't broken because they have a personal issue with it.

A personal issue like trying to address what they see as flaws in it? I have no problem with personal issues like that.

And whether something is broken or not is a matter of opinion. I said in my first post that XML is good enough -- but not everyone wants to settle for good enough.

> As opposed to putting a Tab in a YAML file:

A mistake that's similar to these XML ones: forgetting to put in an end tag, or accidentally using two start tags instead of a start tag and an end tag or....

(You may reply that you won't make these mistakes with an XML editor. But I won't make the TAB mistake with YAML either, because I'll use the convert-tabs-to-spaces option in *my* editor. And even though we, smart guys that we are, won't make these mistakes, many other people will).
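The XML side of that comparison is easy to demonstrate. A minimal sketch, assuming Python's standard `xml.etree.ElementTree` parser; the helper name `check_well_formed` and the sample strings are invented for illustration:

```python
import xml.etree.ElementTree as ET


def check_well_formed(text):
    """Return 'ok' if text parses as XML, else a short error description."""
    try:
        ET.fromstring(text)
    except ET.ParseError as exc:
        # The parser reports the position where the structure broke.
        return f"not well-formed: {exc}"
    return "ok"


good = "<config><name>bot</name></config>"
# Second <name> should have been </name>: two start tags, no end tag.
bad = "<config><name>bot<name></config>"

print(check_well_formed(good))
print(check_well_formed(bad))
```

Either way, the parser catches the slip; the real question is how easy it is for a person to spot and fix.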

Many people objected to Python solely because of the way it uses whitespace, but I think it's now clear that it's a very powerful environment. The same is true of YAML.

> But then neither does it need to be YAML'd.

It needs to be manipulated programmatically in a simple way, and YAML is a good choice for that. You say, and I have said, that XML is good enough for these things, and so it is. But gack, the amazing amount of *bad XML manipulation code* that I've seen over the last few years really makes me want something more idiot-proof. (And, yes, XML was a huge improvement over the junk I saw *before* XML.)

> You honestly think it's easier to get your average keypunch operator to understand

This is a red herring, they won't understand either one.

I wouldn't claim to be an expert on what truck drivers can read, but I'd say that it's fairly likely that they can master either syntax. What you need to do is *design* the document (in either syntax) so that it's understandable to them.

And also, obviously, you'd comment either document for your audience.

> apart from a chance for some people to champion the underdog against the evil oppressor

YAML appeals to some people's aesthetics for simplicity and power. I'm not *oppressed* by XML, for goodness sake: it's just a pain in the butt. If your aesthetics are, "they're not THAT hard to get used to", then, yeah, you probably won't understand this.

And I'm in the process of ordering Elliotte Rusty Harold's Effective XML now, so I'm well aware that I'll be working with XML for several years to come.

Portabella
Sunday, October 5, 2003

The problem with yaml is that indentation is a pain in the butt to edit and get right. The hierarchical language used by INN, BIND, and so on... is a lot cleaner and at least as easy to parse. The base object is similar: a keyword followed by keywords and strings. In addition, a keyword can be followed by a block:

> zone "mydomain.com" {
>        type "slave";
>        file "db/mydomain.com.zone";
>        masters {
>                10.0.0.1;
>                10.1.0.2;
>        }
> }

[I hope that gets quoted right]

In INN it's less stringent: you don't need the semicolons.

Here's a general parser for the language:

http://sourceforge.net/projects/libtc

Peter da Silva
Sunday, October 5, 2003

Peter, I dig what you're saying, but I have to add here that YAML allows exactly the syntax you describe. It's called "in-flow style".

Here's the example from the spec:

Mark McGwire: {hr: 65, avg: 0.278}
Sammy Sosa:  {hr: 63,
              avg: 0.288}

Dennis Atkins
Sunday, October 5, 2003

The problem is that "allows in-flow style" means that the parser has to be able to handle both styles, and the reader has to be prepared to read both styles. This is the same problem that you get every time someone comes up with a syntax that has some kind of problem and then, when people complain, sets up an alternative syntax that avoids that problem.

Now you get 95% of the code using the original syntax, 5% using the new syntax, and 20% of the parsers only bothering with one because they're written for one project and they "know" what they're going to see.

That's what ended up being the problem with SGML, and why we ended up throwing out the baby with the bathwater when formalizing it to XML.

Peter da Silva
Sunday, October 5, 2003

By the way, since this is primarily a Microsoft developer's forum, even though I'm a UNIX guy I don't see anything wrong with this:

[homes]
  comment = Home Directories
  browseable = no
  writable = yes

Unless you absolutely need nesting, a 2-level syntax for config files is plenty... and one that people are familiar with is even better.
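As a rough illustration of how little code the 2-level format needs on the reading side, here is a sketch using Python's standard `configparser` module on the sample above. The variable names are arbitrary; the section and keys are the ones from the example.

```python
import configparser

SAMPLE = """\
[homes]
  comment = Home Directories
  browseable = no
  writable = yes
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE)

homes = parser["homes"]
print(homes["comment"])                # Home Directories
print(homes.getboolean("browseable"))  # False
print(homes.getboolean("writable"))    # True
```

Sections map to one level and keys to the other, which is exactly the limit of the format.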

For more complex config files, I tend to just plug an interpreter into the language and use that. Tcl, usually, because it's designed to be easy to plug in to C, and I can delegate the more complex parts of the parsing to a higher level language that's designed for writing config file metalanguages:

proc lease {name block} {
    set current_lease_name $name
    eval $block
}

proc start {time} {
    upvar 1 current_lease_name name
    set t [clock scan $time]
    lease_set $name start $t
}

where lease_set is the internal C routine. Now all I have to provide in C is a couple of routines like lease_set that take pre-parsed strings, and a character array containing this minimal parser. Normal configuration is easy, and you can also script more complex configuration in Tcl.

Lisp is another good language for this, but people tend to find this:

lease "1060 w. Addison" {
    start "3 oct 2003"
    end "2 oct 2004"
}

easier than this:

(lease '"1060 W. Addison"
  '((start "3 oct 2003")
    (end "2 oct 2004")))

for some reason. :) There are other reflective languages that can be used, but it's hard to find portable embeddable implementations.

Peter da Silva
Sunday, October 5, 2003

"> As opposed to putting a Tab in a YAML file:

A mistake that's similar to these XML ones: forgetting to put in an end tag, or accidentally using two start tags instead of a start tag and an end tag or...."

[shrug] Six of one, half a dozen of the other.
As I keep trying to say - I don't see YAML solving any problems without introducing similar ones. So it serves no purpose. Add in that XML has XSD and XSL (which are optional but available) and the panoply of XML tools available, and I simply see no value whatsoever in YAML.

Philo

Philo
Sunday, October 5, 2003

There exist lisps that use whitespace instead of parentheses.  Chaitin's lisp, for example.  Many people even write in those whitespace lisps with pen & paper until they're done designing.  Basically they just use parens (or braces, newlines, whatever) when necessary or comfortable.

If you're already comfortable with lisp, you might like this Java applet simulating Chaitin's lisp.  http://www.cs.umaine.edu/~chaitin/unknowable/lisp.html

I only have a passing familiarity with these things, so I'm not doing more than just mentioning it.  I don't care too much about the xml debate because you can just come up with different views of data if you don't like xml.

Tayssir John Gabbour
Sunday, October 5, 2003

Another project that scratches the YAML itch is PXSL (Parsimonious XML Shorthand Language) [1] What's interesting about it is that you *ARE* writing XML, just in shorthand, and it eventually gets transformed to the XML we all know and love.

[1] http://community.moertel.com/pxsl/

Chris Winters
Monday, October 6, 2003

PXSL looks interesting, but the little pinkies still suffer.

I think the lightweight benefits of YAML do stand out when you are using one of the scripting languages like Perl, Python or Ruby.  These languages have incredibly powerful functions for dealing with arrays and hashes, and YAML dovetails nicely with these.

YAML is already included with the latest Ruby distribution, and I suspect that, with its tidy fit, it will find its way into the others.

Ged Byrne
Monday, October 6, 2003

>>As I keep trying to say - I don't see YAML solving any >>problems without introducing similar ones.
That makes sense, why use it if it doesn't solve anything for *you*?

>>So it serves no purpose.
Non sequitur. There is something I call "the law of personal preference" which states that the largest barrier to code-reuse is personal preference. You are happy with XML, great. Lots of people are not. Therefore, XML is indeed 'broken' in the sense that there will always be attempts to align it more closely with what (those other) people want.

Bottom line. YAML, XML, LISP et al. are *all* broken in one respect. They all require additional 'non-human-centric' text "blurbs" (either indentation, tags or extra parentheses) in order to make it easier for a computer program to process and (unambiguously) parse.

Until the day when computers can easily and unambiguously interpret consistent meanings from raw text without the extra "blurbs" (i.e., never) then these debates, language wars and 'innovations' will continue. Why? Because all those blurbs are really computer processing instructions that a human being considers 'noise' (unless that particular human being is happy with and enjoys looking at the blurbs as much as the message[personal preference]).

Some will tout their favorite, others will lament the persistent annoyance of people trying to fix "what isn't broken."

The law of personal preference: If someone out there doesn't like it ... it's broken.

scooty mcdooster
Sunday, February 8, 2004

The important thing about yaml is really the libraries.

The libraries translate yaml files into native arrays and hashes, in scripting languages where much can be done passing data around in arrays and hashes.

The yaml developers have built bridges between scripting languages with the primitive types like arrays and hashes.  If they do the same with objects, yaml becomes a very useful standard for those working in scripting languages.

The standard on the wire is nice in an average sort of way.  You have to consider the programming environment around the standard.  Unlike XML libraries, the YAML libraries strive to integrate seamlessly with the programming environment, with one-line parse and dump statements marshalling the yaml in and out of existence.

That said, there are some kick-ass xml libraries.  And the benefits of yaml could have been implemented with a subset of XML.

Anyways.  I've used yaml, it can be a good experience.  yawn.

Patrick May
Saturday, June 19, 2004

Scooty:

You're trying to downplay YAML's advantage by saying that all formats need structural constructs.  Yes, YAML relies on indentation, but indentation is quite human-centric.  If it weren't, why do people indent their C/Java code?

YAML is a little trickier to parse because it performs layout-based parsing, but it's a lot easier for a human to read.  Anyone who says that XML is just as easy to read is lying.

YAML solves the readability problem XML has.  Now all that's left to replace is the pathetic type system.  The YAML people are working on that right now but, unfortunately, some of them seem to be looking towards XSD for insight.

(XML is not all bad.  It's good for marking up documents and other mixed-content formats, like HTML.  It shouldn't be used for anything else, though).

Wayne:

Layout-sensitive parsing isn't easy.  Rest assured that the YAML implementors are not stupid.  If they wanted to allow TABs, they could have.  The problem is that people use different tab widths.  You may have encountered a situation where the indentation of a C file is all messed up because the author uses different sized TABs.  Since YAML uses indentation in its syntax, this would cause lots of problems.

Do you think the designers of Java left out C-style #defines because they didn't want to do the "extra coding work"?

And what the heck is that comment about ignoring newlines?  Layout-sensitive parsers actually care about newlines and have to work harder to deal with them.  C/Java parsers can totally ignore them (that's why you need the pain-in-the-ass semicolon).  I think XML parsers can, for the most part, ignore them as well.

Peter da Silva:

The flow style is for convenience.  I don't understand how you can see it as a disadvantage.  A lot of the time, it's more convenient to put a bunch of short entries on a single line. 

For example, when you're making a function call in C, if the parameter list becomes too large, you usually split it up onto multiple lines, right?  Or do you keep it stretched out to column 140 to avoid confusing the reader?  Also, XML has the shorthand "<goose/>" format that really represents "<goose></goose>".

The concern that a parser will only handle one of the formats is just plain silly.  What makes you think that a YAML parser writer will decide not to conform to the spec?  Sure, if you're rolling your own one-off parser you may ignore some of the corner cases, but why would you want to do that?  USE AN EXISTING PARSER.  Java isn't trivially easy to parse, but who cares?  When I decide to program in Java, I don't write a parser.  I just write the code and use an existing parser (the Sun compiler).

The whole goal of making XML easy to parse was misguided.  The number of times an XML parser needs to be written is minuscule when compared to the number of times an XML document needs to be read/written.  Why place the burden on the common case?

And while it may be extra work for the parser, you're grossly underestimating the human brain if you think those two forms cause confusion for people.

Philo:

XML is hard for humans to read and write.  YAML is easier for humans to read/write.  There's a solved problem, right there.  The "problem" it introduces is trivially solvable.  If the YAML parser says it sees tabs in the file, do a global search/replace and eliminate them.
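The search/replace step can be sketched in a few lines of Python. This is only an illustration: the function names `find_tab_indented_lines` and `replace_indent_tabs` are made up, and the `tabsize` value is a guess that has to match whatever width the file's author had in mind, which is exactly why tabs are ambiguous in the first place.

```python
def find_tab_indented_lines(text):
    """Return 1-based numbers of lines whose leading whitespace contains a tab."""
    bad = []
    for number, line in enumerate(text.splitlines(), start=1):
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if "\t" in indent:
            bad.append(number)
    return bad


def replace_indent_tabs(text, tabsize=2):
    """Expand tabs in leading whitespace only; tabs inside values are left alone."""
    fixed_lines = []
    for line in text.splitlines():
        body = line.lstrip(" \t")
        indent = line[: len(line) - len(body)]
        fixed_lines.append(indent.expandtabs(tabsize) + body)
    result = "\n".join(fixed_lines)
    # Keep the trailing newline if the input had one.
    return result + "\n" if text.endswith("\n") else result


# Line 2 is indented with a tab, line 3 with two spaces.
sample = "players:\n\t- Mark McGwire\n  - Sammy Sosa\n"

print(find_tab_indented_lines(sample))  # [2]
fixed = replace_indent_tabs(sample, tabsize=2)
print(find_tab_indented_lines(fixed))   # []
```

Most editors do the same thing with a convert-tabs-to-spaces option, so in practice this rarely needs to be scripted.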

If you want to bring up the "XML is mature", "XML has more tools", "XML is good enough" stuff, go ahead.  Those can be valid points, depending on your perspective.  But YAML's layout-based syntax does solve a problem.

Kannan Goundan
Saturday, August 7, 2004

XML was defined so it'd be easy to parse? I guess you have never written an XML parser... I have. Parsing XML is a major PITA, even if not quite as bad as SGML. Worse, XML is "fractally complex"; that is, writing a parser for a significant subset is easy, but the closer you want to get to spec conformance (entity expansion and related checks, attribute default values, LF and white space normalization, DTD validation), the harder it gets. Much (most) of this has to do with SGML heritage... it's like C++'s syntax problems that derive from its C compatibility.

Sad fact is that the only good things about xml are that it doesn't suck quite as bad as SGML, is almost human-readable and is a Standard (tm). Some people seem to be members of the Church of Standardization, and think that alone is worth all the pain, I guess.

So if XML is not all that convenient for humans to read, it's certainly not because of a trade-off that made it easy to parse.

I do agree with many of the comments -- especially the one about XML being ok for mixed content (which is hardly a coincidence -- despite its roots with SGML, it was HTML that really created the push for XML).

Cowtown Coder
Wednesday, August 25, 2004
