Fog Creek Software
Discussion Board




XML: double cream or low fat?

Which of these XML designs do you prefer?

<drinking>
  <substance>beer</substance>
  <perPersonAmount>23 pints</perPersonAmount>
  <location>the bar</location>
  <peopleAllowed>
    <person>
      <title>Mr</title>
      <surname>Bonkmeister</surname>
      <workTitle>Coder</workTitle>
    </person>
    <person>
      <title>Ms</title>
      <surname>Spanky</surname>
      <workTitle>Ubercoder</workTitle>
    </person>
    <person>
      <title>Mr</title>
      <surname>Soddington</surname>
      <workTitle>Butler</workTitle>
    </person>
  </peopleAllowed>
</drinking>

VS

<drinking substance="beer" perPersonAmount="23 pints" location="the bar">
  <peopleAllowed>
    <person title="Mr" surname="Bonkmeister" workTitle="Coder" />
    <person title="Ms" surname="Spanky" workTitle="Ubercoder" />
    <person title="Mr" surname="Soddington" workTitle="Butler" />
  </peopleAllowed>
</drinking>

and why?

Personally, I'd go for the 2nd design... mainly because I think you miss the point with the first design OR you've been writing way too much HTML in your life ;)

The first design, IMO stems from a long association with verbal document design and stuff like HTML. You can easily see that it is basically the same as writing:
"Hi people, <location>the bar</location> will be open for a <substance>beer</substance> drinking session. Each person will be allowed to drink <perPersonAmount>23 pints</perPersonAmount>. The following people are allowed: <title>Mr</title> <surname>Bonkmeister</surname> (<workTitle>Coder</workTitle>)..." bla bla etc.

Someone here just had a document design rejected because he used attribute-centric rather than element-centric design. The person who rejected it desperately needs to see element tags around text or his brain shuts down.

Now, I've been designing docs assuming that:
1.) attributes can be used for 1..0/1 relationships to an element
2.) child elements are used for 1..n relations.

These simple rules keep my documents nice and short, still readable and so forth.
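
A minimal Ruby sketch of those two rules, for illustration only. The helper name to_xml and the convention that a nested hash means a single wrapper element are assumptions, and values are not XML-escaped. Scalars (at most one per element) become attributes, arrays (1..n) become repeated child elements, and with the sample data it prints the second design:

```ruby
# Hypothetical helper: scalars become attributes, hashes become a single
# child element, arrays become repeated child elements.
def to_xml(name, data, indent = 0)
  pad = '  ' * indent
  nested = data.select { |_, value| value.is_a?(Array) || value.is_a?(Hash) }
  scalars = data.reject { |key, _| nested.key?(key) }
  attrs = scalars.map { |key, value| %( #{key}="#{value}") }.join
  return "#{pad}<#{name}#{attrs} />\n" if nested.empty?

  lines = ["#{pad}<#{name}#{attrs}>\n"]
  nested.each do |key, value|
    children = value.is_a?(Array) ? value : [value]
    children.each { |child| lines << to_xml(key, child, indent + 1) }
  end
  lines << "#{pad}</#{name}>\n"
  lines.join
end

drinking = {
  'substance' => 'beer',
  'perPersonAmount' => '23 pints',
  'location' => 'the bar',
  'peopleAllowed' => {
    'person' => [
      { 'title' => 'Mr', 'surname' => 'Bonkmeister', 'workTitle' => 'Coder' },
      { 'title' => 'Ms', 'surname' => 'Spanky', 'workTitle' => 'Ubercoder' },
      { 'title' => 'Mr', 'surname' => 'Soddington', 'workTitle' => 'Butler' }
    ]
  }
}

puts to_xml('drinking', drinking)
```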

What is your view?

Gerrie Swart
Monday, October 27, 2003

Is the <peopleAllowed> wrapper necessary?

Could you just have a list of people.

Since there are only allowed people listed, this tag doesn't seem to add any information.

Ged Byrne
Monday, October 27, 2003

No, peopleAllowed is exactly (well, part of) the point I'm trying to get across, i.e. why DO people want the redundancy?

Gerrie Swart
Monday, October 27, 2003

> Now, I've been designing docs assuming that:

What happens when your assumptions change?

Although the element-based version is quite a bit more verbose, you never have to change an attribute to an element if the cardinality changes.

In one design I did, I used attributes for database columns, and an issue that arose was attributes with large amounts of text, and/or embedded markup. These situations are both more elegantly handled with elements.

Portabella
Monday, October 27, 2003

Hhm, some clarification:
I'm basically going on about people's insistence on element-only design.

I feel that you can use attributes for anything you can reasonably assume / prove that:
a.) it won't change in a way that breaks things (oooh, difficult)
b.) there cannot be more than one (identity no e.g.)
c.) it is not a mission to handle (no silly characters, or related)

(a && b && c) must be true BTW.
b.) seems to be the most limiting (I know people with more than one surname, can you believe it :)

So I'm not trying to sell attribute-only design, hehe :)
I see it as shades of grey, but *a lot* of people see it as a one colour (element) world.

Something else as well: performance. Does anyone know anything about the performance aspects of the types of design?
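
One way to find out is simply to measure. A rough sketch, assuming Ruby's REXML parser and made-up data (500 generated people); it says nothing about other parsers, and the timings will vary from machine to machine, so treat it as a starting point rather than an answer:

```ruby
require 'rexml/document'

# Build the element-centric design for n made-up people.
def element_style(n)
  people = (1..n).map do |i|
    "<person><title>Mr</title><surname>Name#{i}</surname>" \
    "<workTitle>Coder</workTitle></person>"
  end
  "<peopleAllowed>#{people.join}</peopleAllowed>"
end

# Build the attribute-centric design for the same people.
def attribute_style(n)
  people = (1..n).map do |i|
    %(<person title="Mr" surname="Name#{i}" workTitle="Coder" />)
  end
  "<peopleAllowed>#{people.join}</peopleAllowed>"
end

# Wall-clock seconds needed to parse one document.
def parse_seconds(xml)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  REXML::Document.new(xml)
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

n = 500
{ 'elements' => element_style(n), 'attributes' => attribute_style(n) }.each do |label, xml|
  puts format('%-10s %7d bytes  %.4f s', label, xml.bytesize, parse_seconds(xml))
end
```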

Gerrie Swart
Monday, October 27, 2003

peopleAllowed isn't useful from an xml POV, but
is useful from a programming POV. You can just
refer to all the people using a single element
name in your program and pass that element
around. As you don't have a lot of elements
in your xml yet it may not matter, but as things
get more nested and complex, it is useful.

The attribute approach doesn't scale either.
As you get more attributes or an attribute
can be very long, it doesn't fit well.

And once any attribute gets structure, that
is, no longer scalar, you'll need an element
anyway.

I use attributes when they specify information i'll need
to deal with the element information. Usually a unique
key so i can do some validation and creation of an
object based on the id.

son of parnas
Monday, October 27, 2003

son of parnas,
thanks, good point.

Do you use the attributes only as a type of "primary key" thing or do you consider the usage of attributes for, say, something like an employee ID as well?

BTW, could you add some more info on why you feel that the attribute design does not scale well? I can understand the thing about attributes should not contain too much info (length) or nasty characters or some such. But I am not sure I understand your point about "too many" attributes. Surely 100 attributes or 100 elements can both be "too much"?

Gerrie Swart
Monday, October 27, 2003

My take is that most XML is eventually for a programmer
to do something with. So the XML creator and programmer
audiences need to be made happy.

I consider attributes related to the
ResourceAcquisitionIsInitialization (http://c2.com/cgi/wiki?ResourceAcquisitionIsInitialization)
idiom. That the attributes are something that i need to make
a valid object before doing more stuff to it using the
elements.

With the no attributes approach i have to iterate through
a list of elements and open up each element to pull
out the information i need.  This is overly complex.
I need to create objects before i am ready. Or i need
to store and validate a lot of data before i can create
my object.  From a programming POV it's nice to have
the information available when you need it so you
can use your objects  without a lot of glue logic.

Attributes don't scale simply because attributes are part of
the element enclosure and it's hard to fit a lot of stuff on
that line without making it hard to read, even when moving
each to their own line. It's not a technical argument but
is a practical argument on the tradeoffs between authors,
readers, and programmers.

Plus, it's common for attributes to require further structure
which means you'll need to change everything to make it
an element. Constructor type arguments rarely change in
this way.
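
A small sketch of that constructor idea, assuming Ruby's REXML. The id attribute and the Person class are invented for the example (the XML in this thread has no ids): the attribute is enough to build a valid object, and the child elements are applied to it afterwards.

```ruby
require 'rexml/document'

# Hypothetical XML: the id attribute is invented for this sketch.
XML = <<~END_XML
  <peopleAllowed>
    <person id="7">
      <surname>Spanky</surname>
      <workTitle>Ubercoder</workTitle>
    </person>
  </peopleAllowed>
END_XML

# The attribute plays the role of a constructor argument: without a
# usable id there is no object at all.
class Person
  attr_reader :id
  attr_accessor :surname, :work_title

  def initialize(id)
    raise ArgumentError, 'person needs an id' if id.nil? || id.empty?
    @id = Integer(id)
  end
end

people = []
doc = REXML::Document.new(XML)
doc.root.elements.each('person') do |node|
  person = Person.new(node.attributes['id']) # valid from the start
  # Child elements then fill in the rest of the already-valid object.
  person.surname = node.elements['surname'].text
  person.work_title = node.elements['workTitle'].text
  people << person
end

people.each { |person| puts "#{person.id}: #{person.surname} (#{person.work_title})" }
```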

son of parnas
Monday, October 27, 2003

Attributes have limitations related to type. If those limitations don't bother you or interfere with your goals, then use whatever makes you comfortable.

(I presume you're actually making XSDs for these things...)

Brad Wilson (dotnetguy.techieswithcats.com)
Monday, October 27, 2003

When I design or consider a schema I try to identify the area of concern and the nouns I'm working with.

In your example my area of concern would be People at a party and my nouns would be Persons and drinks.

Using an attribute to handle the peripheral information is quite fine and makes lots of sense.  It's basically information about the information I care about. Thus location="the bar" is quite fine, if that isn't an area of concern, but listing title="Mr." name="Joe Schmoe" isn't in my process.  I'd rather see <person title="Mr.">Joe Schmoe</person>.  I'm primarily concerned with the person's name, I'm peripherally concerned about their title.

It's hard for people to grasp when it's okay to put information into attributes; some people really don't like "hiding" information there because they think it is inaccessible (it isn't to a decent parser).  But you don't want to make your entire tree irrelevant by hiding all your information in attributes.

Essentially, I should be able to strip all the attributes from your XML and still make some sense of it.  That's what I aim towards.
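
That test can be tried mechanically. A sketch assuming Ruby's REXML; the helper names strip_attributes and stripped are made up for the example. It removes every attribute and prints what is left of the two person styles discussed above:

```ruby
require 'rexml/document'

# Remove every attribute from an element and all of its descendants.
def strip_attributes(element)
  names = []
  element.attributes.each_attribute { |attr| names << attr.expanded_name }
  names.each { |name| element.delete_attribute(name) }
  element.elements.each { |child| strip_attributes(child) }
  element
end

# Parse a snippet, strip it, and return the remaining markup as text.
def stripped(xml)
  strip_attributes(REXML::Document.new(xml).root).to_s
end

name_as_content = '<person title="Mr.">Joe Schmoe</person>'
attribute_only  = '<person title="Mr" surname="Bonkmeister" workTitle="Coder" />'

puts stripped(name_as_content) # the name survives as element content
puts stripped(attribute_only)  # nothing about the person is left
```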

Lou
Monday, October 27, 2003

As exhibit C I would like to submit the same data in YAML [1] :

----
drinking:
    substance: beer
    perPersonAmount: 23 pints
    location: the bar
    peopleAllowed:
        - {title: Mr, surname: Bonkmeister, workTitle: Coder}
        - {title: Ms, surname: Spanky, workTitle: Ubercoder}
        - title: Mr
          surname: Soddington
          workTitle: Butler
----

A few points to note:
  - Much more readable
  - Easier to type
  - More choice.  Both the inline and extended style can be
    mixed to suit content
  - It can be as self-describing as you like.  Since each
    element is not forced to label itself, peopleAllowed is an
    efficient alternative to labeling each element as 'person.'
  - It scales much better than trying to skim down xml.
  - A quick script can be used to convert this YAML to
    either xml style [2]
  - A more verbose xsl could be used to convert either xml
    style to this YAML.

----

[1] http://www.yaml.org
[2] Ruby 3.8 script:

require 'yaml'

# The heredoc must be closed with the same END_YAML marker that opened it.
doc = YAML.load <<END_YAML
drinking:
    substance: beer
    perPersonAmount: 23 pints
    location: the bar
    peopleAllowed:
        - {title: Mr, surname: Bonkmeister, workTitle: Coder}
        - {title: Ms, surname: Spanky, workTitle: Ubercoder}
        - title: Mr
          surname: Soddington
          workTitle: Butler
END_YAML

puts "<drinking>"
doc['drinking'].each do |key, value|
    if key == "peopleAllowed"
        puts "\t<#{key}>"
        value.each do |person|
            puts "\t\t<person>"
            # Distinct names here: in Ruby 1.8, reusing the outer block's
            # variable names would overwrite them.
            person.each do |field, text|
                puts "\t\t\t<#{field}>#{text}</#{field}>"
            end
            puts "\t\t</person>"
        end
        puts "\t</#{key}>"
    else
        puts "\t<#{key}>#{value}</#{key}>"
    end
end
puts "</drinking>"

Ged Byrne
Monday, October 27, 2003

Ooops, Ruby 1.8.

Ged Byrne
Monday, October 27, 2003

If you are cool with indentation determining structure
then go for it. Personally i like explicit structure rather
than performing white space magic. Making this
message in a programming language would be
a pain because i would always have to be aware of
my indent level. Not good.

son of parnas
Monday, October 27, 2003

"- Easier to type"

Four spaces before each person. Six if you have elements under the person. I have schema that are nested seven deep (14 spaces before each element - don't lose count or YAML will break)

And don't forget - you can't use tabs.
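
The tab restriction is easy to check from Ruby. A small sketch, assuming the yaml library bundled with current Ruby (Psych); the two one-key documents are made up for the demonstration:

```ruby
require 'yaml'

spaces = "drinking:\n    substance: beer\n"
tabs   = "drinking:\n\tsubstance: beer\n"

# Space-indented input loads as a nested hash.
puts YAML.load(spaces).inspect

# Tab-indented input is refused by the parser.
begin
  YAML.load(tabs)
  puts 'tabs accepted'
rescue Psych::SyntaxError => e
  puts "tabs rejected: #{e.message}"
end
```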

I'm still convinced the YAML crowd is all about being counterculture. [grin]

Philo

Philo
Monday, October 27, 2003

And getting back to the original post - I try to use attributes for modifiers of the element, and child elements for children of the element.

In OOP:
attributes <-> properties
children<->subordinate classes

Is there a reason *not* to do it this way? (besides the client fighting it because he can't grok attributes)
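
As a rough picture of that mapping in Ruby (the Struct names and the snake_case property names are assumptions for the sketch, not anyone's actual classes):

```ruby
# Each <person> is a subordinate object; its XML attributes
# (title, surname, workTitle) are plain properties.
Person = Struct.new(:title, :surname, :work_title)

# Attributes of <drinking> are properties; the children live in a list.
Drinking = Struct.new(:substance, :per_person_amount, :location, :people_allowed)

session = Drinking.new('beer', '23 pints', 'the bar', [
  Person.new('Mr', 'Bonkmeister', 'Coder'),
  Person.new('Ms', 'Spanky', 'Ubercoder'),
  Person.new('Mr', 'Soddington', 'Butler')
])

puts session.location
session.people_allowed.each { |person| puts "#{person.title} #{person.surname}" }
```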

Philo

Philo
Monday, October 27, 2003

>In OOP

Aren't elements usually attributes of an object?
They just aren't scalars.

So the distinction is somewhat artificial.

son of parnas
Monday, October 27, 2003

Philo,

I'm so used to having Scite handle all of my indenting for me that that problem never occurred to me.

You are, of course, quite right.

I've also noticed that it looks rather messy in a proportionally spaced font.

Ged Byrne
Monday, October 27, 2003

> And don't forget - you can't use tabs.

Egads! Just use an editor that converts tabs to spaces (IMO, a good idea anyway). Add auto-indenting and you're done.

The "exhibit C" YAML looks a lot shorter and clearer to me than either of the XML ones; again, counterculture's got nothing to do with it.

Portabella
Monday, October 27, 2003

Why does it matter if your XML is mildly more difficult to read than your YAML representation of the same data?  If you're concerned about human readability, that's okay, but often I'm creating a program to generate the data, and another to read it.  The length of the tags and the legibility of it isn't something I'm concerned with (usually).

Is this just a case of Perl is ugly, Python is prettier, we get rid of all the braces and use white-space.

It makes sense in a programming language (mostly, still confuses me sometimes), but in a data storage/transmission layer, I'd rather have a well structured document that's whitespace independent.  Just gives me the willies.

Lou
Monday, October 27, 2003

Ged and Portabella - you've missed your own point.

And I quote:
"- Easier to type"

Yet both of you refute my challenge with "but my tool does it for me just fine"

If you need tools, then it's not easier to type, is it?

Philo

Philo
Monday, October 27, 2003

> If you're concerned about human readability, that's okay, but often I'm creating a program to generate the data, and another to read it. 

In those situations, I agree with you; XML's ubiquity makes it the best choice.

However, there are many situations where readability *is* a concern (eg, config files, log files, etc), and in those cases I think that YAML solves the problem better.

There was just a long thread on YAML on JoS, where it was pointed out, ad nauseam and ad infinitum, that making XML fit is probably better from a business perspective. Can we all stipulate that we understand this, and save some bandwidth?

IMO, pointing to the (relatively few) technical warts and repeating endlessly that the people who are doing it just want to be different are far less convincing than simply looking at the documents side-by-side.

Portabella
Monday, October 27, 2003

> If you need tools, then it's not easier to type, is it?

You don't need the tools, obviously.

Honestly, how hard is it to just indent appropriately and not use the tab key? Not very. And if you *do* use the tools -- which are a lot more lightweight than XML editors -- then you are just about guaranteed to do it perfectly.

> Is this just a case of Perl is ugly, Python is prettier, we get rid of all the braces and use white-space.

Perl vs Python has much more to do with Perl's incredible potential for obscurity vs Python's economy and readability. The whitespace vs braces might be seen as a symptom of that, but it's far from the whole story, which is fairly well described here:

  http://www.linuxjournal.com/article.php?sid=3882

Portabella
Monday, October 27, 2003

I just wanted to point out that you guys are arguing about tabs, brackets, and file formats.  I hope you all married early, because I don't see it happening anytime in the future.

To make things more interesting, if God hired a bunch of H1-B visa holders to design an irreducibly complex biological system, would she have required them to use tabs or spaces to delimit the meta data describing the system?  If Darwin were alive today, would he have used LaTeX or Microsoft Word to format On the Origin of Species?

rz
Monday, October 27, 2003

"Honestly, how hard is it to just indent appropriately and not use the tab key? Not very."

-Root
  -Organization
  -Customer Organization
    -Branch
      -Department
  -Client Organization Two
    -Main Branch
      -Admin
        -CEO
        -HR
      -IT
        -QA Office
          -Department Head: John Smith


1) Find the two structural errors.
2) Add three more offices with three Department heads each.

Philo

Philo
Monday, October 27, 2003

Philo,

I am definitely getting your point.  I had rejected Python because I get fed up with white space issues when cutting and pasting code around (for refactoring, not reuse).

I then settled on Ruby because it uses properly delimited block structures (either do ... end or {}).

Another consideration is that in a good scripting language, like Ruby, YAML doesn't really save you that much.

Here is the YAML4Ruby cookbook. 

http://yaml4r.sourceforge.net/cookbook/

The striking thing is that Ruby source already allows you to specify everything in a concise manner without the extra parsing or whitespace problems of YAML.

So this brings us to exhibit D:

# No YAML library is needed here: the data is a plain Ruby hash.
drinking = {
    'substance' => 'beer',
    'perPersonAmount' => '23 pints',
    'location' => 'the bar',
    'peopleAllowed' => [
        {'title' => 'Mr', 'surname' => 'Bonkmeister', 'workTitle' => 'Coder'},
        {'title' => 'Ms', 'surname' => 'Spanky', 'workTitle' => 'Ubercoder'},
        {'title' => 'Mr',
          'surname' => 'Soddington',
          'workTitle' => 'Butler'
        }
    ]
}

puts "<drinking>"
drinking.each do |key, value|
    if key == "peopleAllowed"
        puts "\t<#{key}>"
        value.each do |person|
            puts "\t\t<person>"
            # Distinct names here: in Ruby 1.8, reusing the outer block's
            # variable names would overwrite them.
            person.each do |field, text|
                puts "\t\t\t<#{field}>#{text}</#{field}>"
            end
            puts "\t\t</person>"
        end
        puts "\t</#{key}>"
    else
        puts "\t<#{key}>#{value}</#{key}>"
    end
end
puts "</drinking>"

Hmmm, given a decent scripting language, YAML's appeal lessens somewhat.

That's why I like it here, makes you think.

Ged Byrne
Monday, October 27, 2003

There are those that say why use xml, just encode
the structure in java classes. I've done that for
perl. The problem is it's not usable from a different
environment.
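
A sketch of the usual way around that, assuming Ruby's bundled yaml library and a cut-down copy of the data from this thread: keep the structure in the language if you like, but write it out as neutral text when another environment needs it.

```ruby
require 'yaml'

drinking = {
  'substance' => 'beer',
  'location' => 'the bar',
  'peopleAllowed' => [
    { 'title' => 'Ms', 'surname' => 'Spanky', 'workTitle' => 'Ubercoder' }
  ]
}

# Serialise the in-language structure to a neutral text format...
text = drinking.to_yaml
puts text

# ...which a YAML reader in another environment can load back.
round_trip = YAML.load(text)
puts round_trip == drinking
```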

son of parnas
Monday, October 27, 2003

Philo,

Why not add comments that describe your intention for the structure? Then I think the issues would be obvious.

Like obfuscated code contests, doing something badly doesn't prove much.

Also, using lists for everything deliberately ignores YAML mappings, which have real names. You know, the ones that all the examples on the site use....

> Hmmm, given a decent scripting language, YAML's appeal lessens somewhat.

I agree with son of parnas that the idea is to edit configs without touching code.

Portabella
Monday, October 27, 2003

Current thinking on this:

1. Either something along the lines of Philo's proposal.

or

2. Attributes when values come from a discrete set (e.g. title may be limited to Mr|Ms|Miss|Mrs|Dr|Sir) and elements when values come from a (in a practical sense) infinite set (e.g. surname)
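
Option 2 might look like this in Ruby. It is only a sketch: the TITLES list is the example set above, person_xml is a made-up helper, and the surname is not XML-escaped.

```ruby
# Closed set of titles, taken from the example above.
TITLES = %w[Mr Ms Miss Mrs Dr Sir].freeze

# Render a person with the enumerated value as an attribute and the
# open-ended value as element content. Raises on an unknown title.
def person_xml(title, surname)
  raise ArgumentError, "unknown title: #{title}" unless TITLES.include?(title)

  %(<person title="#{title}"><surname>#{surname}</surname></person>)
end

puts person_xml('Ms', 'Spanky')
```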

Walter Rumsby
Monday, October 27, 2003

Lou, to me using the attributes as meta-data makes a lot of sense. Your point is also related to Philo's comment, i.e. attribute <-> properties in OOP.

The subtlety (I guess) however comes when trying to determine what can be used as meta-data and what might become more complex in future. This of course is the gist of Portabella's "what if your assumptions change?".

E.g. we assume a person has only one title:
This person is a Mr, but he might become a Dr. What to do if he becomes a Prof as well? This I guess is a business decision; we don't worry about his insistence on using both titles, we will just store the most recent as an attribute. If we do care about the titles, we might want to use an element however.
Note: I realise this example might not work in all countries, as title structures differ greatly from country to country. But I hope you get the idea.

Child elements are easy to choose, but choosing the wrong attributes might impact future development. I guess a feeling for this comes with experience in design. So it won't do to choose attributes for things that are flat *now* but may become hierarchical (or attributeInstances > 1) later.

From the other posts (thanks people :) I gather that most people would like to see:
- elements-based documents for designs that are mostly meant to be human understandable
- attributes for non-critical info / meta-data
- attributes for enumerated values
- elements for "big" values (aka lots of data :)
- elements for everything else ;)

Anyway, thanks for the replies everyone, it has provided some good info :)

PS:
YAML + Ruby. Thanks for taking the time to create some examples, it seems very cool. I'll go and play with it (I have downloaded the Ruby stuff, but have yet to do something in it). But for the purposes of an XML thread I won't enter the pro-YAML / pro-XML debate right now.

Gerrie Swart
Tuesday, October 28, 2003

My criterion would be:
Am I handling data or documents?
The first design would be used for documents, the second design would be used to represent data.
Of course it's not always clear-cut, but I find the guideline useful.

GP
Tuesday, October 28, 2003
