Fog Creek Software
Discussion Board

City Desk And XML

I have read the little critique you gave of XML in your five-part series on its development. However, as a user, I would like to point out that XML has one big advantage for a tool like City Desk: source code control.

We publish a lot of material with it, but we also want to keep our intranet site under source code control. Since your .cty files are basically Jet databases, each one is a single large binary file, which doesn't work well with text-oriented tools like SourceSafe or CVS. (We use the latter.)

It sure would be nice if there was a tool to export City Desk files to XML, and import them from XML, so that we could put them under CVS and get meaningful difference data.

Just a thought. BTW, we are a programming shop that puts specifications on an intranet with City Desk. We use it because it is easy, not because we can't do HTML. It's the same reason I use MS Word rather than LaTeX: I could do both, but Word is easier. (To put it another way: in your drive for ease of use for non-techies, don't leave us techies out in the cold. We like the software too.)

Jessica Boxer
Sunday, April 11, 2004

Good point!

I agree that CityDesk is painful for source control purposes. We need to do something about it.

I'm not entirely sure XML is the right answer ... if XML were our native format, publishing time would be unacceptably slow for all but the tiniest of sites, and if we merely export to XML for the purpose of checking into source code control systems, you don't get the full benefit of source code control. There may be some combination of XML plus binary/relational caching that keeps the performance and multiuser data integrity we have today. We'll work on it.

For now, one thing you could do is check the directory that gets created when you publish into source control. That will at least get you "diff" ability.
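Because the published output is plain HTML, an ordinary line diff of it is readable. A minimal sketch with Python's standard `difflib`; the page name and contents are made up for illustration:

```python
import difflib

def diff_published_page(old_html: str, new_html: str, name: str) -> list[str]:
    """Return unified-diff lines between two published versions of one page."""
    return list(difflib.unified_diff(
        old_html.splitlines(keepends=True),
        new_html.splitlines(keepends=True),
        fromfile=f"a/{name}",
        tofile=f"b/{name}",
    ))

old = "<h1>Spec</h1>\n<p>Version 1</p>\n"
new = "<h1>Spec</h1>\n<p>Version 2</p>\n"
print("".join(diff_published_page(old, new, "specs/index.html")), end="")
```

Only the changed paragraph shows up as a `-`/`+` pair, which is the kind of output CVS gives for any text file.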

Joel Spolsky
Fog Creek Software
Monday, April 12, 2004

I'm not sure why you say that publishing time would be unacceptably slow for a natively XML-based system. I presume you are thinking of storing the XML in plain text files rather than in a native XML database, but even so, on a modern file system and a reasonable machine, does opening and reading files really take that long? I would think a site would have to be extremely large before that overhead became a real problem.

It is certainly slower than just pulling things from a DB, yes, but would an extra 10 seconds during a publish process cause much concern? Is speed that important?

I would have thought the benefits outweigh that argument...

Sorry, I've never quite got my head around the notion that XML is inherently slow. Sure, I wouldn't use it as the low-level representation in a 3D shooter, but for document storage and transformation, how much speed do you need?

Andrew Cherry
Monday, April 12, 2004

I think Joel isn't saying that XML is slow, but rather that doing database work out of text files is slow. This is true independent of what's in the text files, be it XML, Windows INI files or comma-delimited lines: if you have a non-trivial amount of data and you want to access random subsets of it, you have to either search through the whole thing, or index it in memory, or something similar.
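To make that trade-off concrete: with records stored one per line in a text file, a lookup either scans every line or consults an index built in memory first. A minimal sketch, assuming a hypothetical tab-separated `id<TAB>title` layout (not CityDesk's actual format):

```python
import io

def scan_lookup(lines, wanted_id):
    """O(n) per query: walk every record until the id matches."""
    for line in lines:
        record_id, title = line.rstrip("\n").split("\t", 1)
        if record_id == wanted_id:
            return title
    return None

def build_index(lines):
    """O(n) once, then O(1) on average per query: map id -> title in memory."""
    index = {}
    for line in lines:
        record_id, title = line.rstrip("\n").split("\t", 1)
        index[record_id] = title
    return index

data = "1\tHome\n2\tSpecs\n3\tContacts\n"
print(scan_lookup(io.StringIO(data), "2"))   # Specs
index = build_index(io.StringIO(data))
print(index.get("3"))                        # Contacts
```

Either way the whole file is read at least once per run; a database engine keeps its indexes on disk so it can skip that step.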

(The original question had to do with checking things into source control, so a diffable text format is apparently being assumed).

For myself, I'd like to see source control systems that accept pluggable 'diff modules' for different file types. That way, we could have code that knows HOW to properly diff a Jet database file (presumably by comparing schema and data), and that computes useful diff output from that information.
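A diff module for a database file might compare schema and rows rather than bytes. The sketch below uses SQLite through Python's standard `sqlite3` module purely as a stand-in for Jet (reading a real .cty file would need a different driver); the `articles` table is hypothetical. It treats each table as a set of rows, so duplicate rows collapse; a real module would key rows on the primary key.

```python
import sqlite3

def table_rows(conn, table):
    """Return all rows of `table` as a set of tuples (order-insensitive)."""
    return set(conn.execute(f'SELECT * FROM "{table}"').fetchall())

def diff_database(old, new):
    """Report schema and row-level differences between two SQLite connections."""
    schema_sql = "SELECT name, sql FROM sqlite_master WHERE type = 'table'"
    old_schema = dict(old.execute(schema_sql).fetchall())
    new_schema = dict(new.execute(schema_sql).fetchall())
    report = []
    for name in sorted(old_schema.keys() - new_schema.keys()):
        report.append(f"- table {name}")
    for name in sorted(new_schema.keys() - old_schema.keys()):
        report.append(f"+ table {name}")
    for name in sorted(old_schema.keys() & new_schema.keys()):
        if old_schema[name] != new_schema[name]:
            report.append(f"~ schema {name}")
        old_rows, new_rows = table_rows(old, name), table_rows(new, name)
        for row in sorted(old_rows - new_rows):   # rows only in the old file
            report.append(f"- {name} {row}")
        for row in sorted(new_rows - old_rows):   # rows only in the new file
            report.append(f"+ {name} {row}")
    return report

def make_db(rows):
    """Build an in-memory example database with a hypothetical articles table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT)")
    conn.executemany("INSERT INTO articles VALUES (?, ?)", rows)
    return conn

old = make_db([(1, "Home"), (2, "Specs")])
new = make_db([(1, "Home"), (2, "Specifications")])
for line in diff_database(old, new):
    print(line)
```

A one-field edit comes out as a one-row change, even though the two files would differ in many bytes.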

Does anyone know of an SC system that does this? CVS obviously doesn't; Subversion (whose web page I just checked) doesn't either, though it does claim to handle binary diffs reasonably well. Does any SC system do type-specific diffs?

Michael Kohne
Monday, April 12, 2004

Ah, I see. I can understand that querying this kind of data could be slow with flat-file storage... although an XML database with a fast XQuery implementation will hopefully solve that one soon.

Andrew Cherry
Monday, April 12, 2004

One way to solve this problem (wanting XML for source control, but native DB for publishing speed) is to provide XML import and export functions.

You could export your whole site as XML for checking into source control. If you want to restore your web site to last week's version, you just import all the XML files into a new CityDesk file. It would restore all your articles, files, settings (publish locations, etc).

If you just want to restore one article, one template, or one location's publish settings, you could import a single XML file into a .cty file. This would also be a neat way for people to share CityDesk templates without sending a whole .cty file: you could give them a template.xml and a variables.xml to import.
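The export/import round trip for a single item could look like this minimal sketch using Python's standard `xml.etree.ElementTree`. The element names (`article`, `title`, `body`) and the dictionary shape are hypothetical; CityDesk defines no such format.

```python
import xml.etree.ElementTree as ET

def export_article(article: dict) -> str:
    """Serialize one article to a small, diff-friendly XML document."""
    root = ET.Element("article", id=str(article["id"]))
    for field in ("title", "body"):
        child = ET.SubElement(root, field)
        child.text = article[field]
    ET.indent(root)  # one element per line so line-based diffs stay readable
    return ET.tostring(root, encoding="unicode") + "\n"

def import_article(xml_text: str) -> dict:
    """Parse an exported article back into a dictionary."""
    root = ET.fromstring(xml_text)
    article = {"id": int(root.get("id"))}
    for child in root:
        article[child.tag] = child.text or ""
    return article

article = {"id": 7, "title": "Spec <draft>", "body": "Line one & two"}
xml_text = export_article(article)
print(xml_text, end="")
assert import_article(xml_text) == article
```

Writing one such file per article (a hypothetical layout such as `articles/7.xml`) would let CVS show an edit to one article as a small line diff. `ET.indent` needs Python 3.9 or later.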

Binary files (JPGs, MP3s, PDFs, etc) could be exported to a single directory, with an XML file in there to specify where they are stored in the CityDesk site structure. That way, when you move them around in CityDesk you don't cause problems for your source control system.
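A sketch of that manifest idea, again with hypothetical element and attribute names. Binaries are stored flat in one directory and the manifest records where each lives in the site tree. Naming stored files by a SHA-1 hash of their content is an assumption added here, not part of the proposal; it means moving an image in the site changes only one manifest line, never the stored file.

```python
import hashlib
import xml.etree.ElementTree as ET
from pathlib import Path

def export_binaries(files: dict[str, bytes], out_dir: Path) -> str:
    """Store each binary flat in out_dir and return a manifest mapping it back.

    `files` maps a hypothetical site path (e.g. "images/logo.jpg") to its bytes.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    manifest = ET.Element("binaries")
    for site_path in sorted(files):  # sorted so repeated exports produce identical XML
        data = files[site_path]
        # Content-hash name (an assumption): the stored name depends only on the bytes.
        stored_name = hashlib.sha1(data).hexdigest() + Path(site_path).suffix
        (out_dir / stored_name).write_bytes(data)
        ET.SubElement(manifest, "file", stored=stored_name, site=site_path)
    ET.indent(manifest)
    return ET.tostring(manifest, encoding="unicode") + "\n"

def import_binaries(manifest_xml: str, out_dir: Path) -> dict[str, bytes]:
    """Rebuild the site-path -> bytes mapping from a manifest and the stored files."""
    root = ET.fromstring(manifest_xml)
    return {
        entry.get("site"): (out_dir / entry.get("stored")).read_bytes()
        for entry in root.iter("file")
    }
```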

Darren Collins
Monday, April 12, 2004

Michael: svn diff --diff-cmd my_super_dooper_diff_program ?
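For reference, `--diff-cmd` names an external program that Subversion runs instead of its built-in diff. The wrapper below is a minimal sketch; it assumes, as the example wrapper in the Subversion book does, that the two file paths arrive as the last two arguments, with GNU-diff-style options such as `-L` labels before them. It prints an ordinary unified diff; a type-aware differ would plug in at the marked point.

```python
#!/usr/bin/env python3
import difflib
import sys

def diff_cmd(argv: list[str]) -> int:
    """Diff the last two paths in argv; return 0 if identical, 1 if they differ."""
    left_path, right_path = argv[-2], argv[-1]
    # Collect any "-L <label>" pairs passed before the two file paths.
    labels = [argv[i + 1] for i in range(len(argv) - 3) if argv[i] == "-L"]
    left_label = labels[0] if len(labels) > 0 else left_path
    right_label = labels[1] if len(labels) > 1 else right_path
    with open(left_path, encoding="utf-8", errors="replace") as f:
        left = f.readlines()
    with open(right_path, encoding="utf-8", errors="replace") as f:
        right = f.readlines()
    # A type-aware differ (for a database or an archive, say) would go here.
    diff = list(difflib.unified_diff(left, right, fromfile=left_label, tofile=right_label))
    sys.stdout.writelines(diff)
    return 1 if diff else 0  # GNU diff convention: 1 means "differences found"

if __name__ == "__main__" and len(sys.argv) >= 3:
    sys.exit(diff_cmd(sys.argv[1:]))
```

Saved as an executable script, it would be invoked as `svn diff --diff-cmd /path/to/script`.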

SVN weenie
Tuesday, April 13, 2004

SVN Weenie, I think the author was talking about having different diff systems built into the version control backend that would allow more efficient encoding of differences between non-text files.  For example, even though two ZIP files are completely different on a byte-by-byte level, the diff might just be a one line change in a file that's part of the archive.  In a database, you could just note the changed data, and let the VCS regenerate the indices.  Using a different diff application on the client side is solving a different problem.
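Ben's ZIP example can be made concrete: two archives that differ throughout at the byte level may differ by one line in one member. This sketch compares archives member by member with the standard `zipfile` and `difflib` modules and assumes members are UTF-8 text. It illustrates a content-aware diff, not how any particular version control system stores deltas.

```python
import difflib
import io
import zipfile

def diff_zip_members(old_zip: bytes, new_zip: bytes) -> list[str]:
    """Return unified-diff lines for each member that changed between two ZIPs."""
    with zipfile.ZipFile(io.BytesIO(old_zip)) as old, zipfile.ZipFile(io.BytesIO(new_zip)) as new:
        old_names, new_names = set(old.namelist()), set(new.namelist())
        out = []
        for name in sorted(old_names | new_names):
            # A member missing on one side is treated as empty text.
            before = old.read(name).decode("utf-8") if name in old_names else ""
            after = new.read(name).decode("utf-8") if name in new_names else ""
            out.extend(difflib.unified_diff(
                before.splitlines(keepends=True), after.splitlines(keepends=True),
                fromfile=f"a/{name}", tofile=f"b/{name}"))
        return out

def make_zip(members: dict[str, str]) -> bytes:
    """Build a compressed in-memory ZIP from name -> text."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, text in members.items():
            zf.writestr(name, text)
    return buf.getvalue()

old = make_zip({"readme.txt": "hello\n", "spec.txt": "line 1\nline 2\n"})
new = make_zip({"readme.txt": "hello\n", "spec.txt": "line 1\nline 2 changed\n"})
print("".join(diff_zip_members(old, new)), end="")
```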

Ben Combee
Tuesday, April 13, 2004

Ben: That's a different interpretation of what Michael said than I read.  I reread Michael's comment, and I still think --diff-cmd solves pretty much what he said he's looking for, but I see how you could interpret it either way.

Weenie, S.
Sunday, April 18, 2004

CVSNT (native win32 port/fork of CVS) has had binary diff capability for years. No additional software required.

Saturday, May 8, 2004
