Fog Creek Software
Discussion Board




Doing database transformations with XSL

Hi, I'm beginning a project where I will do a large number of transformations on several tables, inserting most of their information into another tables. The transformations might include arithmetic operations on some of the fields, but the toughest ones could involve selecting the right information from several tables at once (doing a lot of joins).
While I was designing the process, I thought most -but not all- the transformations could be done with simple XSL if both the source and the target systems were XML files, so I figured a scheme where I would convert the database rows into an XML file, processed it with the XSL, and inserted back the resulting XML file into the database.
The thing is, I don't know what to do with these "special" fields that involve doing a join with another table and getting some data from there.
Any ideas? I searched the web for some kind of universal transformation engine, but with no luck.

Dario Vasconcelos
Monday, June 16, 2003

To not answer your question, does anyone else think XSL is a spec that went off the deep end in terms of adding features?  It appears to be Turing complete.  Is it just me, or does anyone else think that when you start adding lots of programming language like features it's time to use a real programming language?

I have a similar reaction to complex stored procedures as well.

But maybe it's just me.

Jim Rankin
Monday, June 16, 2003

You have a point, but I think it's all about putting the work in the appropriate place. Complex stored procedures for data that is going to the user interface, no. However, when the issue is translating data within a database, then I put the weight on using SP's, since otherwise you
a) haul the data out of the database
b) massage as necessary
c) push the data back into the database

(a) and (c) take time and programming effort (read: potential for bugs). Now if the work is complex enough, then it still may be justifiable to pull it out to work on it.

With the original XSLT question I raise the same issue - are (a) and (c) (moving the data to XML and back again) worth it?

Philo

Philo
Monday, June 16, 2003

XSL is based on Scheme (a functional language) which is why it feels like it's "trying to be" a programming language. Tags are not a great way to express programs, trying to work in a functional language when you're used to procedural or object-oriented languages is hard enough without having to deal with tag-based syntax.

I'm not sure what you mean about a universal transformation engine. Do you mean an XSL processor? Xalan is an XSL processor from Apache - there are Java and C++ versions. I get the feeling you're looking for something else. Maybe something like Velocity (also Apache) might be useful - if memory serves me correctly you don't have to use it simply to generate HTML, it's an all purpose templating solution.

I'm using XSL for pretty simple XML to SQL transformations right now, it feels right and appropriate for such an application. Last year I spent a lot of time working on a project that used XSL as the HTML templating solution for every page and in that case I think it was overkill.

Xalan Java 2: http://xml.apache.org/xalan-j/index.html
Xalan C++: http://xml.apache.org/xalan-c/index.html
Velocity: http://jakarta.apache.org/velocity/index.html

Walter Rumsby
Monday, June 16, 2003

Although I would hate to code stored procedures (they're impossible to debug, the language is too limited, almost no reusability), at first that was the idea.
But I need some kind of extensible mechanism to code the transformation rules in, because there will be changes in the tables and the number of transformations.
So then I thought: "OK, I'll write some kind of very primitive processing language and that will do the trick". But I guess that really would be overkill, since the transformations are not trivial.
That's why I thought about XSL, but I guess some engine with the ability to specify filters/rules/formatting would be better for the job. The thing is, this is a really low-budget project.
But yes, I have the feeling that all that conversion from/to XML might not be worth it. Maybe some kind of "intelligent pipeline" between databases?

Sorry about the "universal transformation engine" concept, I used to work for Sybase and they sell a product whose slogan is precisely that :-)

Dario Vasconcelos
Tuesday, June 17, 2003

use a stored procedure or more general programming language. You can get the results and do all the work in memory. I don't see why writing XML files and transforming with XSL was even considered.

Tom Vu
Tuesday, June 17, 2003

I'm not sure I want to even begin to try to understand this one. SQL is the way to go. A nice stored proc or script file should do what you want. Why on earth do you want to dump it out of the DB, transform it, and then load it back in? It can all be done from the DB itself.

Sgt. Sausage
Tuesday, June 17, 2003

If you think stored procedures are hard to debug, you want to stay far, far away from XSL.

Philo

Philo
Tuesday, June 17, 2003

By the way, have you thought about/looked at commercial EDI processing engines? I don't know what your platform, budget, or the scale of the application is, but you've exactly described what Biztalk was designed to do.

Philo

Philo
Tuesday, June 17, 2003

I'd like to suggest using base SAS, if you have it available.  It has database access routines that you can read and write to most commercial databases. It's optimized for going through lots of rows of data and processing each row. It also does summarizations over rows, and if you happen to need statistical functionality, well, you're all set.

If you don't have it available I wouldn't recommend buying it, though. It is not cheap. Yeah, and OK, it's not the sexiest language there is...

Lauren B.
Tuesday, June 17, 2003

dario

r u talking about infomakers data pipelines?

Karel
Tuesday, June 17, 2003

XSL doesn't sound like the right approach here. I agree with those suggesting SPs or, straight SQL if transofrmation are simple enough.

You can make your transformations extensible and reusable by using templates to generate the actual SQL code. You can make your templates build upon each other creating small reusable segments. We've done this in one of  our products and it worked out quite well.

igor
Tuesday, June 17, 2003

Dairo, what database are you using? Most database vendors have data transformation packages. For example, MS has DTS for SQLServer.

igor
Tuesday, June 17, 2003

Another option, if you've got a lot of different transformations to do, is to use Microsoft's BizTalk server.
It's fairly pricey though, so it's not good for a one-off project.

Chris Tavares
Tuesday, June 17, 2003

Mmhhh... I've done a lot of XSL in the past and I still think stored procedures are harder to debug. Also, the XSL would only control the transformation, not the whole extraction/insertion process.

But I'm beginning to agree that XSL is not the best tool here. Many of the rules are not easy to express with XSL (mostly the ones that are very relational). I will check if Sybase -that's what I'm using- has some transformation tool.

In PowerBuilder (and I guess InfoMaker too) there is a tool called pipeline, with the ability to drive data from one table to another. Since in my case many tables will transform in many other tables, I was hoping for a more general solution. Also, programming in PowerBuilder is a real torture for me.

And Philo: that's exactly what I had in mind! Biztalk would be the perfect tool, if it wasn't for the ***really*** low budget of the project (enough only to pay for my salary :-)
Sybase sells something called "E-biz integrator" that used to belong to Financial Fusion which is also great for this kind of job.

I guess I'm not very profficient in SQL so I find it hard to imagine a stored proc-only solution which is also extensible and dynamic. But the SQL templates suggestion is very appealing...

Dario Vasconcelos
Tuesday, June 17, 2003

BTW, on Biztalk - don't get scared off by the $25k price tag. There's a "Partner" version that's something like $800/CPU (granted still bigger than a lot of budgets, but definitely better than $25k). You can also get a Developer version for free if you need a proof of concept to convince the money guys.

Even if they're paying you $20/hr, you only have to save a week's worth of work to justify Biztalk Partner...

Philo

Philo
Tuesday, June 17, 2003

Any more info on that "partner" version Philo?

Just me (Sir to you)
Wednesday, June 18, 2003

The "Partner" version is activated to use only one datasource (one partner) and costs $999.  Apparently, it has the same options and features than the other versions.
The trial version is usable for 120 days, which should give anyone enough time to think about buying the product or not, or at least to save the money... :-)

Dario Vasconcelos
Wednesday, June 18, 2003

*  Recent Topics

*  Fog Creek Home