Fog Creek Software
Discussion Board

Making a Programming Language-to-Language Compiler

Please pardon my ignorance, but could anyone advise me where I should start if I wanted to build a programming language translator?

Briefly, I'm using a language that is likely to become obsolete shortly. I know a lot of companies which would like to migrate without losing their existing code base. The language was primarily developed for data access (not SQL), but it's capable of building console, GUI and Web applications. Initially, it was a great rapid development tool, but had very low potential.

I searched Google and it came up with several thick books.

I would be grateful if anyone could recommend a quick-start tutorial that would help me pick the next step.

Wednesday, January 14, 2004

find->replace in scripts.  Just hardcode it to get your code over to your new language of choice.  You'll be able to get some standard stuff across with scripts, but there are bound to be tricky bits that you have to hand-translate.  And when it finally compiles, it might well be correct!

i like i
Wednesday, January 14, 2004

The first task would be to be able to parse the language, so you may want to look at the various parser generators there are.  I don't have much to base a recommendation on, but have recently used ANTLR a bit:

It sounds like the language has a supporting environment (for the GUI, etc).  Likely there are support libraries that are not written in the language but are provided as native code by the environment.

Any attempt to translate the code would also have to deal with these calls.  To do a perfect job you would have to reimplement all these libraries in a new environment.

Rob Walker
Wednesday, January 14, 2004

First off, let me say that I would not do such a thing unless you had quite a deep background in theory of programming languages.  There's a reason why those books are thick and the tutorials are few -- writing a language translator is every bit as hard as writing a compiler.

Second, there are some classes of problems you will encounter.  The two big ones that come to mind are:

* First, concept count -- do the source and destination languages have high fidelity of concepts?  Writing a VBScript-to-VB.NET translator is pretty straightforward.  It's like translating Yiddish into German.

Writing a JavaScript-to-C# compiler is kind of tricky -- how do you implement prototype inheritance?  dynamic arrays?  run-time code evaluation?  It's like translating French into English.

Writing a Python-to-Prolog converter is nigh impossible.  It's like translating ancient Egyptian into musical notation.

* Second, assuming the concept counts match, what about the runtime libraries?  This is where the real difficulty lies.  Unless you have some way of using the old language's runtime library from the new language, this can be an immense amount of work. Consider a pretty easy task, for example -- converting VBScript to JScript.  Your VBScript program calls "DateDiff".  Implement that in JScript so that it has EXACTLY the same semantics as in VBScript.  I implemented the date handling code in both VBScript and JScript, and I'm telling you right now, there are a LOT of tricky differences between them, all of which you have to get right.
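To make the DateDiff point concrete, here is a minimal Python sketch (Python is used only for illustration, and all function names are hypothetical). It assumes the documented VBScript behaviour that DateDiff with the "yyyy" interval counts calendar-year boundaries crossed, so December 31 to January 1 of the next year gives 1 even though one day has elapsed. The month variant is assumed to follow the same boundary-counting rule. A hasty port based on elapsed days gives a different answer.

```python
from datetime import date

def naive_year_diff(start: date, end: date) -> int:
    """What a hasty port might do: whole 365-day periods elapsed."""
    return (end - start).days // 365

def boundary_year_diff(start: date, end: date) -> int:
    """Count calendar-year boundaries crossed (assumed DateDiff("yyyy") rule)."""
    return end.year - start.year

def boundary_month_diff(start: date, end: date) -> int:
    """Count calendar-month boundaries crossed (assumed DateDiff("m") rule)."""
    return (end.year - start.year) * 12 + (end.month - start.month)

new_years_eve = date(2003, 12, 31)
new_years_day = date(2004, 1, 1)

print(naive_year_diff(new_years_eve, new_years_day))     # 0
print(boundary_year_diff(new_years_eve, new_years_day))  # 1
print(boundary_month_diff(date(2004, 1, 31), date(2004, 2, 1)))  # 1
```

Every such rule (week boundaries, first day of week, time-of-day handling) would need the same treatment before translated code could be trusted.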

Assuming that you actually want to go through with this thing, the way I'd structure the program is to first build a correct and fully functional lexer/parser for the source language.  Then you can write a program that transforms the parse tree to a parse tree of the target language.  Then write an anti-parser -- a device that converts parse trees back into source code for the target language, and you're done.
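The three stages described above (parser, tree-to-tree transform, anti-parser) can be sketched in a few dozen lines of Python. This is only a toy under stated assumptions: the source syntax (`ASSIGN x = expr.` and `DISPLAY expr.`) is invented for the example and is not real Progress 4GL, the target is a C#-like subset, and every name in the code is hypothetical.

```python
import re

TOKEN = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(.))")

def tokenize(text):
    """Stage 1a: split source text into (kind, value) tokens."""
    out = []
    for num, word, sym in TOKEN.findall(text):
        if num:
            out.append(("num", int(num)))
        elif word:
            out.append(("word", word))
        elif sym.strip():
            out.append(("sym", sym))
    return out

class Parser:
    """Stage 1b: recursive-descent parser building a source tree of tuples."""
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else (None, None)

    def take(self, kind=None, value=None):
        tok = self.peek()
        if (kind and tok[0] != kind) or (value is not None and tok[1] != value):
            raise SyntaxError(f"unexpected token {tok!r}")
        self.pos += 1
        return tok[1]

    def program(self):
        stmts = []
        while self.peek()[0] is not None:
            stmts.append(self.statement())
        return stmts

    def statement(self):
        keyword = self.take("word").upper()
        if keyword == "DISPLAY":
            node = ("display", self.expr())
        elif keyword == "ASSIGN":
            name = self.take("word")
            self.take("sym", "=")
            node = ("assign", name, self.expr())
        else:
            raise SyntaxError(f"unknown statement {keyword}")
        self.take("sym", ".")  # statements end with a period
        return node

    def expr(self, ops="+-"):
        # ops="+-" parses sums; their operands are products (ops="*/").
        operand = (lambda: self.expr("*/")) if ops == "+-" else self.atom
        node = operand()
        while self.peek()[0] == "sym" and self.peek()[1] in ops:
            node = ("binop", self.take(), node, operand())
        return node

    def atom(self):
        kind, value = self.peek()
        if kind in ("num", "word"):
            self.take()
            return (kind, value)
        self.take("sym", "(")
        node = self.expr()
        self.take("sym", ")")
        return node

def transform(stmt):
    """Stage 2: map a source-tree statement to a target-tree statement."""
    if stmt[0] == "display":
        return ("call", "Console.WriteLine", stmt[1])
    if stmt[0] == "assign":
        return ("vardecl", stmt[1], stmt[2])
    raise ValueError(f"no translation for {stmt[0]}")

def unparse(node):
    """Stage 3: the anti-parser, turning a target tree back into text."""
    kind = node[0]
    if kind in ("num", "word"):
        return str(node[1])
    if kind == "binop":
        return f"({unparse(node[2])} {node[1]} {unparse(node[3])})"
    if kind == "call":
        return f"{node[1]}({unparse(node[2])});"
    if kind == "vardecl":
        return f"var {node[1]} = {unparse(node[2])};"
    raise ValueError(f"cannot unparse {kind}")

def translate(source):
    tree = Parser(tokenize(source)).program()
    return "\n".join(unparse(transform(stmt)) for stmt in tree)

print(translate("ASSIGN total = 2 + 3 * 4.\nDISPLAY total."))
# var total = (2 + (3 * 4));
# Console.WriteLine(total);
```

The transform stage is trivial here because the two toy languages share every concept. With a real language pair, that stage is where the concept-count and runtime-library problems show up.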

If you're looking for research into this idea, Microsoft Research did a lot of work on something called "Intentional Programming" which had at its heart this notion of programming by manipulating parse trees.  You may be able to find research papers on Intentional Programming which help.

Good luck!

Eric Lippert
Wednesday, January 14, 2004

We are talking about Progress 4GL here. Find/Replace just won't do :-( . Language structure is very different to any 3GL.

Wednesday, January 14, 2004

I second the recommendation to look at ANTLR.  It's probably the best parser generator available (but Flex/Bison would also work).

You use ANTLR to build a parser which will take the language code and produce, in memory, a structure containing the abstract representation of the program.  From that representation you write code to produce code in the output language.

Everything Eric said about the difficulty of the task might apply to you -- especially in trying to convert a 4GL.  You would have to implement all the features of the 4GL in your 3GL language of choice before any translation of syntax would be worthwhile.  This might be a serious undertaking.

Almost Anonymous
Wednesday, January 14, 2004

Thank you for the link and advice. I'm already there.

Unfortunately, I'm not a Java guru and would prefer C#, but it would require mostly Java output, I suppose, so I can handle it.

As to the 4GL stuff, like native DB buffers, easy DB connection, automatic transaction scope and record locking, I hope it can be simulated in a 3GL.
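As a rough illustration of simulating one of those features, automatic transaction scope, here is a minimal Python sketch using the standard sqlite3 module. It is not how Progress implements transactions: the `transaction` helper and the `customer` table are hypothetical, and record locking and DB buffers are not covered at all.

```python
import sqlite3
from contextlib import contextmanager

@contextmanager
def transaction(conn):
    """Commit if the block completes; roll back (undo) if it raises."""
    try:
        yield conn
        conn.commit()
    except Exception:
        conn.rollback()
        raise

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (name TEXT, balance INTEGER)")
conn.commit()

# A block that succeeds: its changes are committed automatically.
with transaction(conn):
    conn.execute("INSERT INTO customer VALUES ('Ann', 100)")

# A block that fails part-way: its changes are undone automatically.
try:
    with transaction(conn):
        conn.execute("UPDATE customer SET balance = balance - 500 WHERE name = 'Ann'")
        raise ValueError("insufficient funds")
except ValueError:
    pass

balance = conn.execute("SELECT balance FROM customer WHERE name = 'Ann'").fetchone()[0]
print(balance)  # 100
```

A translator could emit a wrapper like this around every block that the 4GL would have scoped as a transaction, but working out where those scopes begin and end in the source is the hard part.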

I'm not very serious, just bored, and would like to try; at least I will learn something I couldn't afford to learn before. The task by itself sounds exciting, much better than building in-house business systems...

P.S. Nice to see RefactorIT as a case study. This company was my next-door neighbor when I lived in Estonia. Makes me proud ;-).

Wednesday, January 14, 2004

> Unfortunately, I'm not a Java guru and would prefer C#,

ANTLR can be used in an entirely C# environment as long as you don't need to hack on the generator itself.  We use it with the C# backend. 

Rob Walker
Wednesday, January 14, 2004

I can recommend a book, not with great gusto, but at least with stolid optimism: Modern Compiler Implementation by Appel (in Java or C or ML), 2nd edition. Actually the Java one is probably better since you can use a nice, new, well-designed compiler-compiler called SableCC. This combines a lexer and a grammar parser in one package, and interfaces with a back end written in Java quite cleanly.

The main goal is to come up with a good intermediate representation language. In many cases, one IR language will work for many source languages. Heck, since most languages are implemented in one language -- assembler for whatever chip they all run on -- this is not surprising. But as has been mentioned, two very different languages mean a lot more work, because your IR will have to support all features of both languages. Fortunately, IR languages are a lot simpler than source languages.
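A minimal sketch of the "one IR, many source languages" idea, in Python. Both toy source notations (postfix and prefix arithmetic) and the tiny stack-style IR are invented for illustration, and all names are hypothetical. A real IR would also need control flow, data access and so on.

```python
def front_end_postfix(text):
    """Toy source language A, e.g. '1 2 + 3 *', lowered to the IR."""
    return [("push", int(tok)) if tok.isdigit() else ("op", tok)
            for tok in text.split()]

def front_end_prefix(text):
    """Toy source language B, e.g. '* + 1 2 3', lowered to the same IR."""
    tokens = text.split()

    def parse(pos):
        tok = tokens[pos]
        if tok.isdigit():
            return [("push", int(tok))], pos + 1
        left, pos = parse(pos + 1)   # first operand
        right, pos = parse(pos)      # second operand
        return left + right + [("op", tok)], pos

    ir, _ = parse(0)
    return ir

def back_end_infix(ir):
    """Single back end: emit fully parenthesised infix text from the IR."""
    stack = []
    for kind, arg in ir:
        if kind == "push":
            stack.append(str(arg))
        else:
            right, left = stack.pop(), stack.pop()
            stack.append(f"({left} {arg} {right})")
    return stack.pop()

ir_a = front_end_postfix("1 2 + 3 *")
ir_b = front_end_prefix("* + 1 2 3")
print(ir_a == ir_b)          # True
print(back_end_infix(ir_a))  # ((1 + 2) * 3)
```

Because both front ends meet at the same IR, the back end is written once. The cost is that the IR has to be able to express everything either source language can say.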

Your back end should be a lot simpler than the usual compiler back end because you don't need to worry about ABIs and register colouring or memory bandwidth or anything.

Then again, why not just write a compiler? Or a new front end for an existing compiler? What is the value of having machine-generated source code, which will be totally unreadable, owing to auto-generated, context-ignorant identifiers? It's easier to maintain good code written in an unfamiliar language than bad code in a familiar one, no?

Brent Gulanowski
Wednesday, January 14, 2004

Brent, thank you a lot for sharing your thoughts. In my case we are talking about converting a familiar language into a less familiar one, so it doesn't make any obvious sense. For me it's more of a learning curve than anything else.

The idea is migration from one, clearly dying platform to another, new one.

I'd like to find out for myself which makes more sense: re-using the old DBs and writing code from scratch, trying to convert semi-automatically, or supporting the old code base.

Maybe start writing new code using .NET and try integrating it with the old one? I'm trying to do a little research on how difficult it actually is to write a simple language translator that would take care of 90% of the code. Your advice is a great help; having read the links, I'm already much more knowledgeable.

Thursday, January 15, 2004

"Clearly dying"?  Progress?  I don't think so.

I've worked with Progress for over 15 years.  It's never been popular, but neither is it ever unpopular.  In fact, at one point Microsoft nearly bought Progress, but the two parties couldn't agree on a price.

You haven't really thought through what you are asking.  First of all, what database will you use?  Remember it has to be multi-user and needs to be suitable to scale all the way from embedded solutions to major banking applications.  That's quite a range.  And it needs to be able to simultaneously connect to foreign (e.g. Oracle) databases.  Don't forget stuff like After-Imaging as well as Before-Imaging, Two-phase committing, etc.  And your "undo" facilities also apply to program data, variables, arrays etc. as well as database data.  It has to work identically across a range of Operating Systems, (Progress compiles to a virtual machine and its code can be freely moved from any platform to any other platform).
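The point about "undo" covering program data can be sketched as follows. This is a minimal Python illustration only, assuming all undoable variables live in one dictionary; the `undo_block` helper is hypothetical and does not attempt to reproduce Progress's actual undo semantics (nested blocks, per-variable opt-out and so on).

```python
import copy
from contextlib import contextmanager

@contextmanager
def undo_block(state: dict):
    """Restore `state` to its value on entry if the block raises."""
    snapshot = copy.deepcopy(state)
    try:
        yield state
    except Exception:
        state.clear()
        state.update(snapshot)
        raise

state = {"count": 1, "items": [1, 2]}
try:
    with undo_block(state):
        state["count"] = 99
        state["items"].append(3)
        raise RuntimeError("something failed")
except RuntimeError:
    pass

print(state)  # {'count': 1, 'items': [1, 2]}
```

Translated code would need something like this wrapped around every block the 4GL treats as undoable, combined with the database rollback, which is part of why the scale of the task grows so quickly.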

And of course a built-in pre-processor, links to Java, and a FFI (Foreign Function Interface) to C/C++.

Of course you will need an *integrated* GUI.  One that understands and works with the back-end database, and also need the ability to automatically dish out HTML for Web Applications, again, all built-in.

Your "dynamics" support should be interesting.  Progress can dynamically create queries, widgets and so on with the greatest of ease.  When using these facilities to programmatically examine, alter and instantiate programs on the fly I can't help thinking of Greenspun's Tenth Law about advanced programs in effect containing Lisp Interpreters.

Basically, the scale of the task you are suggesting is *enormous*.

Forget it, just ponder on the fact that Cobol and Fortran are both still alive and kicking despite being "clearly dying" for years.

David B. Wildgoose
Thursday, January 15, 2004

David, I spent almost 7 years doing mostly Progress and WebSpeed work. I'm well familiar with most of its features. My conclusions are based on my own observations, experience and other developers' opinions.

Progress has a great DB product, although it is missing some important features, like a TimeStamp data type (coming in V10) and ODBC drivers that would provide full, unrestricted access to system tables (if you have ever tried to modify a DB schema via ODBC then you know what I mean). Everything else, including the price, is great.

Progress 4GL and WebSpeed are outdated products. The language structure they've chosen doesn't work anymore: you cannot keep adding statements and keywords forever.  The DB-related concepts are great until you touch the new dynamic stuff, which has exactly the same problem as SQL: it doesn't fit with the rest of the language well enough. They probably should have designed dynamic queries differently, so we could "for each" through them as usual and manipulate them using typed buffers, instead of the C++ way.

Super procedures, an attempt to add an OO layer, weren't widely accepted by developers (I myself find them rather convenient) and are missing important features like parameter overloading.

The implementation of dynamic functions and handles doesn't always work.

Development tools are way too expensive for their quality. There are very few third-party tools and libraries on the market. Usually you have to choose between one vendor and doing things yourself.

They charge both developers and clients (for the runtime), which makes it difficult to sell products. A client usually has a fixed budget for the system, so they don't mind whom they pay, you or Progress. Obviously, money which could have been paid to you is paid to Progress.

They have done very little development on the core libraries and tools in the last 3 years, buying other software companies instead and diversifying their product range.

The 4GL is dying; it might stay only as a BL language (I think we'll get to this point in V11, since in V10 they introduce .NET integration).

I know more ISVs who stopped using Progress in the last 2 years than those who started.

Thursday, January 15, 2004

I don't know about this specific problem, but here's an idea: Give Perl (or Python/Pike/Ruby/..) a shot. I suspect writing a compiler may not be your best way of solving the problem, because you'll probably spend a lot of time chasing obscure bugs in your parser or code generator if the original language is as obscure as you say and there are no parsers for it.

But if you do end up writing a compiler you'll have a lot of ready-made parsing to use. If you don't, they are very high level languages and since speed probably won't be critical when you're replacing legacy apps, you might be able to write powerful enough constructs so the (manual) porting may be rather straightforward.

I can't offer any more specific advice so I'll just stick to putting ideas in your head :/

Jonas B.
Thursday, January 15, 2004

I've been doing some research on newer (post-Lex/Yacc) compilers.  For books, the Appel book is OK, but I suggest starting with "Programming Language Processors in Java" by Watt and Brown.  It provides a nice introduction to the issues with compilers and language translators, with code snippets in Java.  If you're familiar with C#, it's an easy read.

Regarding compiler constructors, the only real choice for a C# compiler is ANTLR, and it looks like a solid product.  However, if you can live with using Java, I'd suggest looking at SableCC.  [1]  It's a very sophisticated compiler constructor that autogenerates a strongly-typed OO parse tree for the concrete syntax.  The latest version (3.x)  can also automagically generate a strongly-typed abstract syntax tree, [2] and provides a nice implementation of the visitor pattern to traverse the trees.

Another interesting Java compiler package is JJForester.  [3, 4]  It also generates strongly-typed concrete and abstract syntax trees, and implements the visitor pattern to traverse them.  One nice feature is that it uses the GLR parsing algorithm, which means that it can accept any context-free grammar (you don't have to worry about shift-reduce conflicts in the grammar.)  The downside is that the software doesn't appear to be actively maintained.
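For readers unfamiliar with the visitor pattern mentioned above, here is a minimal hand-written sketch in Python (the tools discussed generate Java classes; this is not their output or API, and all class names are hypothetical). The tree classes only know how to dispatch to a visitor, so each new traversal, such as an evaluator or a code printer, is a separate class that leaves the tree untouched.

```python
from dataclasses import dataclass

@dataclass
class Number:
    value: int

    def accept(self, visitor):
        return visitor.visit_number(self)

@dataclass
class Add:
    left: object
    right: object

    def accept(self, visitor):
        return visitor.visit_add(self)

class Evaluator:
    """One traversal: compute the value of the expression."""
    def visit_number(self, node):
        return node.value

    def visit_add(self, node):
        return node.left.accept(self) + node.right.accept(self)

class Printer:
    """Another traversal: emit source text for the expression."""
    def visit_number(self, node):
        return str(node.value)

    def visit_add(self, node):
        return f"({node.left.accept(self)} + {node.right.accept(self)})"

tree = Add(Number(1), Add(Number(2), Number(3)))
print(tree.accept(Evaluator()))  # 6
print(tree.accept(Printer()))    # (1 + (2 + 3))
```

For a translator, the code generator for the target language would be one more visitor over the same tree.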

Robert Jacobson
Thursday, January 15, 2004

Timestamp datatype important?  An interesting priority.  I've always found the combination of the built-in date type combined with the ability to determine and save/manipulate the number of seconds since midnight to be more than adequate, especially seeing as you can format times according to your needs.  Conflating the two into a single type as Oracle etc.  do has never struck me as particularly sensible.

In what way does the language structure "not work" anymore?  That's like saying that Pascal doesn't work any more because it's procedural, and ignoring the existence of Delphi (Object Pascal).

Having said that, I agree with your comments about Super Procedures.  It's just that languages should be "horses for courses".  I wouldn't choose Progress to write Fast Fourier Transforms, but then I wouldn't choose C to write business applications either.

And this still doesn't address my main point.  If the teams of programmers at Progress Software Corporation haven't made all these changes, what makes you think that you can single-handedly implement them all yourself?

If you think it will be necessary to translate everything into another language then begin by modularising and compartmentalising your code properly, thereby allowing piecemeal manual translation.  Of course, having done this, you may suddenly discover that the resulting product is easy to understand and maintain, in which case the effort required to do the translation will be recognised as a waste of time.

David B. Wildgoose
Friday, January 16, 2004

David, please don't be defensive, there is no need for it.  I do not see my goal as writing an alternative Progress compiler, as that doesn't make any commercial sense and the Progress language spec is not open.

My goal is to research the feasibility of a tool for semi-automated code translation, which would allow easier migration to .NET, for instance. Besides, I find it very educational (I have been reading all those articles on building compilers for the last 2 days and found it a really exciting subject).

There is no need to discuss the nitty-gritty details of 4GL pros and cons here. I'm very well aware of the 4GL features which, when misused, make translation and maintenance a living hell. I've seen applications based strictly on preprocessors, includes, shared variables and global buffers instead of input-output parameters. Too many Progress programmers are dedicated cut&pasters and never define a procedure.

I do not have a personal problem with Progress, as I do not have any valuable assets written in it (except for my 7 years of experience). I could simply get a .NET or Java job and forget it. But I know many companies which would grab the opportunity to get rid of Progress apps with both hands.

In the UK, USA and Australia there is no shortage of Progress developers. In smaller European countries it would be much easier to find an MS or Java developer.

We could keep going on this topic, but I just don't like Progress's quality and attitude. They do not eat their own dog food (I haven't noticed any part of their web site being made in WebSpeed), they do not provide me with quality tools for the money paid, and they do not provide me with enough libraries, components and support.

But that's entirely my personal position.

Friday, January 16, 2004

Sorry if I'm not making my position clear.  I actually agree with a lot of what you say.  I have long said that Progress don't know how to write programs in their own language.  You only have to look at the dreadful example of a "browser" they supplied when version 4 came out.  It must be one of the worst pieces of code I have ever seen, and yet it was widely copied as "the way to do it".  As for shared variables and the "dedicated cut & paste" brigade, I couldn't agree with you more.

The problem with Progress isn't that it's dying, it's that it isn't growing.  And of course, standing still isn't the best option in an expanding industry.

What you are attempting is a valuable learning experience, and I can second the recommendation above for Appel's compiler book.  (I have the ML version because I have an interest in functional programming).  So from that point of view it is worth it.

I have to say that there are a *lot* of Progress programmers out there who don't really understand what should be Progress fundamentals like Transactions, Record-Scope, Index Cursors and the like.  Given that, and the generally poor standard of a lot of Progress code, you might actually be able to do a passable job.  In my experience, most Progress programmers seem to use a surprisingly limited subset of the language.  Target that subset and you could have a useful conversion utility for 80% of the code out there.

If you do go ahead with this, feel free to e-mail me to bounce ideas around.  (This is a genuine offer).  But I have to say I still think it's a much bigger task than you currently realise.

David B. Wildgoose
Friday, January 16, 2004
