Fog Creek Software
Discussion Board

Internationalization with web sites

I've had a debate with some people about the best way to internationalize a web site.  The site is Java/JSP based, but the concepts here aren't just Java related.

At issue is whether to create a separate web page for each language and import the proper page based on the user's language, or to remove all strings from the web page and put them into a resource file, similar to the way GUI apps work.

Personally, for text-heavy web sites, I like separating the languages into different files. But I've tried to come up with an objective take on it:

Text in resource files for internationalization (advantages first, then disadvantages, then other considerations):

* All text is in one place
* Easier to have common text between files
* Do not need to change multiple JSPs when page structure changes
* Don't have to worry about translators stepping on code when changing
the JSP
* Don't have to go through a servlet to decide which JSP to include

* Additional development work in cross-referencing text to labels
* No easy way to find unused labels
* Not immediately obvious when a text field is missing (have to run
app and look at the page)
* Looking at the JSP in an editor gives little feel for how the file
actually looks when the text and some images don't appear
* Resource file will get very large for text-heavy sites
* Requires a lot of effort in supporting text-heavy pages, like help
files, disclaimers, etc.
* Translators will do the translation in a text file, and not see how
the translated text will appear on screen.
* Additional development step required in pulling text out of a JSP.
If a site isn't translated and this step is still performed, this is
wasted effort.

* References to images with text on them will also need to be in the
resource file
* Images whose size changes based on text width will need to be
specified in the resource bundle
* Table width sizes will need to be moved into a resource file, if
needed by different widths of text.
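
The "unused labels" and "missing text" points above can be partly automated. Here is a rough sketch, assuming the JSPs look up text with JSTL-style `<fmt:message key="..."/>` tags and the labels live in a properties file. The class name `LabelAudit` and its methods are made up for illustration, and a regex scan like this will miss keys that are built dynamically.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: compares labels referenced in a JSP with labels defined in a resource file.
class LabelAudit {
    // Assumes lookups are written as <fmt:message key="some.label"/>
    private static final Pattern KEY_REF =
            Pattern.compile("<fmt:message\\s+key=\"([^\"]+)\"");

    /** Collects every label key referenced in the given JSP source. */
    static Set<String> referencedKeys(String jspSource) {
        Set<String> keys = new TreeSet<>();
        Matcher m = KEY_REF.matcher(jspSource);
        while (m.find()) {
            keys.add(m.group(1));
        }
        return keys;
    }

    /** Collects every label key defined in properties-format text. */
    static Set<String> definedKeys(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new TreeSet<>(props.stringPropertyNames());
    }

    /** Keys the page asks for that the resource file does not define. */
    static Set<String> missing(Set<String> referenced, Set<String> defined) {
        Set<String> result = new TreeSet<>(referenced);
        result.removeAll(defined);
        return result;
    }

    /** Keys the resource file defines that no page asks for. */
    static Set<String> unused(Set<String> referenced, Set<String> defined) {
        Set<String> result = new TreeSet<>(defined);
        result.removeAll(referenced);
        return result;
    }
}
```

Running `missing` and `unused` over all JSPs and each language's file as part of the build would catch some of these problems before anyone has to look at the rendered page.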

Other points? Anything I've got wrong?

For people who do large internationalized web sites, how do you do it?

Paul Vincent Craven
Monday, November 3, 2003

Developing and then supporting separate web pages will take more time, and testing is still required either way. Consider it only if you need very different layouts. I suggest developing all pages with a ResourceBundle, including links to images. Development takes less effort than support. CPU is not a problem, but memory is important.
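
For readers who haven't used one, a minimal sketch of the ResourceBundle approach follows. To keep it self-contained the bundles are built in memory with `PropertyResourceBundle`; on a real site you would more likely keep `Messages.properties` and `Messages_de.properties` on the classpath and call `ResourceBundle.getBundle("Messages", locale)`. The class name `PageText`, the keys, and the image paths are all invented for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

// Hypothetical lookup helper: text and image links come from a per-language bundle.
class PageText {
    // In-memory stand-ins for Messages.properties and Messages_de.properties
    private static final Map<String, ResourceBundle> BUNDLES = new HashMap<>();

    static {
        try {
            BUNDLES.put("en", new PropertyResourceBundle(new StringReader(
                    "welcome.title=Welcome\nlogo.image=/images/logo_en.gif\n")));
            BUNDLES.put("de", new PropertyResourceBundle(new StringReader(
                    "welcome.title=Willkommen\nlogo.image=/images/logo_de.gif\n")));
        } catch (IOException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    /** Returns the text for a key in the user's language, falling back to English. */
    static String text(Locale locale, String key) {
        ResourceBundle bundle =
                BUNDLES.getOrDefault(locale.getLanguage(), BUNDLES.get("en"));
        return bundle.getString(key);
    }

    public static void main(String[] args) {
        System.out.println(text(Locale.GERMAN, "welcome.title"));
        System.out.println(text(Locale.FRENCH, "logo.image"));
    }
}
```

The JSP then contains only keys such as `welcome.title` and `logo.image`, so one page structure serves every language.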

Evgeny Gesin /Javadesk/
Monday, November 3, 2003

We went with the same 'resource' file stuff. (Well, we just inserted another layer on top of Smarty's templating engine.) The best part, though:

We're tagging each line of text with a version number. A text in a foreign language gets that same version number only once it has been translated in our (self-written) translation tool.

Now, everyone can see if a translation is up to date or not. You don't have to know the language, you can just write a tool that compares all the version numbers between two resource files (the 'leading' language and the foreign language) and you'd know if the translation is up to date. (You could even automate such a thing in your test suite, etc.)
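
A comparison tool of that kind could look roughly like the sketch below. It is written in Java to match the original question, not in the PHP that Smarty implies, and the `key=version|text` line format is purely an assumption for the example, as are the class and method names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical checker: flags texts whose translation lags behind the leading language.
class TranslationCheck {
    /** Parses lines of the assumed form "key=version|text" into key -> version. */
    static Map<String, Integer> versions(String resourceText) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (String raw : resourceText.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            int eq = line.indexOf('=');
            int bar = line.indexOf('|', eq + 1);
            if (eq < 0 || bar < 0) {
                continue; // skip lines that do not match the assumed format
            }
            String key = line.substring(0, eq).trim();
            int version = Integer.parseInt(line.substring(eq + 1, bar).trim());
            result.put(key, version);
        }
        return result;
    }

    /** Keys that are missing from the translation or carry a different version number. */
    static Set<String> outdated(Map<String, Integer> leading, Map<String, Integer> foreign) {
        Set<String> stale = new TreeSet<>();
        for (Map.Entry<String, Integer> entry : leading.entrySet()) {
            Integer translated = foreign.get(entry.getKey());
            if (translated == null || !translated.equals(entry.getValue())) {
                stale.add(entry.getKey());
            }
        }
        return stale;
    }
}
```

An empty result from `outdated` means the translation is current, which makes it easy to assert in a test suite.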

Jilles Oldenbeuving
Tuesday, November 4, 2003

> Development takes less effort than support..

This depends largely on how many environments your application will be deployed to. If you're deploying for, say, 25 countries, your support effort will go up...

Jilles Oldenbeuving
Tuesday, November 4, 2003

One thing I've done/recommended with great success is using a DBMS as your content storage and retrieval engine.

Generally something like this is done (the modeling can change depending on how you set up the site):
Paragraph( ParaID, Text );
Image( URL, AltText );
Page( URL, Title );
Page_Language( Page_URL, Language );
Page_Composition( Page_URL, ParaID, Position );
Page_Images( Page_URL, Image_URL, Position );

So you insert paragraphs into the DBMS (of course, instead of storing the text in the DBMS you can store a path identifier to the location on disk of the particular paragraph) and then create a view over Page_Composition and Page_Images to generate the complete text for a given URL.  It works really, really well especially if you then take that output and cache it so you’re not hitting the DBMS with every pageview.

You only need the Page_Images and Image table if you do not wish to hard-code the images in the paragraph.  Generally I find it easier this way because then you can use the querying features of your DBMS to see which languages are using which images etc.
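
To make the idea concrete, here is a small sketch of assembling a page from the Paragraph and Page_Composition tables and caching the result. Plain in-memory collections stand in for the DBMS, and the `PageStore` class, its methods, and the `<p>` wrapping are all assumptions for illustration; a real version would run the view query through JDBC instead.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the DBMS-backed content store described above.
class PageStore {
    /** One row of Page_Composition(Page_URL, ParaID, Position). */
    static final class Composition {
        final String pageUrl;
        final int paraId;
        final int position;

        Composition(String pageUrl, int paraId, int position) {
            this.pageUrl = pageUrl;
            this.paraId = paraId;
            this.position = position;
        }
    }

    private final Map<Integer, String> paragraphs = new HashMap<>();  // Paragraph(ParaID, Text)
    private final List<Composition> compositions = new ArrayList<>();
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    int assembleCount = 0;  // how often a page was actually assembled from the "tables"

    void addParagraph(int paraId, String text) {
        paragraphs.put(paraId, text);
        cache.clear();  // content changed, so drop all cached pages
    }

    void place(String pageUrl, int paraId, int position) {
        compositions.add(new Composition(pageUrl, paraId, position));
        cache.clear();
    }

    /** Returns the page text, assembling it only on a cache miss. */
    String render(String pageUrl) {
        return cache.computeIfAbsent(pageUrl, this::assemble);
    }

    // Plays the role of the view over Page_Composition joined with Paragraph.
    private String assemble(String pageUrl) {
        assembleCount++;
        List<Composition> rows = new ArrayList<>();
        for (Composition c : compositions) {
            if (c.pageUrl.equals(pageUrl)) {
                rows.add(c);
            }
        }
        rows.sort(Comparator.comparingInt(c -> c.position));
        StringBuilder page = new StringBuilder();
        for (Composition c : rows) {
            page.append("<p>").append(paragraphs.get(c.paraId)).append("</p>\n");
        }
        return page.toString();
    }
}
```

Clearing the whole cache on every write is the bluntest possible invalidation; it is fine for content that changes rarely, which is usually the case for translated text.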

Tuesday, November 4, 2003

But this way you will lose a lot of speed for things that are essentially static (the text on the pages). For certain situations this might not be a problem, but there are plenty of situations where you can't sacrifice that much speed for internationalization (esp. since there is an alternative).

Jilles Oldenbeuving
Tuesday, November 4, 2003
