Fog Creek Software
Discussion Board




Joel on Software

ADO.NET & Foreign Text

I'm a bit of a newbie to internationalization of apps, so bear with me. A SQL Server 2000 table for our website has a "text" column (not "ntext"), where our users insert content in various languages (Korean and Chinese are popular). In regular old ASP/ADO, I'd select and write the data, and it would render just fine in a web browser.

Now as I convert our website to ASP.NET/ADO.NET, selecting the same field returns garbage to the web browser. I'd like to know a way around this; a way to exhibit the same behavior as in regular ASP.

I imagine this is because .NET strings are Unicode, and the "text" field is not. Changing the field to "ntext" might help, but unfortunately, this is currently not an option. Any suggestions?

Monsur
Wednesday, December 11, 2002

How are you rendering it?... Databinding? Response.Write()?

Duncan Smart
Wednesday, December 11, 2002

rendering using Response.Write

Monsur
Thursday, December 12, 2002

Actually, thinking about it, that shouldn't make a difference.
Maybe the browser is mangling it because of incorrect codepage recognition...
Look into the CodePage attribute of the <%@ Page %> directive: http://msdn.microsoft.com/library/en-us/cpgenref/html/cpconpage.asp

There's a table of code pages here:
http://msdn.microsoft.com/workshop/author/dhtml/reference/charsets/charset4.asp

You can also set it programmatically with Response.CodePage.

Duncan Smart
Thursday, December 12, 2002

Thanks Duncan for the CodePage tip.  Unfortunately, that didn't work.

It seems like the data is becoming "mangled" by ToString. I actually did get an example to work, but I had to use an SqlDataReader, use GetBytes to read in all the bytes, and then write them out to a BinaryWriter. This is way too tedious to do in production. Plus, although this works great with an SqlDataReader, what would the analogous method be for a DataSet? This is sounding like an ADO.NET issue; any ideas would be appreciated.

Thanks!

Monsur
Sunday, December 15, 2002

Other things I would try:
  <%@ Page ResponseEncoding="..." %>
which could be: "utf-8" (the default I think... but not sure), "unicode", various others. (Weird thing is I would have expected "utf-8" to cope with it.)

It can also be done at the web.config level:
<system.web>
    <globalization
        fileEncoding="utf-8" />
    ...

As to the ToString() method messing it up... try writing a simple Windows app and seeing if it has the same problem. If not then it's down to ASP.NET or the browser...

Duncan Smart
Monday, December 16, 2002

Actually, that was my next step.  I used two methods of writing to a file:

1) BinaryWriter in conjunction with SqlDataReader's GetBytes

2) StreamWriter in conjunction with SqlDataReader's GetString

Sure enough, case 1 writes the correct output, while case 2 does not. 

I also tried using the OleDb classes instead of SqlClient, with the same results.

I suspect ADO.NET is trying to read the UTF-8 encoded field as Unicode, which causes the mangling. GetBytes accesses the bytes before they get mangled. But GetBytes really isn't the ideal solution to roll into production. I haven't found any documentation covering ADO.NET and encodings.
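
A minimal sketch of the suspected mangling, for illustration only. It assumes the stored bytes are UTF-8 (as guessed above) and uses ISO-8859-1 purely as a stand-in for whatever single-byte code page the provider might apply when it builds a string. The class and method names are made up for this example and are not part of ADO.NET.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration; not part of ADO.NET.
public static class ManglingDemo
{
    // Bytes as they would sit in the non-Unicode "text" column,
    // assuming the old ASP pages stored UTF-8.
    public static byte[] StoredBytes(string original)
    {
        return Encoding.UTF8.GetBytes(original);
    }

    // What a GetString-style call could yield if the bytes are decoded
    // with a single-byte code page (ISO-8859-1 is only a stand-in).
    public static string DecodeWithColumnCodePage(byte[] stored)
    {
        return Encoding.GetEncoding("iso-8859-1").GetString(stored);
    }

    // What a GetBytes-style path allows: untouched bytes, decoded by the caller.
    public static string DecodeAsUtf8(byte[] stored)
    {
        return Encoding.UTF8.GetString(stored);
    }
}
```

For a Korean string, DecodeWithColumnCodePage returns one character per stored byte, none of them Korean, while DecodeAsUtf8 returns the original text. That would match the observation that case 1 writes correct output and case 2 does not.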

Monsur
Tuesday, December 17, 2002

With GetBytes() all you need to do is something like:

  using System.Text;
  ...
  // reader.GetBytes() is shorthand here: the real method takes an ordinal, offsets, and a buffer
  x = Encoding.Unicode.GetString( reader.GetBytes() );

(OK, not ideal when the DataReader *should* be interpreting the string correctly)

Does DataBinding give you the same problem? i.e.
  <p>... <%# reader["blah"] %> ...</p>
  ...
  Page.DataBind();

How about SqlDataReader.GetSqlString()?

Have you tried installing .NET Framework SP2?
http://msdn.microsoft.com/netframework/downloads/updates/sp/

This might well be worth posting to: http://discuss.develop.com/dotnet-clr.html as there you're likely to get an MS insider chasing it up.

Duncan Smart
Wednesday, December 18, 2002

First off, I must really thank you.  It's been very kind of you to stick with me on this issue.  You really live up to your name, Duncan Smart :)

Service Pack 2 is installed on our system.  Databinding as well as GetSqlString() gives the same results. 

However, decoding the GetBytes output with Encoding's GetString() DOES work. But it runs into the same issues I had above when using a BinaryWriter. The SqlDataReader's GetBytes signature is

public long GetBytes(
  int i,
  long dataIndex,
  byte[] buffer,
  int bufferIndex,
  int length
);

So in order to grab the proper string, I need to iterate over GetBytes, using GetString at each iteration to generate the output string.  It is a solution, but a very unwieldy one for production.
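
One way to keep the GetBytes loop out of page code might be to wrap it once. A rough sketch, assuming the raw bytes are UTF-8 and that GetBytes works on the column as reported above; RawTextReader and ReadString are hypothetical names. It collects every chunk first and decodes once at the end, because decoding each chunk separately could split a multi-byte character at a chunk boundary.

```csharp
using System;
using System.Data;
using System.IO;
using System.Text;

// Hypothetical wrapper; works against any IDataRecord
// (SqlDataReader and OleDbDataReader both implement it).
public static class RawTextReader
{
    public static string ReadString(IDataRecord record, int ordinal, Encoding encoding)
    {
        if (record.IsDBNull(ordinal))
        {
            return null;
        }

        byte[] chunk = new byte[4096];
        using (MemoryStream all = new MemoryStream())
        {
            long offset = 0;
            long read;
            // GetBytes copies up to chunk.Length bytes starting at offset
            // and returns how many bytes it actually copied.
            while ((read = record.GetBytes(ordinal, offset, chunk, 0, chunk.Length)) > 0)
            {
                all.Write(chunk, 0, (int)read);
                offset += read;
            }
            // Decode once, after all bytes are collected.
            return encoding.GetString(all.ToArray());
        }
    }
}
```

A call would then look like `string body = RawTextReader.ReadString(reader, 0, Encoding.UTF8);`, with the ordinal and encoding adjusted to the actual column and data.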

And I have no guarantee that all the data access in my system will use SqlDataReaders.  Some areas use DataSets, for which I don't see an analogous GetBytes method.
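
For values that have already come back as strings (a DataSet, for instance), one thing that might be worth testing is re-encoding the string with the code page the provider applied and then decoding the resulting bytes with the real encoding. This is only a sketch under two assumptions that would need checking against real data: that the provider's code page is known, and that its decoding lost no bytes. MangledTextRepair and its methods are hypothetical names, and the encodings passed in are placeholders.

```csharp
using System;
using System.Data;
using System.Text;

// Hypothetical helper; only valid if the provider's decoding was lossless.
public static class MangledTextRepair
{
    // Turns the mangled string back into the original bytes, then decodes
    // those bytes with the encoding the data was really stored in.
    public static string RepairMangled(string mangled, Encoding appliedByProvider, Encoding actual)
    {
        if (mangled == null)
        {
            return null;
        }
        byte[] raw = appliedByProvider.GetBytes(mangled);
        return actual.GetString(raw);
    }

    // Applies the repair in place to one string column of a DataTable.
    public static void RepairColumn(DataTable table, string columnName,
                                    Encoding appliedByProvider, Encoding actual)
    {
        foreach (DataRow row in table.Rows)
        {
            if (!row.IsNull(columnName))
            {
                row[columnName] = RepairMangled((string)row[columnName], appliedByProvider, actual);
            }
        }
    }
}
```

If the provider's code page cannot map every byte back (a multi-byte code page that rejected invalid sequences, say), the round trip fails and reading the raw bytes remains the safer route.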

So as you can see, I'm at an impasse.  I saw this question posted to a group a few months back, but it didn't have any responses.  I have tried a few other groups (including MS' Newsgroup), with the same results.  But now that I have a better understanding of the issue, I will try develop.com.  Thanks!

Monsur
Wednesday, December 18, 2002
