Fog Creek Software
Discussion Board





Joel on Software

String Deserialization problem

I am looking for a nice clean way to pass strings that may contain control characters (i.e., with Unicode values of less than 32 decimal) using the .NET Web Services framework. Such strings are perfectly valid if you keep them within a .NET application, and even when exposed via a Web Service they serialize to perfectly well-formed XML entities.

Unfortunately, they don't deserialize properly using a normal .NET client Web Reference. If you try, the client throws System.InvalidOperationException during SoapHttpClientProtocol.Invoke.

Sample code to reproduce the problem is trivial. Using VS.NET, create a C# XML Web Service, uncomment the HelloWorld example code, and insert a form feed escape sequence into the return string, viz:

[WebMethod]
public string HelloWorld()
{
    // "\f" is a form feed (U+000C), a control character
    return "Hello World\f";
}

That's all you need to break it. Now create a client - say, a Windows Forms application, add a Web Reference to the Web Service, and try to call the HelloWorld WebMethod, viz:

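private void Form1_Load(object sender, System.EventArgs e)
{
    // localhost.Service1 is the proxy class generated by the Web Reference
    localhost.Service1 s1 = new localhost.Service1();
    // Throws System.InvalidOperationException while deserializing the response
    MessageBox.Show(s1.HelloWorld());
    this.Close();
}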

This fails with the exception detailed above. It appears from the Exception.Message property that an inner XmlException has been swallowed, and in particular it claims that the character with hex value 0x0C (i.e., form feed, or \f) is invalid.

But looking at the actual XML that is returned from the Web Service, there is no literal character 0x0C. Instead there is a perfectly legal entity: &#xC; (that's ampersand-hash-x-c-semicolon, if it gets mangled).
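The client-side rejection can be reproduced without a Web Service at all. The following is a minimal sketch, assuming a later .NET version that provides XmlReader.Create (it did not exist in the 1.x framework this thread is about); the class name CharRefDemo and the method Parses are made up for illustration. It feeds a fragment containing the form feed character reference to a reader with default settings, which check character ranges.

```csharp
using System;
using System.IO;
using System.Xml;

public static class CharRefDemo
{
    // Hypothetical helper: returns true if a default-configured XmlReader
    // reads the whole fragment, false if it raises XmlException.
    public static bool Parses(string xml)
    {
        try
        {
            using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
            {
                while (reader.Read()) { }
            }
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }

    public static void Main()
    {
        // Form feed written as a character reference, as in the SOAP response.
        Console.WriteLine(Parses("<string>Hello World&#xC;</string>"));
        // Tab written the same way, for comparison.
        Console.WriteLine(Parses("<string>Hello World&#x9;</string>"));
    }
}
```

The first fragment should be rejected and the second accepted, which suggests the parser objects to the referenced character itself rather than to the reference syntax.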

Does anyone know of any way to work around this problem, that doesn't involve some horrible manual pre-processing of all my strings on the way out of the Web Service door?

Thanks in advance for any help.

John Waterson
Friday, June 06, 2003

"Instead there is a perfectly legal entity: ..." Actually, according to http://www.w3.org/TR/REC-xml#charsets, it's not. Only #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] are allowed.
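To make the range list concrete, here is a rough sketch of a check against those ranges, useful for spotting offending strings before they leave the service. The names Xml10Chars, IsXml10Char and FirstInvalidIndex are invented for this example and are not part of the framework; char.IsHighSurrogate and char.ConvertToUtf32 assume .NET 2.0 or later.

```csharp
using System;

public static class Xml10Chars
{
    // True if the code point falls in the XML 1.0 Char ranges quoted above:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    public static bool IsXml10Char(int codePoint)
    {
        return codePoint == 0x9 || codePoint == 0xA || codePoint == 0xD
            || (codePoint >= 0x20 && codePoint <= 0xD7FF)
            || (codePoint >= 0xE000 && codePoint <= 0xFFFD)
            || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    }

    // Index of the first UTF-16 unit that starts a disallowed character,
    // or -1 if the whole string is representable in XML 1.0.
    public static int FirstInvalidIndex(string s)
    {
        for (int i = 0; i < s.Length; i++)
        {
            bool isPair = char.IsHighSurrogate(s[i])
                && i + 1 < s.Length
                && char.IsLowSurrogate(s[i + 1]);
            // An unpaired surrogate keeps its raw value and fails the check.
            int codePoint = isPair ? char.ConvertToUtf32(s[i], s[i + 1]) : s[i];
            if (!IsXml10Char(codePoint))
            {
                return i;
            }
            if (isPair)
            {
                i++; // skip the low surrogate already consumed
            }
        }
        return -1;
    }

    public static void Main()
    {
        Console.WriteLine(FirstInvalidIndex("Hello World\f")); // 11
        Console.WriteLine(FirstInvalidIndex("Hello World\t")); // -1
    }
}
```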

I think you're going to have to define the values as byte[] instead and use Encoding.UTF8.GetBytes() and Encoding.UTF8.GetString() (from System.Text) to go to and fro. This way they get sent as Base64 in the XML stream.
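A minimal sketch of that round trip, outside any Web Service plumbing. The class ByteTransport and its Encode and Decode methods are hypothetical stand-ins for what a WebMethod returning byte[] and its caller would each do; Convert.ToBase64String is used here only to show roughly what would travel in the XML.

```csharp
using System;
using System.Text;

public static class ByteTransport
{
    // Service side (hypothetical): turn the string into the byte[] that a
    // WebMethod declared as returning byte[] would hand back.
    public static byte[] Encode(string value)
    {
        return Encoding.UTF8.GetBytes(value);
    }

    // Client side (hypothetical): recover the original string from the bytes.
    public static string Decode(byte[] payload)
    {
        return Encoding.UTF8.GetString(payload);
    }

    public static void Main()
    {
        string original = "Hello World\f";
        byte[] payload = Encode(original);

        // Base64 text contains only letters, digits, '+', '/' and '=',
        // so no control character reaches the XML parser.
        Console.WriteLine(Convert.ToBase64String(payload));

        // The form feed survives the round trip.
        Console.WriteLine(Decode(payload) == original);
    }
}
```

The cost is that the WebMethod signature changes from string to byte[], so every caller has to decode explicitly.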

Duncan Smart
Friday, June 06, 2003

Thanks Duncan, I stand corrected. I thought that control characters were still legal if encoded as character references, but having now looked at the specs more closely I see that in fact this would require XML 1.1 ( http://www.w3.org/TR/2002/CR-xml11-20021015/#sec4.1 ).

I guess I now have a gripe about the fact that .NET is generating a SOAP response that isn't actually well-formed XML. But even if Microsoft were to fix this, it would only change the source of the exception; it wouldn't make it go away.

Thanks for the advice anyway.

John Waterson
Monday, June 09, 2003
