Source: http://tjoe.wordpress.com/2007/08/23/xml-serialization-sorrows/
Author: Tom Goff’s .Net Musings
One of the features of our flagship product allows the user to save the results from a T-SQL script (run against one or more SQL Server databases) to a file. These results are stored in XML and can be re-opened and viewed by the user. One problem we have run into is that the XML standard does not allow certain characters. For example, the null (0x00) character is not technically allowed, even when encoded as “&#x0;” in the XML file.
Before I coded a work-around or fix, I wanted to understand where the problem data was coming from. In one instance, the null character was clearly coming from a string (or varchar) column. The SELECT statement was:
SELECT TOP 1 filename FROM sysaltfiles WITH (NOLOCK)
It turned out that the filename column included a trailing null character, but only on SQL Server 2000. I was able to verify this with Microsoft’s Query Analyzer.
There was another instance of an invalid character showing up, but this time the SELECT statement was:
SELECT password FROM syslogins WITH (NOLOCK)
It appears that syslogins is a view on top of sysxlogins, which defines the password column as a varbinary. When syslogins selects the password from sysxlogins it converts it as-is to a varchar. This produces the invalid characters, because the binary data cannot be converted in this manner.
Beyond these two tables, a user can still write a valid SELECT statement that produces an invalid character, like so:
SELECT CHAR(0) AS InvalidChar
Now, I’ve been calling the characters “invalid” when really they are perfectly valid. SQL Server, .Net, ADO.Net, and all the controls used to display this data support these characters perfectly well. What’s more, when saving these characters the XML serializer would properly encode them using their hexadecimal value (e.g. “&#x0;” for the null character). The problem was with the XML deserialization process.
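To make the encoding step concrete, here is a minimal sketch that writes a string containing a null character through an XmlTextWriter. The serializer itself is not involved; the class and method names are hypothetical, and the assumption is that the serializer's writer treats control characters the same way XmlTextWriter does.

```csharp
using System.IO;
using System.Xml;

public static class NullCharEncodingSketch
{
    // Writes one <Value> element whose text is `value` and returns the raw XML.
    public static string WriteElement(string value)
    {
        using (StringWriter sw = new StringWriter())
        {
            // XmlTextWriter does not reject control characters in text;
            // it emits them as hexadecimal character references instead.
            XmlTextWriter writer = new XmlTextWriter(sw);
            writer.WriteElementString("Value", value);
            writer.Flush();
            return sw.ToString();
        }
    }
}
```

Calling NullCharEncodingSketch.WriteElement("abc\0") should return the element with the null character written as the character reference &#x0; rather than as a raw byte.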
When loading an XML file with one of these invalid characters, an InvalidOperationException would be thrown with the following message:
There is an error in XML document (<Line>, <Column>).
This exception had an inner exception of type XmlException and the following message:
‘.’, hexadecimal value 0x00, is an invalid character. Line <Line>, position <Column>.
The code used to deserialize the file was similar to the following:
TestObject testObject = null;
if (true == File.Exists(Form1.fileName)) {
    XmlSerializer xs = null;
    TextReader tr = null;
    try {
        xs = new XmlSerializer(typeof(TestObject));
        tr = new StreamReader(Form1.fileName);
        // Hand the TextReader straight to the serializer
        testObject = xs.Deserialize(tr) as TestObject;
    }
    catch (Exception ex) {
        //… Exception caught here …
    }
    finally {
        if (null != tr)
            tr.Close();
    }
}
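The solution was very simple, but not obvious. Simply wrapping an XmlTextReader around the StreamReader allows the code to properly decode the invalid characters. A likely explanation is that an XmlTextReader constructed directly leaves its Normalization property at the default of false, which also disables character range checking for numeric character entities, whereas the reader that Deserialize builds internally from a TextReader evidently does check them. This change is shown here: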
TestObject testObject = null;
if (true == File.Exists(Form1.fileName)) {
    XmlSerializer xs = null;
    TextReader tr = null;
    try {
        xs = new XmlSerializer(typeof(TestObject));
        tr = new StreamReader(Form1.fileName);
        // Wrap stream with an XmlTextReader so bad chars are accepted
        XmlTextReader xtr = new XmlTextReader(tr);
        testObject = xs.Deserialize(xtr) as TestObject;
    }
    catch (Exception ex) {
        // … No exception thrown …
    }
    finally {
        if (null != tr)
            tr.Close();
    }
}
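The two code paths above can be compared without a file on disk. The following is a minimal sketch under some assumptions: TestObject here is a hypothetical stand-in with a single string property, the XML is a hand-written string rather than real serializer output, and the third method (XmlReader.Create with CheckCharacters set to false) is an alternative that is not part of the original fix.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical stand-in for the article's TestObject type.
public class TestObject
{
    public string Value { get; set; }
}

public static class DeserializeSketch
{
    // A document whose text ends in an encoded null character.
    public const string Xml =
        "<?xml version=\"1.0\"?><TestObject><Value>abc&#x0;</Value></TestObject>";

    // Original code path: the TextReader goes straight to the serializer.
    // Expected to fail with InvalidOperationException for the document above.
    public static TestObject FromTextReader(string xml)
    {
        XmlSerializer xs = new XmlSerializer(typeof(TestObject));
        using (TextReader tr = new StringReader(xml))
        {
            return (TestObject)xs.Deserialize(tr);
        }
    }

    // Fixed code path: wrap the TextReader in an XmlTextReader first.
    public static TestObject FromXmlTextReader(string xml)
    {
        XmlSerializer xs = new XmlSerializer(typeof(TestObject));
        using (TextReader tr = new StringReader(xml))
        using (XmlTextReader xtr = new XmlTextReader(tr))
        {
            return (TestObject)xs.Deserialize(xtr);
        }
    }

    // Alternative (an assumption, not from the article): an XmlReader whose
    // settings turn off character checking for character entity references.
    public static TestObject FromXmlReader(string xml)
    {
        XmlSerializer xs = new XmlSerializer(typeof(TestObject));
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.CheckCharacters = false;
        using (TextReader tr = new StringReader(xml))
        using (XmlReader reader = XmlReader.Create(tr, settings))
        {
            return (TestObject)xs.Deserialize(reader);
        }
    }
}
```

With DeserializeSketch.Xml as input, FromTextReader should throw, while the other two methods should return an object whose Value ends in the null character.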
I’m unsure why the XML standard doesn’t allow all characters (even the null character, when properly encoded), but I’m sure they had their reasons. The only downside is that XML files that contain these invalid characters will still be considered invalid by other applications.
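For anyone who needs the saved files to be readable elsewhere, it helps to know up front whether a result string contains a character the standard rejects. Here is a minimal sketch of such a check, based on the Char production in the XML 1.0 specification; the class and method names are hypothetical, and this is not something the product described above actually does.

```csharp
using System;

public static class XmlCharSketch
{
    // True when the code point matches the XML 1.0 Char production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    public static bool IsAllowed(int codePoint)
    {
        return codePoint == 0x9 || codePoint == 0xA || codePoint == 0xD
            || (codePoint >= 0x20 && codePoint <= 0xD7FF)
            || (codePoint >= 0xE000 && codePoint <= 0xFFFD)
            || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    }

    // Returns the index of the first UTF-16 position that XML 1.0 would
    // reject, or -1 when the whole string is acceptable.
    public static int IndexOfFirstDisallowed(string text)
    {
        for (int i = 0; i < text.Length; i++)
        {
            // A well-formed surrogate pair always encodes a code point in
            // [#x10000-#x10FFFF], which is allowed, so skip both halves.
            if (char.IsHighSurrogate(text[i]) && i + 1 < text.Length
                && char.IsLowSurrogate(text[i + 1]))
            {
                i++;
                continue;
            }
            // Anything else is checked directly; lone surrogates fall in
            // the excluded D800-DFFF range and are reported here.
            if (!IsAllowed(text[i]))
                return i;
        }
        return -1;
    }
}
```

A caller could use IndexOfFirstDisallowed on each value before saving and warn the user, or substitute a placeholder, when the result is not -1.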
I’ve put together a sample project that demonstrates the XML portion of this blog.