Hi,
We've been reading XML documents into our RPGLE parser, but every so often we receive the following string (from an external B2B partner), and it causes the XML parser to fail with error code 6 - invalid characters in XML.
The string snippet is ' canâÂ?Â?t '. The Â?Â? part, which is Unicode U+0080 U+0099, is what causes the problem in XML-SAX.
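U+0080 to U+009F are the C1 control characters. XML 1.0 allows them as ordinary characters but discourages their use. XML 1.1 restricts them so that they must appear as character references. A parser can therefore reasonably reject them even when the UTF-8 encoding itself is correct. To log exactly which characters a failing document contains, a small scan is enough. The following is a minimal Python sketch, and `find_c1_controls` is a hypothetical helper name:

```python
def find_c1_controls(text):
    """Return (index, code point) for every C1 control character (U+0080..U+009F) in text."""
    return [(i, f"U+{ord(ch):04X}") for i, ch in enumerate(text)
            if 0x80 <= ord(ch) <= 0x9F]


# The reported snippet, written with explicit escapes: "â" is U+00E2.
snippet = " can\u00e2\u0080\u0099t "
print(find_c1_controls(snippet))  # [(5, 'U+0080'), (6, 'U+0099')]
```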
The characters appear to be valid UTF-8, so the XML itself seems valid.
This string seems to have started out as " can't " written with a right single quote (U+2019) instead of a plain apostrophe. It then went through a double-encoding problem on its way from Windows-1252 (text copied from Outlook, maybe) into a UTF-8 database, via an application whose encoding was neither UTF-8 nor Windows-1252. Either way, it isn't our system, so we cannot fix it at source. The problem is that these characters cause the XML-SAX parser to fail with error code 6 - invalid characters in XML - even when I change the CCSID from the original 1252 to 1208, or to any other CCSID.
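The double encoding can be reproduced in a few lines. This is a sketch that assumes the upstream application misread UTF-8 bytes as a single-byte charset and then re-encoded the result. Nothing is known about that application's actual code. The U+0080 and U+0099 seen in the XML match the UTF-8 bytes of U+2019 (E2 80 99) being read as ISO-8859-1. Reading those same bytes as Windows-1252 would have produced € and ™ instead of C1 controls:

```python
# Hypothetical reconstruction of what happened upstream.
original = "can\u2019t"                # "can't" with a right single quote (U+2019)

utf8_bytes = original.encode("utf-8")  # b'can\xe2\x80\x99t'

# Step 1: the UTF-8 bytes are misread as ISO-8859-1, one character per byte,
# so bytes 0x80 and 0x99 become the C1 controls U+0080 and U+0099.
misread = utf8_bytes.decode("latin-1")

# Step 2: the misread text is stored or sent as UTF-8 again.
double_encoded = misread.encode("utf-8")

# For contrast: misreading as Windows-1252 gives the familiar "â€™" instead.
cp1252_misread = utf8_bytes.decode("cp1252")

print([f"U+{ord(c):04X}" for c in misread])
# ['U+0063', 'U+0061', 'U+006E', 'U+00E2', 'U+0080', 'U+0099', 'U+0074']
print(double_encoded)  # b'can\xc3\xa2\xc2\x80\xc2\x99t'
```

This would also explain why changing the CCSID alone does not help. The document really does contain U+0080 and U+0099, so any CCSID that reads the UTF-8 correctly still delivers those characters to the parser.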
Is there a way to stop these characters causing this error? We cannot change the XML contents. We could add a layer that scans the XML document before feeding it to the parser and removes anything not in a list of safe characters, but that seems like a botch.
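If a pre-parse layer does end up being the pragmatic option, it can target this specific corruption instead of using a whitelist. The sketch below is in Python for illustration only. An RPGLE version would need its own implementation, and `sanitize` and the regex names are hypothetical. It first tries to undo the ISO-8859-1 double encoding for 3-byte UTF-8 sequences, which turns the snippet back into a real right quote. Then it drops any C1 controls that remain. It does not attempt to repair 2-byte or 4-byte sequences:

```python
import re

# Any C1 control character (U+0080..U+009F).
C1_CONTROLS = re.compile(r"[\u0080-\u009f]")

# A 3-byte UTF-8 sequence that was misread as ISO-8859-1: a lead byte
# E0..EF followed by two continuation bytes 80..BF, each now one character.
MOJIBAKE_3 = re.compile(r"[\u00e0-\u00ef][\u0080-\u00bf]{2}")


def _undo(match):
    chunk = match.group(0)
    # Only touch runs that contain a C1 control, so ordinary accented
    # text is left alone.
    if not C1_CONTROLS.search(chunk):
        return chunk
    try:
        return chunk.encode("latin-1").decode("utf-8")
    except UnicodeDecodeError:
        return chunk  # not valid UTF-8 after all; leave it for the strip step


def sanitize(text):
    """Repair likely double-encoded 3-byte sequences, then remove leftover C1 controls."""
    repaired = MOJIBAKE_3.sub(_undo, text)
    return C1_CONTROLS.sub("", repaired)


print(sanitize(" can\u00e2\u0080\u0099t ") == " can\u2019t ")  # True
```

Repairing before stripping keeps the intended character. Stripping alone would leave " canât ".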
Can we change the CCSID to something that allows it to run OK, or is there something else I should do?
Any help / comments would be appreciated.