Parsing XML and Valid Special Characters


  • Parsing XML and Valid Special Characters

    Is there any comprehensive source of information regarding the parsing of XML from RPG using DB2 parsing functions and how to handle special characters?

    For example, a superscript 2 (or "squared") is HTML entity code &#178 (Unicode U+00B2). This must be a valid EBCDIC value because I can see it in one of our tables in a char field (CCSID 37). When the XML file contains &amp;#178; (because the & symbol has to also be escaped), I would expect it to also translate to the proper EBCDIC value, but it doesn't; it retains the string &#178.

    Is there some documentation I can read through that will outline what characters are valid and what html entity codes (unicode values) can be parsed and properly translated by DB2 XML functions?

    Other symbols include: ® (&#174;), é (&#233;), and double quotes (&#34;).

    Not sure if this matters, but the files are being parsed directly off a network location mounted via /QNTC on the iSeries, and the CCSID of the files is 1252.
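One way to confirm the character mappings asked about above is a quick off-platform check. This is only a sketch in Python, and it assumes that Python's `cp037` and `cp1252` codecs are adequate stand-ins for CCSID 37 and CCSID 1252; the `SYMBOLS` table and the `ebcdic_bytes` helper are hypothetical names used for illustration.

```python
# Assumption: Python's "cp037" and "cp1252" codecs approximate CCSID 37 and CCSID 1252.
SYMBOLS = {
    "superscript 2": "\u00b2",
    "registered": "\u00ae",
    "e acute": "\u00e9",
    "double quote": '"',
}


def ebcdic_bytes(ch: str) -> bytes:
    """Encode one character to code page 037; raises UnicodeEncodeError if unmappable."""
    return ch.encode("cp037")


for name, ch in SYMBOLS.items():
    # Show the Unicode code point next to its single-byte value in each code page.
    print(
        f"{name}: U+{ord(ch):04X} "
        f"cp1252={ch.encode('cp1252').hex()} cp037={ebcdic_bytes(ch).hex()}"
    )
```

If every symbol encodes without an error, the characters themselves exist in both code pages, which points the investigation at how the entity is written in the XML rather than at the code page conversion.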

  • #2
    To me it looks like you are missing a ";" at the end of the &#178. It should be &#178;

    Take a look at this:
    W3Schools offers free online tutorials, references and exercises in all the major languages of the web. Covering popular subjects like HTML, CSS, JavaScript, Python, SQL, Java, and many, many more.



    • #3
      Peder,

      The actual data looks like this:

      &amp;#178; (because the & symbol has to also be escaped)

      This doesn't explain the issue of not allowing registered trademark or double quotes as an HTML entity.



      • #4
        "the & symbol has to also be escaped"

        Are you saying the XML looks like this:

        <something>&amp;#178;</something>

        instead of this:

        <something>&#178;</something>

        Because if so, then I think that's your problem. &amp;#178; would decode to the character string "&#178;", not to the character represented by &#178;. The XML has to contain &#178;, with the & unencoded.

        You need to encode the & only if it's not part of an HTML entity. In this case it is part of an HTML entity, so it should not be encoded.

        For example, this test SQLRPGLE program correctly decoded &#178; to a superscript 2, using xmltable for the xml decoding:

        Code:
        **free
        
        dcl-s datain1  varchar(1000) ccsid(1208)
        inz('<doc><col>superscript 2: "&#178;"</col></doc>');
        dcl-s out1 varchar(100) ccsid(37) inz('');
        
        exec sql
          with xtab(xcol) as (values(xmlparse(document :datain1)))
          select val into :out1
            from xmltable(
              '$d/doc' passing(select xcol from xtab) as "d" columns
                val  varchar(45) default ' ' PATH 'col'
            ) as x;
        
        // Use debug to monitor value of out1, it does contain a superscript 2
        // Has hex value 0xEA in position 17, which is the EBCDIC 37 representation of the superscript 2 character
        
        *inlr = *on;
        
        return;
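The difference between the two forms can also be checked with any conforming XML parser. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` rather than the Db2 functions, so it only illustrates the decoding rule and is not a statement about Db2 internals; `element_text` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def element_text(xml: str) -> str:
    """Parse a one-element document and return its decoded text content."""
    return ET.fromstring(xml).text


# Character reference left intact: the parser resolves it to U+00B2.
single = element_text("<something>&#178;</something>")

# Ampersand escaped as well: the parser resolves only &amp;, leaving literal text.
double = element_text("<something>&amp;#178;</something>")

print(repr(single))
print(repr(double))
```

A parser decodes exactly one level of escaping, so the double-encoded form comes back as the six-character string rather than as the superscript 2.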



        • #5
          Vectorspace,

          Thanks for your response. Here is what we are getting:


          <Value>e=mc&amp;#178;</Value>

          I realize now that the app that is building this file is encoding all special chars, including the & required for html entities.



          • #6
            What I would guess is happening is that the data being passed into the builder is "e=mc&#178;" instead of "e=mc²": the caller has tried to be helpful and has pre-encoded the superscript 2.

            Then the builder is doing what it should do: it assumes that e=mc&#178; is the text you want out at the end, and so it encodes the &.

            So ideally the caller needs to pass in "e=mc²" and rely on the builder to do the conversion.

            If the builder cannot convert superscript 2 to an HTML entity, then hopefully it has a mechanism by which the caller can instruct it not to encode special characters.
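The builder behaviour described above can be reproduced with a small sketch. It uses Python's `xml.etree.ElementTree` as a stand-in for the unknown application that writes the file, so `build_value` is purely hypothetical. The last step, undoing the extra level with `html.unescape` on the consuming side, is an assumed workaround for when the producer cannot be changed, not something the thread confirms was done.

```python
import html
import xml.etree.ElementTree as ET


def build_value(text: str) -> str:
    """Hypothetical builder: wrap text in <Value> and serialize as ASCII XML."""
    elem = ET.Element("Value")
    elem.text = text
    return ET.tostring(elem, encoding="us-ascii").decode("ascii")


# Caller pre-encodes the character: the builder escapes the & a second time.
pre_encoded = build_value("e=mc&#178;")

# Caller passes the raw character: the builder emits one character reference.
raw = build_value("e=mc\u00b2")

# Assumed consumer-side repair: decode the leftover reference after parsing.
repaired = html.unescape(ET.fromstring(pre_encoded).text)

print(pre_encoded)
print(raw)
print(repr(repaired))
```

Fixing the caller is the cleaner option, because the consumer-side repair would also decode any text that was genuinely meant to read "&#178;".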

