ibmi-brunch-learn

Announcement

Collapse
No announcement yet.

Best Resource/Example On Parsing xml data stored in clob column

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Best Resource/Example On Parsing xml data stored in clob column

    I have an "xml document" stored in a clob column. What db2 functions do I execute to parse the xml document into rpg variables? I've been playing around with xmltable but it seems to want the column your selecting to be of type xml instead of clob. Even when I run a modified example from https://www.ibm.com/developerworks/d...ola/index.html i get an error I do not know how to resolve:

    PHP Code:
    select x.*
    from 
        xmltable
    (
            
    '$d/dept/employee' passing 
                
    '<dept bldg="114">
                    <employee id="903">
                        <name>
                            <first>Mary</first>
                            <last>Jones</last>
                        </name>
                        <office>415</office>
                        <phone>905-403-6112</phone>
                        <phone>647-504-4546</phone>
                        <salary currency="USD">64000</salary>
                    </employee>
                </dept>' 
    as "d"  
            
    columns                                                                          
                empID       INTEGER     PATH 
    '@id',
                
    firstname   VARCHAR(20PATH 'name/first',
                
    lastname    VARCHAR(25PATH 'name/last'
        
    ) as x          

    The error I get is SQL 16011 result contains atomic value.

    I was hoping to use the db2 functions but I may switch to xml-into.

  • #2
    I can get the above example to work when i create a table with a column that is data type xml and select that column .

    PHP Code:
    select x.* 
    from 
        mylib
    .testxml
        
    xmltable(
            
    '$d/dept/employee' passing xmldata as "d" 
            
    columns
                empid       INTEGER         PATH 
    '@id',
                
    firstname   VARCHAR(20)     PATH 'name/first',
                
    lastname    VARCHAR(25)     PATH 'name/last'
        
    ) as x

    In this case, column xmldata is type xml. I get valid results from this query. How do I do the same thing with a CLOB column?

    Comment


    • #3
      The problem is: The XML to be searched must be in an internal format.
      When writing it to an XML column it is automatically converted into this format.
      If your XML is in a string, or in a CLOB column or read from the IFS or returned from a web-service, you need to convert it with XMLPARSE into this format.

      The following statement works correctly:
      Code:
      select x.*
      from
          xmltable(
              'dept/employee' passing
      [B]         [COLOR=#FF0000]XMLParse(Document[/COLOR][/B] '<dept bldg="114">
                      <employee id="903">
                          <name>
                              <first>Mary</first>
                              <last>Jones</last>
                          </name>
                          <office>415</office>
                          <phone>905-403-6112</phone>
                          <phone>647-504-4546</phone>
                          <salary currency="USD">64000</salary>
                      </employee>
                  </dept>'[COLOR=#FF0000][B])[/B][/COLOR]
              columns                                                                          
                  empID       INTEGER     PATH '@id',
                  firstname   VARCHAR(20) PATH 'name/first',
                  lastname    VARCHAR(25) PATH 'name/last'
          ) as x          
      ;
      Birgitta
      Last edited by B.Hauser; November 27, 2019, 10:37 PM.

      Comment


      • TheZenbudda
        TheZenbudda commented
        Editing a comment
        Thanks for the tip! I got it working! I needed to use xmlparse(document myclobcolumn) to get it to work.

    • #4
      I had the same problem that you.

      The problem is the line added to the XML in CLOB columns.
      It makes the XMLTABLE fails.

      My solution was to delete that line in the column.
      I don't like it, because it is slow. It's ok for one row. But not a good idea for queries that imply many rows.

      My solution was:

      Code:
      with aa as(
      SELECT 
        XMLCAST (
          XMLPARSE (
            DOCUMENT REPLACE(CLOB_COLUMN_IN_TABLE, '<?xml version="1.0" encoding="ISO-8859-1"?>', '')
            PRESERVE WHITESPACE
          ) as XML
        ) as CLOB_COLUMN_IN_TABLE
      FROM 
        TABLE
      WHERE CONDITION
      
      select x.*
      from aa,
      XMLTABLE ('$d/NODE_IN_XML' PASSING aa.CLOB_COLUMN_IN_TABLE AS "d"
      COLUMNS
      NAMECOLUMNFROMXML VARCHAR(10) PATH 'PATH_TO_NODE'
      ) AS X
      ;

      The best solution, of course, is to storethe XML in a XML column.
      In our case, we converted some tables with CLOB columns containig XML, into XML.

      I hope it helps.

      Comment


      • TheZenbudda
        TheZenbudda commented
        Editing a comment
        You were 100% correct! I had the <?xml blah blah ?> at the beginning of my xmldata as well! I had to take it out to make it work!

    • #5
      I've run into this annoyance. When the SQL functions parse XML, they validate that &lt;?xml line (the XML Declaration Line), which means it expects the encoding attribute in that line to match the encoding of the text.
      Which means for example if your XML is stored as text in a CLOB column with CCSID 37, but the Declaration Line reports it's encoding as UTF-8, or ISO-8859-1, or anything else that's not IBM037, the parse function will fail to parse it.

      As an alternative to removing the Declaration Line, you could cast the text to the expected encoding. In this example where the encoding reports ISO-8859-1 (which I think is Latin-1, and therefore CCSID 850), if you were to do cast(CLOB_COLUMN_IN_TABLE as clob(??) ccsid 850) and then XML parse that, then (I think) it would work because the CLOB CCSID would match the encoding attribute.
      Or if the XML will always have that specific encoding, you could just define the CLOB column in your table to have CCSID 850.


      If you would prefer to strip out the Declaration Line, there is I think a more reliable way.
      Instead of replace(), try using regexp_replace() using this regex pattern:
      Code:
      regexp_replace(CLOB_COLUMN_IN_TABLE,'^\s*<\?.*\?>','')
      Which matches:
      • ^ Start of string
      • \s* zero or more whitespaces (blank, tab, newline, etc)
      • <?
      • .* zero or more of any characters
      • ?>
      Replace that with blank, and it should cleanly strip out the declaration line even if it is not formatted exactly as you suspect (in case it has leading blanks, or different spacing, or different values, or line breaks in it, etc.)

      Comment


      • Vectorspace
        Vectorspace commented
        Editing a comment
        This: "they validate that &lt;?xml line "
        Was supposed to be this: "they validate that <?xml line "

      • TheZenbudda
        TheZenbudda commented
        Editing a comment
        Thanks for the info! Much more explanatory than I've found on the IBM sites!
    Working...
    X