XML-SAX - Error code 6

  • XML-SAX - Error code 6

    Hello,

    I have a file that's CCSID 819 and the processing job is 65535; every now and then the XML-SAX function throws error code 6 (invalid characters) because the file contains a € symbol.

    I've tried changing the CCSID of the file, the %xml() function and the job in numerous combinations, but I can't get it to parse without throwing an error.

    I have then looked at ways of handling error code 6;
    XML-SAX (Parse an XML Document), XML-SAX (parse an XML document) operation code

    examples of the XML-SAX operation, XML-SAX (parse an XML document) operation code, examples



    The last URL reads to me as though the parser stops normal processing and only continues looking for other errors, meaning you can't make it ignore the error or replace the character with something else, like a blank. This seems to be backed up by the example, which has a "return" for the XML_EXCEPTION; would this not return to the program that called the XML-SAX function, halting processing of the file?

    Anyone got any ideas on how I can just ignore the invalid character and continue processing without rewriting it to use the SQL equivalents or fixing the source data? Is it going to involve checking the file first?

    Cheers,
    Ryan

  • #2
    Your third link is to the ILE COBOL manual. XML-SAX is an RPG opcode.


    • #3
      It sounds like your file is not truly formatted in CCSID 819? I'd expect xml-sax should be able to translate it for you provided all your formats are correct. If it's a different format, I think you should be able to use the correct CCSID instead.


      • #4
        I did not notice the COBOL thing. I really do find it hard to find the information I need on the IBM site; the search never seems to show what you want. Although it's for COBOL, it seems somewhat relevant for RPGLE, as normal events seem to stop firing once an exception is hit. I have had a look for something similar in the RPGLE docs but I can't find info on it.

        I also can't find a code page for 819 (one that details the symbols and their hexadecimal values) on the IBM site, but I've found one on Google and it doesn't have the € symbol. I guess, therefore, that the problem is € not existing in CCSID 819, which triggers error code 6.

        Per my initial post, I've tried modifying the CCSID of various things without any luck. How would one determine the correct code page? And again, is there a way to skip the parser error codes? Or is it just a case of doing some sort of pre-validation of the file to make sure it's valid?


        • #5
          In order to use the CCSID, you would need to know the source. Any information from the source system?
          Does your XML file have an encoding declaration at the top? Something along the lines of <?xml version="1.0" encoding="utf-8"?>; UTF-8 would be CCSID 1208.
          If you were using 819 you may see ISO 8859-1 in the encoding.
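
For anyone wanting to check the declaration off-box, here is a minimal Python sketch (not RPG, and not an IBM i API). The helper name `declared_encoding` is made up for illustration, and it assumes the file starts in an ASCII-compatible encoding so the declaration can be read as bytes.

```python
import re

# Hypothetical helper: pull the encoding name out of an XML declaration,
# if the document starts with one. Assumes an ASCII-compatible byte stream.
_DECL = re.compile(rb'^\s*<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')

def declared_encoding(head: bytes):
    """Return the declared encoding in lower case, or None if none is declared."""
    match = _DECL.match(head)
    return match.group(1).decode("ascii").lower() if match else None

print(declared_encoding(b'<?xml version="1.0" encoding="utf-8"?><a/>'))
print(declared_encoding(b'<POS xmlns:dt="urn:schemas-microsoft-com:datatypes">'))
```

A result of None only means the document does not say what it is; it tells you nothing about the actual encoding.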

          Not sure if you could skip the parser error codes, did you try an on-error check, and see if it continues?


          • #6
            Hi Anthony,

            No source system information, it's a translation log from a till.

            The header of the file is literally;
            <POS xmlns:dt="urn:schemas-microsoft-com:datatypes">

            From what I understand this is a name space, not an encoding declaration, so it's no help here?

            On-error catches the parser error, which is how it works currently, but later on down the line it crashes because data is missing; I traced it back to the error code 6 issue.

            After reading up it seems the information for COBOL is pertinent to RPGLE also; if an exception occurs the program stops firing normal events and therefore the XML file is only partially processed.


            I have tested this with the XML-SAX example in the URL and it seems to be the case, so I guess I'll try some different CCSID combinations to see if I can get it to work, otherwise I'm going to have to do some sort of pre-validation on the file to make sure it's valid.


            • RDKells commented
              transaction log*

          • #7
            Originally posted by RDKells:
            No source system information, it's a translation log from a till.
            Sounds like the till is the source system, then. If it's able to create a file, then it has some sort of CPU and some sort of software, etc., running on it.

            You would want to look in the documentation, or talk to the technical support for the company that makes it to find out which character set/encoding it is using to create the file.

            Originally posted by RDKells:
            The header of the file is literally;
            <POS xmlns:dt="urn:schemas-microsoft-com:datatypes">
            I think Anthony was asking about the XML processing instructions (PI). That would look like a tag with question marks in it, like this: <?xml?> It sounds like there isn't one in this document, which is a shame.

            If you can't get an answer from the people who make the device, you may be able to guess based on the code point used. For example, iso-8859-15 is mostly the same as iso-8859-1, except that the Euro symbol is at x'A4'. So if all other characters look the same as iso-8859-1 and the euro is x'A4', that might be it. If it's mostly the same as iso-8859-1, but the euro is x'80', that'd be Windows-1252. If it is a 3-byte sequence x'E282AC' then it is UTF-8, etc.

            Guessing based on the code point is not as good as finding out from the source because the information is circumstantial. Multiple encodings may use the same code points, and other characters that aren't in the particular instance of the document you're reading would be unknown, so might not match. But, sometimes a guess is the best you can do.
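
The euro-based guess can be sketched in a few lines of Python, using Python's built-in codecs as stand-ins for the CCSIDs. The function name `guess_by_euro` is hypothetical, and the result is only a hint: x'A4' is also the generic currency sign in iso-8859-1, and any of these bytes can occur for unrelated reasons.

```python
# The euro sign's byte sequence in the three encodings mentioned above,
# taken from Python's built-in codecs.
EURO_BYTES = {
    "utf-8": "€".encode("utf-8"),            # b'\xe2\x82\xac'
    "cp1252": "€".encode("cp1252"),          # b'\x80'
    "iso8859-15": "€".encode("iso8859-15"),  # b'\xa4'
}

def guess_by_euro(data: bytes):
    """Return the candidate encodings whose euro byte sequence occurs in data."""
    return [name for name, seq in EURO_BYTES.items() if seq in data]

print(guess_by_euro(b"marked \x8012.00"))  # ['cp1252']
```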


            • #8

              I think you should try CCSID 923 (iso-8859-15); iso-8859-1 does not include the euro sign.
              Nicolas


              • #9
                The only "header" type information, at the start of the XML document is what I posted - which I thought is what Anthony0 was asking for. There is no <?xml?> tag in the file.


                How would I determine what the hex value of the character is? Just run it in debug and check it via eval x 32?


                Thanks for the suggestion Nicolas - I changed it to 923 but still get the same error;

                Message ID . . . . . . : RNX0351 Severity . . . . . . . : 50
                Message type . . . . . : Escape
                Date sent . . . . . . : 10/07/19 Time sent . . . . . . : 10:25:59

                Message . . . . : The XML parser detected error code 6.
                Cause . . . . . : While parsing an XML document for an RPG procedure, the
                parser detected an error at offset 14778 with reason code 6. The actual
                document is

                The symbol at said position is the € symbol.

                Opening the file in notepad++ gives this error;
                XML Parsing error at line 1:
                Input is not proper UTF-8, indicate encoding!
                Bytes: 0x80 0x31 0x32 0x2E



                As it doesn't seem possible to ignore the parser error and continue processing normally I'm just going to write something to pre-validate the file, if it fails then advise the offsets where the exceptions occurred and reject it - this will then force the other 3rd party to fix the issue their end.
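
A rough idea of what such a pre-validation pass could look like, sketched in Python rather than RPG. The function name `suspicious_offsets` is made up, and the rule it applies is an assumption: it flags bytes that are control codes in ISO 8859-1 (other than tab, LF and CR), which may not match exactly what the RPG parser rejects. Offsets are zero-based.

```python
def suspicious_offsets(data: bytes):
    """Return (offset, byte) pairs for bytes that are control codes in ISO 8859-1."""
    allowed_controls = {0x09, 0x0A, 0x0D}  # tab, line feed, carriage return
    hits = []
    for offset, byte in enumerate(data):
        is_c0 = byte < 0x20 and byte not in allowed_controls
        is_del_or_c1 = 0x7F <= byte <= 0x9F  # DEL plus the C1 control range
        if is_c0 or is_del_or_c1:
            hits.append((offset, byte))
    return hits

sample = b"<Line>marked \x8012.00</Line>"
for offset, byte in suspicious_offsets(sample):
    print(f"offset {offset}: x'{byte:02X}'")  # offset 13: x'80'
```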


                • #10
                  Originally posted by RDKells:
                  How would I determine what the hex value of the character is? Just run it in debug and check it via eval x 32?
                  One way is to use the DSPF command, there's an F10=Hex option.


                  • #11
                    I have just realised that I never stipulated that this is an IFS file, rather than a database file - apologies.

                    If I view it via WRKLNK / 5 / F10 the symbol doesn't show for me;

                    My job CCSID options are;
                    Coded character set identifier . . . . . . . . . : 65535
                    Default coded character set identifier . . . . . : 37

                    The hex is;
                    Code:
                    6D61726B 65642080 31322E30 30266C74 3B2F4C69   marked  12.00&lt;/Li
                    Code page 819 has x'31' for '1'; preceding that is x'80', which I guess is the 1252 code point for the euro, as you pointed out earlier.
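
The hex row above can be checked off-box with a short Python sketch. Python's `cp1252` and `latin-1` codecs are used here as stand-ins for CCSIDs 1252 and 819; that mapping is an assumption of the sketch.

```python
dump = "6D61726B 65642080 31322E30 30266C74 3B2F4C69"
raw = bytes.fromhex(dump)  # fromhex skips the spaces between the groups

print(raw.decode("cp1252"))         # marked €12.00&lt;/Li
print(repr(raw.decode("latin-1")))  # here x'80' is the control character U+0080
```

Read as 1252 the byte is a euro sign; read as 819 it is a control character, which fits the invalid-character error.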

                    I've tried changing it to 1252 and I get error 302 instead;
                    302 The parser does not support the requested CCSID value or the first character of the XML document was not '<'.
                    Reading this document (don't worry, I checked it was relevant first);
                    https://www.ibm.com/support/knowledgecenter/en/ssw_ibm_i_73/rzasc/xmlparselimit.htm

                    Suggests that 1252 isn't supported.

                    I wonder if the file is created in 1252 and changed to 819 somewhere, I will take a look at that.

                    Cheers,
                    Ryan


                    • #12
                      1252 should work fine. The error message says that the document does not begin with a '<' character... Something's not right, here. How did you change it to 1252?


                      • #13
                        Via CHGATR OBJ('/myifsfile.xml') ATR(*CCSID) VALUE(1252) (aka option 13 on the IFS file)

                        Is there another way of doing it?


                        • RDKells commented
                          This is the 1252 file displayed via WRKLNK, my job is CCSID 65535, default 37 - the same as the job that processes the file.
                          Code:
                          - - - -  + - - -  - * - -  - - + -  - - - *    ----+----*----+----*
                          405C5C5C 5C5C5C5C 5C5C5C5C 5CC28587 89959589    ************Beginni
                          3C504F53 20786D6C 6E733A64 743D2275 726E3A73   <POS xmlns:dt="urn:s
                          According to the Wikipedia page for 1252, "3C" is a "<".
                          However, the processing job is 65535/37; does this matter? 3C is "DC4 - Device Control Four" in CCSID 37 (I can't seem to find a code page for 65535).
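
To illustrate why the same byte reads differently, here is a small Python sketch that decodes the two hex rows shown above. Python's `cp037` and `cp1252` codecs stand in for CCSIDs 37 and 1252 (an assumption); it says nothing about which CCSID the RPG parser actually applies.

```python
first_row = bytes.fromhex("405C5C5C 5C5C5C5C 5C5C5C5C 5CC28587 89959589")
second_row = bytes.fromhex("3C504F53 20786D6C 6E733A64 743D2275 726E3A73")

print(first_row.decode("cp037"))    # readable as EBCDIC text
print(second_row.decode("cp1252"))  # readable as ASCII-family text

# '<' has a different byte value in each family.
print("<".encode("cp1252").hex())  # 3c
print("<".encode("cp037").hex())   # 4c
```

So x'3C' is only a '<' if the data is interpreted as an ASCII-family code page; interpreted as EBCDIC 37, a '<' would have to be x'4C'.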

                        • RDKells commented
                          As that link in my previous post doesn't say 1252 is supported, do you not think that's the issue?

                      • #14
                        CHGATR is a good way to do it.

                        I've heard of people doing it other ways (such as the dialog in EDTF, or using CPY, etc) that would translate the text rather than just changing the CCSID value, and people can sometimes get confused about that. But the way you're doing it is correct.

                        I'm really surprised that CCSID 1252 wouldn't be supported. This seems like a bizarre limitation (the OS supports 1252, and RPG is internally translating the file... why on earth would it not support all of the CCSIDs that the OS supports?).

                        If that is indeed the problem, you should be able to work around it. Just do a CPY command, specify *TEXT (not *BINARY) and tell it to copy your file from CCSID 1252 to 1200. Then it should work, since 1200 is supported.

                        But I'm highly skeptical about that link you provided because it doesn't even list 1208, which is far and away the most widely used CCSID for XML documents. I'm away from the office right now with no access to try things, but maybe on Monday I'll do some quick tests on stuff like 1252 or 1208.


                        • #15
                          Thanks for the CPY suggestion; 1252 didn't work but 1146 did - details below.

                          1252 no longer gave error 302 but it still gave error 6;
                          Code:
                          CPY OBJ('myfile.xml') TOOBJ('mynew1200file.xml')
                          FROMCCSID(1252) TOCCSID(1200) DTAFMT(*TEXT)
                          Message ID . . . . . . : RNX0351 Severity . . . . . . . : 50
                          Message type . . . . . : Escape
                          Date sent . . . . . . : 15/07/19 Time sent . . . . . . : 11:29:08

                          Message . . . . : The XML parser detected error code 6.
                          Cause . . . . . : While parsing an XML document for an RPG procedure, the
                          parser detected an error at offset 14778 with reason code 6. The actual
                          document is


                          Here's the hex of the file after conversion; the euro symbol was changed from 80 to 3F (the euro symbol is just before the 1 in 12.00):

                          Code:
                           - - - -  + - - -  - * - -  - - + -  - - - *    ----+----*----+----*
                           D5E34040 40404094 81999285 84403FF1 F24BF0F0   NT     marked  12.00

                          I had a look at some of the common code pages we use here, 37 and 285, and the 285 page advises that in CCSID 1146 "9F" is replaced by the euro symbol. Looking at 37 and 285, "9F" is defined as:
                          "The currency sign (¤) is a character used to denote an unspecified currency."
                          So I converted from 1252 -> 1146;
                          Code:
                          CPY OBJ('myfile.xml') TOOBJ('mynew1146file.xml')
                          FROMCCSID(1252) TOCCSID(1146) DTAFMT(*TEXT)
                          Hex of file - you can now see the currency symbol;
                           - - - -  + - - -  - * - -  - - + -  - - - *    ----+----*----+----*
                           D5E34040 40404094 81999285 84409FF1 F24BF0F0   NT     marked ¤12.00

                          I didn't get a parser code 6 error and the file was processed via a job set as CCSID 65535/37.
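
The x'9F' behaviour can be reproduced with a short Python sketch. As far as I'm aware Python ships no codec for 1146, so `cp1140` (CCSID 37 with the euro at x'9F') is used as a stand-in; that substitution is an assumption, made because both pages place the euro at the same position according to the description above.

```python
amount = "€12.00"

# A euro-enabled EBCDIC page has a slot for the euro at x'9F'.
print(amount.encode("cp1140").hex().upper())  # 9FF1F24BF0F0

# The plain page has no euro at all, so the conversion has nowhere to put it.
try:
    amount.encode("cp037")
except UnicodeEncodeError as exc:
    print("cp037 cannot encode:", exc.object[exc.start:exc.end])

# In the plain page x'9F' is the generic currency sign instead.
print(bytes([0x9F]).decode("cp037"))  # ¤
```

The encoded bytes match the tail of the hex row above (9FF1 F24BF0F0).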

                          I'll need to fully check all the relevant files to ensure nothing got mistranslated but at first glance; all looks good!

                          Thanks for your help, Scott


                          • RDKells commented
                            I've done a bit more digging here.

                            I think the source system is correctly creating them as 1252 but they are being converted to 819 via the FTP process.

                            CHGFTPA on our systems shows a CCSID of 00819 and both the incoming & outgoing tables are set to *CCSID

                            I created a 1252 file and let our FTP process pick it up - the resulting file it created in a new IFS location was 819 whilst the old file that was archived remained as 1252.


                            Looking at this link, which is for non-IFS files but seems relevant anyway;


                            It seems possible to alter the CCSID of a specific session, so I might explore that also.

                          • RDKells commented
                            I guess now it's a case of how one converts the IFS file to a new CCSID, programmatically, before the XML-SAX operations... I can't seem to find an API that does it, so it looks like I'm just going to have to create a temporary copy using CPY.
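
For comparison, the temporary-copy idea looks like this as a Python sketch. It is not an IBM i API and does not set any CCSID attribute on the copy; the function name `transcode_copy` and the default encodings are assumptions chosen to mirror the 1252 source discussed above.

```python
import os
import tempfile

def transcode_copy(src_path, src_encoding="cp1252", dst_encoding="utf-8"):
    """Write a re-encoded temporary copy of src_path and return the copy's path."""
    with open(src_path, "rb") as src:
        text = src.read().decode(src_encoding)  # raises if the bytes don't fit
    fd, dst_path = tempfile.mkstemp(suffix=".xml")
    with os.fdopen(fd, "wb") as dst:
        dst.write(text.encode(dst_encoding))
    return dst_path
```

The caller is responsible for deleting the temporary copy once parsing has finished.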