ibmi-brunch-learn

Data conversion error CPE3490

  • Data conversion error CPE3490

    I could use a little help. I hope someone has some wisdom to share with me.

    I am dealing with two Very Large Corporations. I receive data from one of them (VLC1), massage it a bit, then pass it along to the second one (VLC2). It would be nice if VLC1 could format the data the way VLC2 wants it, but no such luck.

    Everything is done thru SFTP. VLC1 sends the data to a directory in the IFS. I wrote an RPG program using the _C_IFS routines to read the data from one stream file and write it to another stream file. The input is tagged with CCSID 1208. I build the output with 819. The job CCSID is 37.

    It worked fine for a while. However, recently I received a data conversion error -- CPE3490. I changed the CCSID of the output file to 1208, but it didn't make any difference.

    I think the problem is occurring on the input (fgets) operation, because the program gets part of the way thru the file, then reads only part of a line. I've looked at that line in the input file in hex, and it looks like straight ASCII to me -- I don't see anything unusual.

    So my questions:

    1. Any ideas on how I can determine the offending character(s)?

    2. Can you suggest other approaches, including but not necessarily limited to ones that might not require data conversion?

    Thanks in advance.

    Ted

  • #2
    In case anyone's interested . . .

    I found the offending characters. Three lines contain Hispanic last names with diacritical marks: accented uppercase A, accented uppercase E, and capital N with tilde. I found the first one with the DSPF command. I paged thru the data and when the first "bad" line came up, I got a CPF9897 message.

    Code:
    0 INVALID CHARACTERS FOR CCSID 01208 ENCOUNTERED IN DISPLAY
    RECORD 3. IF THE RECORD IS CHANGED, THEY WILL BE REPLACED WITH BLANKS.
    EDTF also gives the same message.

    I also used Notepad++ to open a downloaded copy of the file, and it displays those characters in the format xC1 in reverse image. (xC1 for accented capital A, xC9 for accented capital E, etc.)
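
A rough sketch of another way to hunt for the offending bytes: scan the raw file for anything at or above x'80' and report where it sits. This is Python rather than RPG, and the sample data and the function name `find_non_ascii` are made up for illustration.

```python
def find_non_ascii(data: bytes):
    """Return (line_number, column, byte_value) for every byte >= 0x80."""
    hits = []
    for line_no, line in enumerate(data.split(b"\n"), start=1):
        for col, byte in enumerate(line, start=1):
            if byte >= 0x80:
                hits.append((line_no, col, byte))
    return hits

# Hypothetical sample: line 2 starts with a lone 0xC1 byte.
sample = b"SMITH,JOHN\n\xc1LVAREZ,ANA\nJONES,AL\n"

for line_no, col, byte in find_non_ascii(sample):
    print(f"line {line_no}, column {col}: x'{byte:02X}'")
```

On a real file, the bytes would come from reading the downloaded copy in binary mode instead of the hard-coded sample.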

    So question 1 is answered.

    Now I'll have to find a way to deal with them.

    • #3
      Just a thought.

      You tell me that the error occurs when reading the IFS file.
      When you read from the file, the data is in UTF-8 format (CCSID 1208), but it will be
      converted to EBCDIC (CCSID 37) and stored in the variable that receives the data from the read operation.
      I think this is what causes the error.

      What happens if you change the job CCSID to 819 -- a single-byte character set?

      • #4
        Thanks for the idea, Peder. I can't change the job CCSID to 819 because 819 is not EBCDIC. Just for grins, I tried to change the job from the command line.

        Code:
        CHGJOB CCSID(819)
        The system responded with CPF1854.

        Code:
        Message . . . . : Value 819 for CCSID not valid.
        Cause . . . . . : The value that was specified is not one of the acceptable
        values for the coded character set identifier (CCSID) parameter. Either the
        CCSID is not recognized by the system, or a CCSID was specified that is only
        valid on a DBCS system.
        This morning, I changed the input-line, output-line, and another work variable to CCSID(*UTF8). The idea was that the system would not have to convert the data to CCSID 37. It didn't make any difference. I still get the same error.

        I'm having a hard time believing that an RPG program can't handle Spanish letters with accents and tildes. Surely there are plenty of places in the US that do this every day.

        • #5
          In case anybody's interested . . .

          I tried another experiment. I created a physical file with one field of CCSID(1208). Then I copied the data into that file.

          Code:
          CPYFRMIMPF FROMSTMF('xyz.txt') TOFILE(FLATFILE) MBROPT(*REPLACE) RCDDLM(*ALL) FLDDLM('`')
          I used the backtick as the field delimiter because there are no backticks in the data.

          It copied the data just fine except for those lines with the Spanish letters. For those lines, it copied up to the last character before the first Spanish letter. The rest of those lines was blank.

          Surely there is a way to copy this data. I cannot believe that IBM i cannot handle these characters.

          • #6
            I think you are misunderstanding the problem. UTF-8 (CCSID 1208) has a specific way of encoding data. The problem is that the input data is not a valid UTF-8 sequence...

            Most likely the file you are reading isn't actually UTF-8... for example, maybe it's iso-8859-1 (CCSID 819) instead... Or Windows Latin-1 (CCSID 1252) or something else that's not UTF-8. If it truly is UTF-8 then whoever is creating it is doing so incorrectly.
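
To illustrate the point, here is a small sketch (Python, not RPG) using the three byte values reported later in this thread: x'C1', x'C9' and x'D1'. It assumes Python's `iso-8859-1` codec as the counterpart of CCSID 819. On their own these bytes do not decode as UTF-8, but they decode cleanly as ISO-8859-1, and the same letters in real UTF-8 take two bytes each.

```python
# The three byte values seen in the file, taken in isolation.
raw = bytes([0xC1, 0xC9, 0xD1])

# Attempt 1: treat the bytes as UTF-8.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# Attempt 2: treat the same bytes as ISO-8859-1 (assumed equivalent of CCSID 819).
as_latin1 = raw.decode("iso-8859-1")

# What the same three letters look like when genuinely encoded as UTF-8.
proper_utf8 = as_latin1.encode("utf-8")

print("valid UTF-8:", valid_utf8)
print("as ISO-8859-1:", as_latin1)
print("UTF-8 bytes:", " ".join(f"{b:02X}" for b in proper_utf8))
```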

            Changing the job ccsid to 819 makes absolutely no sense whatsoever. Most likely, Peder meant to say 'change the file ccsid' (but said "job" by mistake.)

            • #7
              Thanks for the reply, Scott. I'm convinced that I'm misunderstanding something. That is, I doubt I need a PTF.

              I believe the data to be UTF8 because all the common characters (letters, digits, etc.) have their proper ASCII values, and because the three offending characters -- accented A, accented E and accented uppercase enye -- have the values x'C1', x'C9' and x'D1' respectively, and those are the values of those characters in UTF8.

              I will try changing the file to the other CCSIDs as you suggest and see what happens.

              Thanks very much for your input.

              • #8
                I changed the CCSID to 819 and the copy ran just fine. I don't know why these files come in with CCSID 1208, but I'm not going to worry about it. I'll just program around it.

                I really appreciate the help!

                • #9
                  In case anyone's interested . . .

                  I understand now.

                  The files come from VLC1 via SFTP and are given the CCSID attribute 1208, UTF-8. I don't know who makes this decision. I assume IBM i does.

                  One thing that threw me off was this web page, which I still don't understand. It shows the Spanish characters having the same values (in decimal) that I was seeing in the raw data (in hexadecimal). This is why I thought the file had UTF-8 data.

                  The three Spanish letters in the file all began with B'110', which in UTF-8 indicates a two-byte character. See Wikipedia. The second and following bytes in a multi-byte UTF-8 character begin with B'10'. I used to know all this many moons ago, but I had forgotten it. When the system found a byte beginning with B'110' in my file, it expected the next byte to begin with B'10', which was not the case. Therefore, my program would cancel with errno value 3490, which is described in message CPE3490.
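
The bit patterns described above can be sketched like this. It is a deliberately simplified check in Python that only tests the two-byte rule (a B'110' lead byte must be followed by a B'10' byte); it is not a full UTF-8 validator, and the function names and sample names are invented for illustration.

```python
def is_two_byte_lead(b: int) -> bool:
    """True when the top three bits are 110."""
    return b >> 5 == 0b110

def is_continuation(b: int) -> bool:
    """True when the top two bits are 10."""
    return b >> 6 == 0b10

def first_bad_sequence(data: bytes):
    """Offset of the first 110xxxxx byte not followed by 10xxxxxx, else None."""
    for i, b in enumerate(data):
        if is_two_byte_lead(b):
            nxt = data[i + 1] if i + 1 < len(data) else None
            if nxt is None or not is_continuation(nxt):
                return i
    return None

# Hypothetical samples: the same name in ISO-8859-1 and in real UTF-8.
latin1_name = b"NU\xd1EZ"      # x'D1' is followed by 'E' (x'45'), not a 10xxxxxx byte
utf8_name = b"NU\xc3\x91EZ"    # x'C3' is followed by x'91', which starts with 10

print(first_bad_sequence(latin1_name))
print(first_bad_sequence(utf8_name))
```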

                  I used the CHGATR command to tell the system that the data was in CCSID 819, and the system no longer tried to interpret it as UTF-8. Problem solved and I'm a bit wiser.
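
For anyone who would rather fix the data than the tag, a minimal sketch of the other direction: leave valid UTF-8 alone and re-encode everything else on the assumption that it is ISO-8859-1. This is Python rather than RPG, `to_utf8` is a hypothetical helper, and the fallback encoding is a guess that would need confirming with whoever sends the file.

```python
def to_utf8(data: bytes, fallback: str = "iso-8859-1") -> bytes:
    """Return data unchanged if it is valid UTF-8; otherwise re-encode it from `fallback`."""
    try:
        data.decode("utf-8")
        return data
    except UnicodeDecodeError:
        # Assumption: anything that is not valid UTF-8 is in the fallback encoding.
        return data.decode(fallback).encode("utf-8")

# Hypothetical sample line with a lone x'D1' byte.
fixed = to_utf8(b"NU\xd1EZ,ANA")
print(" ".join(f"{b:02X}" for b in fixed))
```

The check is all-or-nothing per buffer, so a file that mixes both encodings would still need a closer look.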




