QzhbCgiParse not correctly decoding URL encoded characters?

  • QzhbCgiParse not correctly decoding URL encoded characters?

    So I am experimenting with RPGLE CGI programs. Specifically decoding search parameters. And I have noticed an oddity with handling the £ character.

    When this is entered in an HTML form or some such, the browser will escape this single character in the search parms as two characters: %C2%A3
    When I read the QUERY_STRING environment variable in my CGI program, I can see this has already been EBCDIC translated to %62%B1
    When I then use QzhbCgiParse() to retrieve and decode this specific value from the query string, %62%B1 is unescaped to Â£ instead of just to £

    So I'm getting an extra, unwanted Â character, as an artefact of (I assume) the escape/unescape process and/or the character set conversion.

    I assume the browser is escape encoding £ as two characters because £ does not technically exist in standard UTF-8? But why is the iSeries not able to decode it correctly? Is there a way I can make it decode correctly?

    This is a UK iSeries, but the CCSIDs throughout the system are 037 (US) instead of 285 (UK). That's not within my power to change, and neither are the HTTP server settings (which I assume are why the data is already translated to EBCDIC).

  • #2
    '£' _does_ exist in UTF-8. X'C2A3' is the 2-byte UTF-8 encoding of '£'. It sounds like something along the way doesn't understand that it's UTF-8.

    Here's a little RPG program that shows how interpreting the UTF-8 character as ASCII gives the result you're seeing.
    Code:
            dcl-ds *n;
               utf8 varchar(5) ccsid(*utf8) inz('£') pos(1);
               ascii_819 varchar(5) ccsid(819)       pos(1);
            end-ds;
            dcl-s job varchar(5);
            job = utf8;        // interpret x'C2A3' as one UTF-8 character
            job = ascii_819;   // interpret x'C2A3' as two ASCII characters
            return;
    In debug:
    Code:
    > EVAL utf8:x
         00000     0002C2A3 404040..

    After assignment from utf8:
    > EVAL job
      JOB = '£    '
    > EVAL job:x
         00000     0001B100 000000..

    After assignment from ascii_819:
    > EVAL job
      JOB = 'Â£   '
    > EVAL job:x
         00000     000262B1 000000..


    • #3
      So it sounds like one of two things:
      Either the web server's conversion to EBCDIC should be converting %C2%A3 to %B1 instead of %62%B1. (Maybe it thinks the incoming data is ASCII instead of UTF-8?)
      Or QzhbCgiParse() is not correctly decoding the escaped EBCDIC-encoded characters, and should be converting %62%B1 to £ instead of Â£

      I think the web-side character set is a function of the web server? Any idea where/how I view that?


      • #4
        Frankly, it's very clearly the former. It's treating the input as ASCII rather than UTF-8.
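
This can be checked away from the IBM i with a few lines of Python. The sketch below is only a model of the observed behaviour, not the HTTP server's actual implementation: it percent-decodes the browser's escape, interprets the bytes under an assumed character set, converts to CCSID 037 (Python's `cp037` codec) and re-escapes the result. The function name `reescape_as_ebcdic` is made up for illustration.

```python
from urllib.parse import unquote_to_bytes


def reescape_as_ebcdic(escaped: str, assumed_charset: str) -> str:
    """Model of the conversion: percent-decode, interpret the bytes as
    `assumed_charset`, convert to EBCDIC CCSID 037, and re-escape."""
    raw = unquote_to_bytes(escaped)       # the bytes the browser sent
    text = raw.decode(assumed_charset)    # how the converter reads those bytes
    ebcdic = text.encode("cp037")         # CCSID 037, the job CCSID in this thread
    return "".join(f"%{b:02X}" for b in ebcdic)


# Input treated as UTF-8: one character, one EBCDIC byte.
print(reescape_as_ebcdic("%C2%A3", "utf-8"))       # %B1

# Input treated as ISO-8859-1 (CCSID 819): two characters, two EBCDIC bytes.
print(reescape_as_ebcdic("%C2%A3", "iso-8859-1"))  # %62%B1
```

Only the ISO-8859-1 reading reproduces the %62%B1 seen in QUERY_STRING, which matches the hex values in the debug output in #2.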


        • #5
          I assume there's no way to override that within my CGI program? So much hangs off this server instance there's no way anyone will want to risk changing its config.


          • #6
            Seems to me that the error has already been made by the time your CGI program is invoked, so it's too late to fix it at that point.

            Either the program that is sending the document has to explicitly specify that the data is UTF-8, or you'd have to change your config.
