ibmi-brunch-learn

EBCDIC to Base64 encoding


  • EBCDIC to Base64 encoding

    Hi guys, I went through older posts and some other blogs too, but didn't find an appropriate solution.

    I have a PDF in EBCDIC format stored in the IFS, whose data is read into one long string (ebcdic_data). I want to encode this string in base64, but I am receiving the run-time error "EBCDIC character value not entirely enclosed by shift-out and shift-in" when moving it to a UTF-8 variable.


    Code:
    dcl-s ebcdic_data varchar(7000000);
    dcl-s $ifs_data varchar(7000000) ccsid(1208) inz;
    
    // convert data to UTF-8 format, so that data can be encoded to base64
    $ifs_data = %trim(ebcdic_data);  // getting error at this line
    
    wwEncLen = base64_encode( %addr($ifs_data : *data) : %len($ifs_data) : %addr(wwEncoded) : %size(wwEncoded) );

    I am using https://base64.guru/converter/encode/pdf to verify that the data is encoded properly.

  • #2
    You've said that you have a PDF document in EBCDIC. Can you explain that better? That doesn't make any sense to me.

    In my experience, a PDF document is a binary document. Yes, it has some text fields inside of it, but other parts are not text, so it's not okay to treat it as text. Also... I've never heard of the text parts being in EBCDIC.

    I strongly suspect that the data is not actually EBCDIC, which is why you would get errors with %trim() et al. Please consider treating the data as binary, and taking steps to make sure that it is never translated. (Except when it is base64 encoded... the encoded data is safe to translate.)
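
A small sketch of that last point, written in Python purely as a neutral illustration (it is not the RPG code from this thread). It assumes code page 037 as the EBCDIC code page and uses a made-up byte string standing in for PDF content. Translating the raw bytes as if they were text changes them, while the base64 text appears to survive a round trip through EBCDIC unchanged.

```python
import base64

# Hypothetical stand-in for the start of a PDF: an ASCII header
# followed by non-text bytes (real PDFs mix both).
raw = b"%PDF-1.3\n" + bytes([0xF6, 0xE4, 0xFC, 0xDF, 0x00, 0x9C])

# Wrongly treating the bytes as EBCDIC text (cp037 is an assumption) and
# converting them to UTF-8 yields different bytes, so the PDF is corrupted.
as_utf8 = raw.decode("cp037").encode("utf-8")
translation_changed_data = as_utf8 != raw

# Base64 output uses only letters, digits, '+', '/' and '='.
encoded = base64.b64encode(raw)

# Translating the base64 text ASCII -> EBCDIC -> ASCII loses nothing.
ebcdic_text = encoded.decode("ascii").encode("cp037")
back_to_ascii = ebcdic_text.decode("cp037").encode("ascii")
round_trip_ok = base64.b64decode(back_to_ascii) == raw
```

Here `translation_changed_data` and `round_trip_ok` both come out true: the binary input must stay untouched, and only the encoded output can be handled as text.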


    • #3
      Originally posted by Scott Klement
      You've said that you have a PDF document in EBCDIC. Can you explain that better? That doesn't make any sense to me.

      In my experience, a PDF document is a binary document. Yes, it has some text fields inside of it, but other parts are not text, so it's not okay to treat it as text. Also... I've never heard of the text parts being in EBCDIC.

      I strongly suspect that the data is not actually EBCDIC, which is why you would get errors with %trim() et al. Please consider treating the data as binary, and taking steps to make sure that it is never translated. (Except when it is base64 encoded... the encoded data is safe to translate.)
      Hi Scott, this is the sample PDF data. Do you suggest otherwise?

      %PDF-1.3 %öäüß 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 3 0 obj << /Producer (macOS Version 10.15.6 \(Build 19G2021\) Quartz
      PDFContext, AppendMode 1.1) /CreationDate (D:20201005160600Z00'00') /ModDate (D:20201005160917Z00'00') >> endobj 2 0 obj << /Type /Pages /Count 1 /MediaBox [0 0 612 792] /Kids [4 0 R] >> endobj 4 0 obj << /BleedBox [0 0 612 792] /Type /Page /ArtBox [0 0 612 792] ¥YÛr 5 }×W ;³ lt ×àÅ 1ð òVñ < ( lªL ø}N·Ô ifÖÞØå*oïHê úôå´öV¿Ó·Úl;ý ~¹


      • #4
        That's very obviously not EBCDIC.
        1. sequences like "öäüß" are obviously binary data
        2. the producer is listed as macOS -- which has no concept of EBCDIC.


        Treat it as binary.


        • #5
          Originally posted by Scott Klement
          That's very obviously not EBCDIC.
          1. sequences like "öäüß" are obviously binary data
          2. the producer is listed as macOS -- which has no concept of EBCDIC.


          Treat it as binary.
          My bad. I entered this data into an application for viewing processed PDFs, and it reported that the data is EBCDIC. Thank you for the correction.
          I tried again. This time it doesn't give me an error, but it doesn't convert to the string I expected. Am I missing anything obvious here?

          Code:
          dcl-s data varchar(7000000) ccsid(65535) inz;
          dcl-s $ifs_data varchar(7000000) ccsid(1208) inz;
          
          // data: contains PDF data from IFS, in binary format
          $ifs_data = %trim(data);
          wwEncLen = base64_encode( %addr($ifs_data : *data) : %len($ifs_data) : %addr(wwEncoded) : %size(wwEncoded) );


          • #6
            You're still treating the data as text. CCSID(1208) means that it is UTF-8 text. %TRIM() should never be done on a non-text field.
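
To see why trimming is risky on non-text data, here is a hedged Python sketch (Python is used only for illustration). It assumes a text-style trim strips the byte 0x40, which is the EBCDIC blank; the sample bytes are hypothetical, and the exact behaviour of %TRIM() on a CCSID(65535) field is not verified here.

```python
import base64

# Hypothetical binary content that happens to start and end with 0x40,
# the byte used for a blank in EBCDIC (in ASCII the same byte is '@').
binary = bytes([0x40, 0x40, 0x25, 0x50, 0x44, 0x46, 0x00, 0xFF, 0x40])

# A text-style trim removes those bytes even though they are real data.
trimmed = binary.strip(b"\x40")

# Encoding the trimmed copy gives base64 for a different, shorter file.
full_b64 = base64.b64encode(binary)
trimmed_b64 = base64.b64encode(trimmed)
lost_bytes = len(binary) - len(trimmed)
```

In this sketch three bytes are lost before encoding, so decoding `trimmed_b64` can never give back the original content.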


            • #7
              It's not clear how you are using this data... you do not show how you are reading the PDF, or what you are doing with the output. This makes it difficult to know if I'm telling you the exact right steps....

              With that in mind, here's a simple example of reading an entire PDF file into memory and base64 encoding it as a single string, and writing that string to disk. (Hoping that's what you wanted)

              b64pdf.txt
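
The attachment itself is not reproduced here. As a rough sketch of the same idea (read the whole file as bytes, base64 encode it as a single string, write that string to disk), the following uses Python rather than RPG. The function name `encode_file_to_base64` and the file names are hypothetical, and a throwaway temporary file stands in for a real PDF.

```python
import base64
import tempfile
from pathlib import Path


def encode_file_to_base64(src: Path, dst: Path) -> int:
    """Read src as raw bytes, base64 encode it, write the text to dst.

    Returns the length of the encoded string.
    """
    data = src.read_bytes()            # binary read: no text translation, no trimming
    encoded = base64.b64encode(data)   # one continuous string, no line breaks
    dst.write_bytes(encoded)           # the output is plain ASCII text
    return len(encoded)


# Usage with a temporary file standing in for a real PDF.
with tempfile.TemporaryDirectory() as tmp:
    pdf = Path(tmp) / "sample.pdf"
    out = Path(tmp) / "sample.b64"
    pdf.write_bytes(b"%PDF-1.3\n" + bytes(range(256)))
    written = encode_file_to_base64(pdf, out)
    restored = base64.b64decode(out.read_bytes())
    matches = restored == pdf.read_bytes()
```

The key design choice is that the input is never passed through any character conversion; only the encoded output is treated as text.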


              • #8
                Originally posted by Scott Klement
                It's not clear how you are using this data... you do not show how you are reading the PDF, or what you are doing with the output. This makes it difficult to know if I'm telling you the exact right steps....

                With that in mind, here's a simple example of reading an entire PDF file into memory and base64 encoding it as a single string, and writing that string to disk. (Hoping that's what you wanted)

                b64pdf.txt
                These are the steps I am following:
                1. Read the PDF as is and store the binary data in the DATA variable. I gave the variable CCSID(65535) to tell the program to treat the data as binary (I may be wrong about this).
                2. Pass DATA to the field $ifs_data, declared with ccsid(1208), to convert it into ASCII. I did this because converting a short string like "%PDF-1.3 %öäüß 1 0 obj << /Type /Catalog /Pages 2 0 R >> " into ASCII and passing it to the base64 encoder works fine. So I thought: if a piece of the string works, why not convert the entire data to ASCII?
                3. Pass the field $ifs_data to the base64 encoder.

                I may be wrong on multiple points, as I am not that good with CCSIDs and am still learning. I reached the above conclusions after reading blogs and some trial and error.
                I appreciate that you tried your hand at the code. Allow me to give it a shot, and I shall certainly come back to you if I observe anything.


                • #9
                  Your steps 1 and 3 are correct and all that is needed.

                  Step 2 is not needed and will not work. Base64 encoding is designed to work from a _binary_ stream and convert it to a series of characters that are basically universal - each character representing a unique 6 bit value.

                  Since the first point of convergence between groups of 8 bits and groups of 6 bits is 24 bits, 3 "characters" (or, to be more precise, 3 bytes of 8 bits each) are mapped to 4 sets of 6 bits, i.e. 3 x 8 = 24 = 4 x 6. So from the binary stream, the first 6 bits of byte 1 are converted to their base64 code point. Then the remaining 2 bits of the first byte together with the first 4 bits of byte 2 become the second code point ... and so on and so on. Rather than go into it further, I suggest you read this: http://fm4dd.com/programming/base64/...CII%20standard.

                  The ASCII bit comes in when producing the output stream, and not before. If the code point is 1 (binary 000001) then it is represented by the letter B. If it is 27 (binary 011011) then it is represented by b. It is those letters B and b that will normally be coded in ASCII.

                  So you only really need step 1 (read into string) and 3 - pass to base64 encoder.
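
The 3-bytes-to-4-characters mapping described above can be sketched as follows. This is an illustrative Python version, not the encoder used in this thread; `encode_group` is a hypothetical helper that handles exactly one full 3-byte group and ignores the `=` padding needed when the input length is not a multiple of 3.

```python
import base64

# The standard base64 alphabet: index 1 is 'B', index 27 is 'b'.
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def encode_group(three_bytes: bytes) -> str:
    """Map exactly 3 bytes (24 bits) to 4 base64 characters (4 x 6 bits)."""
    b1, b2, b3 = three_bytes
    i1 = b1 >> 2                          # first 6 bits of byte 1
    i2 = ((b1 & 0x03) << 4) | (b2 >> 4)   # last 2 bits of byte 1 + first 4 bits of byte 2
    i3 = ((b2 & 0x0F) << 2) | (b3 >> 6)   # last 4 bits of byte 2 + first 2 bits of byte 3
    i4 = b3 & 0x3F                        # last 6 bits of byte 3
    return ALPHABET[i1] + ALPHABET[i2] + ALPHABET[i3] + ALPHABET[i4]


group = b"%PD"                            # first three bytes of a PDF header, in ASCII
manual = encode_group(group)
library = base64.b64encode(group).decode("ascii")
```

Both `manual` and `library` come out as `JVBE`, which is why base64-encoded PDFs typically begin with those characters. Note that the input is the raw byte values; no character-set conversion happens before the bits are regrouped.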


                  • #10
                    Thanks, JonBoy, for that awesome explanation. Let me tweak things a bit, and I shall share more details with you. I am also trying Scott's idea in parallel.


                    • #11
                      Originally posted by Scott Klement
                      It's not clear how you are using this data... you do not show how you are reading the PDF, or what you are doing with the output. This makes it difficult to know if I'm telling you the exact right steps....

                      With that in mind, here's a simple example of reading an entire PDF file into memory and base64 encoding it as a single string, and writing that string to disk. (Hoping that's what you wanted)

                      b64pdf.txt
                      Thanks, Scott, for coming to the rescue. It worked for me with some minor tweaks. I appreciate your help.
