**Scott Klement** · February 11, 2021, 08:21 AM

You've said that you have a PDF document in EBCDIC. Can you explain that better? That doesn't make any sense to me.

In my experience, a PDF document is a binary document. Yes, it has some text fields inside of it, but other parts are not text, so it's not okay to treat it as text. Also... I've never heard of the text parts being in EBCDIC.

I strongly suspect that the data is not actually EBCDIC, which is why you would get errors with %trim() et al. Please consider treating the data as binary, and taking steps to make sure that it is never translated. (Except when it is base64 encoded... the encoded data is safe to translate.)
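
Editor's illustration of the point about translation, written in Python rather than RPG purely for demonstration. It is a minimal sketch, assuming code page 37 (`cp037`) as the EBCDIC variant and a made-up 8-byte sample in place of a real PDF. It suggests why base64 text survives an ASCII/EBCDIC round trip while raw binary run through the same kind of translation does not.

```python
import base64

# Hypothetical sample: "%PDF" followed by four arbitrary non-text bytes.
raw = bytes([0x25, 0x50, 0x44, 0x46, 0xC4, 0xE5, 0x00, 0x9C])

# Base64 output uses only A-Z, a-z, 0-9, '+', '/' and '=',
# all of which exist in both ASCII and EBCDIC.
encoded = base64.b64encode(raw).decode("ascii")

# Translate the encoded text to EBCDIC (assumed: code page 37) and back.
ebcdic = encoded.encode("cp037")
back = ebcdic.decode("cp037")

# The text is unchanged, so the original bytes can still be recovered.
assert back == encoded
assert base64.b64decode(back) == raw

# Applying a character translation to the raw bytes themselves changes them.
translated = raw.decode("cp037").encode("latin-1")
assert translated != raw
```

The EBCDIC form of the encoded text has different byte values from the ASCII form, but it represents the same characters, which is all a base64 decoder needs.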

**Chosen_1** · February 11, 2021, 08:59 AM

Hi Scott, this is the sample pdf data. Do you suggest otherwise?

%PDF-1.3 %�� 1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj 3 0 obj << /Producer (macOS Version 10.15.6 $Build 19G2021$ Quartz
PDFContext, AppendMode 1.1) /CreationDate (D:20201005160600Z00'00') /ModDate (D:20201005160917Z00'00') >> endobj 2 0 obj << /Type /Pages /Count 1 /MediaBox [0 0 612 792] /Kids [4 0 R] >> endobj 4 0 obj << /BleedBox [0 0 612 792] /Type /Page /ArtBox [0 0 612 792] �Y�r 5 }�W ;� lt �� 1� �V� < ( l�L �}N�� if��*o�H� ��V�ӷ�l;� ~�

**Scott Klement** · February 11, 2021, 09:28 AM

That's very obviously not EBCDIC.

- Sequences like "��" are obviously binary data.
- The producer is listed as macOS, which has no concept of EBCDIC.

Treat it as binary.

**Chosen_1** · February 11, 2021, 09:57 AM

My bad. I pasted this data into an application for viewing processed PDFs, and it reported the data as EBCDIC. Thank you for the correction.
I tried again. This time it doesn't give me an error, but it doesn't convert to the string I expected. Am I missing anything obvious here?

Code:

```
dcl-s data varchar(7000000) ccsid(65535) inz;
dcl-s $ifs_data varchar(7000000) ccsid(1208) inz;

// datautf8: contains PDF data from IFS, in binary format
$ifs_data = %trim(data);
wwEncLen = base64_encode( %addr($ifs_data : *data)
                        : %len($ifs_data)
                        : %addr(wwEncoded)
                        : %size(wwEncoded) );
```

**Scott Klement** · February 11, 2021, 10:06 AM

You're still treating the data as text. CCSID(1208) means that it is UTF-8 text. %TRIM() should never be done on a non-text field.
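
Editor's illustration, again in Python rather than RPG. It is a rough sketch with made-up bytes, and Python's `strip()` and UTF-8 decoding only stand in for the effect of `%TRIM()` and a `CCSID(1208)` assignment; they are not the same operations. The point it tries to show is that text handling silently changes binary data before the encoder ever sees it.

```python
import base64

# Hypothetical binary fragment. The final 0x20 is real data, not padding.
raw = b"%PDF-1.3\n\xc4\xe5\xf2\xe5 \x00\x9c\x20"

# Trimming treats the trailing 0x20 as a blank and drops it.
trimmed = raw.strip()
assert len(trimmed) == len(raw) - 1

# Interpreting the bytes as UTF-8 text replaces invalid sequences,
# so converting back to bytes does not reproduce the input.
as_text = raw.decode("utf-8", errors="replace").encode("utf-8")
assert as_text != raw

# Encoding the untouched bytes round-trips exactly.
assert base64.b64decode(base64.b64encode(raw)) == raw

# Encoding the trimmed bytes yields a different, shorter payload.
assert base64.b64decode(base64.b64encode(trimmed)) != raw
```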

**Scott Klement** · February 11, 2021, 10:38 AM

It's not clear how you are using this data... you do not show how you are reading the PDF, or what you are doing with the output. This makes it difficult to know if I'm telling you the exact right steps....

With that in mind, here's a simple example of reading an entire PDF file into memory and base64 encoding it as a single string, and writing that string to disk. (Hoping that's what you wanted)

b64pdf.txt

**Chosen_1** · February 11, 2021, 12:01 PM

These are the steps I am following:
1. Read the PDF as is and store the binary data in the DATA variable. I gave the variable CCSID(65535) to tell the program to treat the data as binary (I may be wrong about this).
2. Pass DATA to the field $ifs_data, declared with ccsid(1208), to convert it to ASCII. Converting a short string like "%PDF-1.3 %�� 1 0 obj << /Type /Catalog /Pages 2 0 R >> " to ASCII and passing it to the base64 encoder works fine, so I thought: if a piece of the string works, why not convert the entire data to ASCII?
3. Pass the field $ifs_data to the base64 encoder.

I may be wrong on multiple points, as I am not that good with CCSIDs and am still learning. I reached the above conclusions after reading blogs and some trial and error.
I appreciate you trying your hand at the code. Let me give it a shot, and I will certainly come back to you if I observe anything.

**JonBoy** · February 11, 2021, 04:46 PM

Your steps 1 and 3 are correct and all that is needed.

Step 2 is not needed and will not work. Base64 encoding is designed to work from a _binary_ stream and convert it to a series of characters that are basically universal - each character representing a unique 6 bit value.

Since the first point of convergence between groups of 8 bits and groups of 6 bits is 24 bits, 3 "characters" (or, to be more precise, 3 groups of 8 bits) are mapped to 4 sets of 6 bits, i.e. 3 x 8 = 24 = 4 x 6. So from the binary stream, the first 6 bits of byte 1 are converted to their base64 code point. Then the remaining 2 bits of the first byte, together with the first 4 bits of byte 2, become the second code point, and so on. Rather than go into it further, I suggest you read this: http://fm4dd.com/programming/base64/...CII%20standard.

The ASCII bit comes in when producing the output stream, and not before. If the code point is 1 (binary 000001) then it is represented by the letter B. If it is 27 (binary 011011) then it is represented by b. It is those letters B and b that will normally be coded in ASCII.

So you only really need step 1 (read into string) and 3 - pass to base64 encoder.
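
Editor's illustration of the 3-byte to 4-character mapping described above, in Python rather than RPG. It is a minimal sketch: `encode_triplet` is a hypothetical helper written for this note, not part of any base64 library, and it handles only complete 3-byte groups (no `=` padding). It is checked against Python's standard `base64` module.

```python
import base64

# The standard base64 alphabet: index 0 is 'A', 1 is 'B', 26 is 'a', 27 is 'b'.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def encode_triplet(b1, b2, b3):
    """Map three 8-bit values to four 6-bit code points and their characters."""
    n = (b1 << 16) | (b2 << 8) | b3          # 24 bits in one integer
    indexes = [(n >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    return indexes, "".join(ALPHABET[i] for i in indexes)

# First three bytes of any PDF: "%PD" = 0x25 0x50 0x44
#   00100101 01010000 01000100  ->  001001 010101 000001 000100
first3 = b"%PD"
indexes, chars = encode_triplet(*first3)

assert indexes == [9, 21, 1, 4]
assert chars == "JVBE"
assert chars == base64.b64encode(first3).decode("ascii")
```

The third code point here is 1, which maps to the letter B, matching the example in the post. The input bytes are never interpreted as characters; only the four output letters are text.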

**Chosen_1** · February 12, 2021, 09:53 AM

Thanks, JonBoy, for that awesome explanation. Let me tweak things a bit, and I shall share more details with you. I am also trying Scott's idea in parallel.

**Chosen_1** · February 12, 2021, 09:56 AM

Thanks, Scott, for coming to the rescue. It worked for me with some minor tweaks. I appreciate your help.

EBCDIC to Base64 encoding
