Using CPYTOIMPF to create a UTF-8 encoded file.

BoZMan001 replied

November 16, 2021, 09:49 AM
For anyone searching for an answer: I managed to create the UTF-8 encoded file on the IFS from a UTF-16 encoded PF.
1. Create a template UTF-8 file as described here https://www.ibm.com/support/pages/ho...pytoimpf-utf-8 .
2. Copy the UTF-8 template: CPY OBJ('utf8bom.txt') TOOBJ('utf8test2.txt') REPLACE(*YES)
3. Convert the UTF-16 PF TEST/ADDMST to the UTF-8 stream file utf8test2.txt:
CPYTOIMPF FROMFILE(TEST/ADDMST) TOSTMF('/home/test/utf8test2.txt') MBROPT(*REPLACE) FROMCCSID(1200) STMFCCSID(1208) RCDDLM(*CRLF)
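Once the stream file has been transferred to the PC, a quick check can confirm that it really is UTF-8 and whether the BOM from the template survived. The following is only a minimal Python sketch and not part of the steps above. The function name inspect_utf8 and the sample bytes are made up for illustration; on a real system you would pass in the bytes read from the downloaded file in binary mode.

```python
import codecs

def inspect_utf8(data: bytes):
    """Return (has_bom, text) for bytes that are expected to be UTF-8.

    Raises UnicodeDecodeError if the bytes are not valid UTF-8.
    """
    has_bom = data.startswith(codecs.BOM_UTF8)  # BOM bytes: EF BB BF
    body = data[len(codecs.BOM_UTF8):] if has_bom else data
    return has_bom, body.decode("utf-8")

# Illustrative stand-in for the contents of the transferred file
sample = codecs.BOM_UTF8 + "Москва\r\n".encode("utf-8")
print(inspect_utf8(sample))
```

If the decode step raises UnicodeDecodeError, the file is not valid UTF-8, which points at the conversion or the transfer rather than at the viewer on the PC.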
Scott Klement replied

September 21, 2021, 12:54 PM
Makes sense. Try the things I suggested in my earlier post.
BoZMan001 replied

September 20, 2021, 01:58 AM
Hi Scott,

Thanks for your reply.
Yes, I meant an externally defined PF with the "CCSID(1208)" at file level. So all fields are coded as CCSID 1208.
I've tried it with an externally defined PF with CCSID 1200 as well, but I get the same result with the command 'CPYTOIMPF FROMFILE(TEST/ADDMSTUT16) TOSTMF('/tmp/utf8test2.txt') STMFCCSID(1200) RCDDLM(*CRLF)'.
The Russian characters in the PF are replaced with a blocky character in the "utf8test2.txt" stream file.
Scott Klement replied

September 17, 2021, 08:45 PM
How did you create a flat file with CCSID(1208)?! I didn't think that was possible. Also, CPYTOIMPF is not designed for flat files.

Assuming you really mean a database table (i.e. externally defined PF) rather than a flat file, I would recommend coding the fields as CCSID 1200. (This is UTF-16). I've found that this works much better for database access than UTF-8, at least on IBM i.

Then, by all means, convert it from UTF-16 to UTF-8 when you use CPYTOIMPF. Since both UTF-16 and UTF-8 are fully compliant with Unicode, all character values will be preserved correctly.
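To illustrate the point about lossless conversion, here is a small Python sketch (not IBM i code) that takes Cyrillic text as UTF-16 bytes, re-encodes it as UTF-8, and checks that nothing is lost. The sample string and the helper name utf16_to_utf8 are made up for the example, and big-endian byte order is assumed for the UTF-16 data.

```python
# Sample Cyrillic text; any Unicode string would do.
original = "Привет, мир"

# Assumption: the UTF-16 data (CCSID 1200) is big-endian; CCSID 1208 is UTF-8.
utf16_bytes = original.encode("utf-16-be")

def utf16_to_utf8(data: bytes) -> bytes:
    """Decode UTF-16 big-endian bytes and re-encode them as UTF-8."""
    return data.decode("utf-16-be").encode("utf-8")

utf8_bytes = utf16_to_utf8(utf16_bytes)
round_tripped = utf8_bytes.decode("utf-8")

# The text survives the conversion unchanged; only the byte layout differs.
print(round_tripped == original)
print(len(utf16_bytes), len(utf8_bytes))
```

The byte counts differ because UTF-16 uses two bytes for every character in this sample, while UTF-8 uses one byte for the ASCII characters and two for each Cyrillic letter.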

If you really and truly mean a flat file, then I would strongly recommend that you eliminate it and write straight to a stream file. This supports UTF-8 perfectly, and does not require a separate PF -- you just write it to your "import file" directly. This saves disk space, runs faster, and has only a single point of failure, making it easier to maintain.
BoZMan001 started a topic Using CPYTOIMPF to create a UTF-8 encoded file.

September 17, 2021, 09:21 AM
Hi,

I have a PF 'TEST/ADDMST' created with CCSID 1208 to store UTF-8 data. I have populated the file with some Cyrillic characters.
We need to create a flat file from this data and send it to a third party application.
I use 'CPYTOIMPF FROMFILE(TEST/ADDMST) TOSTMF('/tmp/utf8test1.txt') STMFCCSID(1208) RCDDLM(*CRLF)' to create the IFS flat file.
I use FileZilla/FTP to transfer this file to my PC, but the file doesn't contain the correct data from the PF. Instead of the Cyrillic characters, it contains garbled data in their place.
Only the non-Cyrillic characters appear fine.
I've also tried creating the flat file with BOM Codes as described below.

https://www.ibm.com/support/pages/how-add-byte-order-mark-bom-codes-stmf-created-cpytoimpf-utf-8

What is the correct way to transfer UTF-8 data via a flat file?

Regards,
Ash
Tags: ccsid, ifs, utf-8
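As background on the symptom described in the post above (only the non-Cyrillic characters look right), the Python sketch below shows one way this pattern can arise in general. It is not a diagnosis of this particular case. ASCII characters are single bytes in UTF-8, so they survive being read through a single-byte code page, while each Cyrillic letter takes two bytes and turns into two unrelated characters. The sample text is made up, and Latin-1 is used here only as a stand-in for "some single-byte code page".

```python
text = "ADDR: Москва"            # illustrative sample: ASCII prefix plus Cyrillic
utf8_bytes = text.encode("utf-8")

# Number of UTF-8 bytes each character needs (1 for ASCII, 2 for Cyrillic)
bytes_per_char = [(ch, len(ch.encode("utf-8"))) for ch in text]

# Misreading the UTF-8 bytes as Latin-1: every byte becomes one character
misread = utf8_bytes.decode("latin-1")

print(bytes_per_char)
print(misread[:6])                # the ASCII part is still readable
print(len(text), len(misread))    # the Cyrillic part has doubled in length
```

If the garbling looks different from this pattern, for example a single replacement character per letter, the data was more likely altered during conversion than merely displayed with the wrong encoding.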
