Problem:

If you need to generate an XML document from your data, and some of that data contains "special" characters such as áéíóúñ, the data must be converted to UTF-8.

Resolution:

Generating an XML file does not make the runtime convert your ASCII data to UTF-8 automatically.

For example, suppose you have a string of data 40 characters long and your program contains a definition such as:

           04 xml-fullname pic X(40) identified by
              "FullName" count in xml-fullname-count.

Whenever you use PIC X(n) you are defining exactly 1 byte per character position, so X(40) does not reserve enough space for 40 UTF-8 characters, which are of variable length. Only characters in the 7-bit ASCII range occupy a single byte.

So, if your XML field is going to contain UTF-8 data, first of all be prepared to hold several octets per character. The original UTF-8 definition allowed up to 6 octets per character (the current standard, RFC 3629, limits this to 4), so sizing for 6 is a safe upper bound, and the example above would become:

           04 xml-fullname pic X(240) identified by
              "FullName" count in xml-fullname-count.

See KB article 19658 for more information about UTF-8.

When you move data from a file that was created using a particular ASCII locale to an XML field, no conversion takes place: the move follows the standard rules for the MOVE statement, that is, the data is copied byte by byte, as is.

So you should convert your ASCII data to UTF-8 before you store it in your XML field.

To convert a PIC X(n) item containing 1-byte ASCII data to another PIC X(n) item containing the equivalent UTF-8 data, first convert the ASCII data to Unicode using the NATIONAL-OF intrinsic function, and then convert that result to UTF-8 using the DISPLAY-OF intrinsic function with a second argument of 1208.

On a non-UTF-8 locale, both DISPLAY-OF and NATIONAL-OF can treat the "display" argument as UTF-8 if the second argument to the intrinsic function is 1208, the special numeric code page ID for UTF-8.

For example, where you have

           move nombre-nirpf-1  to xml-apellidosnombre in xml-retenido.

you should have

       move function national-of(nombre-nirpf-1) to temp-data.
       move function display-of(temp-data, 1208)
                 to xml-apellidosnombre in xml-retenido.

Finally, compile the program with the NSYMBOL(NATIONAL) directive, for example:

      $set nsymbol(national)

so that the first conversion produces UTF-16 instead of DBCS. You cannot convert directly to UTF-8; the data has to pass through the intermediate national (UTF-16) item.
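
Putting the pieces together, the following is a minimal sketch of the two-step conversion with the data items it needs. The sizes and levels of nombre-nirpf-1 and temp-data are assumptions (the article does not show their definitions), the program-id is hypothetical, and xml-apellidosnombre is declared here as a plain stand-in for the real XML field, which would carry the IDENTIFIED BY and COUNT IN clauses shown earlier.

```cobol
      $set nsymbol(national)
       identification division.
       program-id. utf8conv.
       data division.
       working-storage section.
      * Source data: 40 single-byte characters in the current
      * locale. Name taken from the article; size is assumed.
       01 nombre-nirpf-1        pic x(40).
      * Intermediate UTF-16 copy, one PIC N position per source
      * character (assumed definition).
       01 temp-data             pic n(40) usage national.
      * Stand-in for the XML field, sized at 6 octets per
      * character as recommended above.
       01 xml-apellidosnombre   pic x(240).
       procedure division.
      * nombre-nirpf-1 is assumed to have been filled already,
      * for example from a record read from a file.
      * Step 1: locale code page to UTF-16.
           move function national-of(nombre-nirpf-1)
             to temp-data
      * Step 2: UTF-16 to UTF-8 (code page ID 1208).
           move function display-of(temp-data, 1208)
             to xml-apellidosnombre
           goback.
```

Keeping the intermediate item as PIC N with the same number of positions as the source means step 1 cannot truncate, and the only field that has to grow is the final UTF-8 target.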

You may need a recent version of the product for this, as a related problem was addressed in a fixpack for both 4.0 and 5.0.

Old KB# 2165