Convert ASCII data to UTF-8 with Visual Cobol

Chris Glazier
Moderator
April 19, 2011

[Migrated content. Thread originally posted on 18 April 2011]

Hi,

I need to process UTF-8 data. According to the documentation, I have to convert the data (using its code page) to UTF-16 in a national data item, and then convert it back to UTF-8 for output, using the NATIONAL-OF and DISPLAY-OF functions respectively.

For example, if I have to convert data in Russian:

01 Russian-data pic X(10) value 'Russian-text'.
01 UnicodeString pic N(10).
01 UTF-8-String pic X(20).

Move function National-of(Russian-data, 866) to UnicodeString
Move function Display-of(UnicodeString, 01208) to UTF-8-String

...but it doesn't work. What am I doing wrong? What should I do?

Thank you
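
For readers who want to check what the two-step conversion described above should produce, here is a minimal sketch in Python rather than COBOL. Python is used only because its codecs are easy to verify; it says nothing about how Visual COBOL implements NATIONAL-OF or DISPLAY-OF. The sample text, the variable names and the UTF-16 byte order are assumptions made for illustration.

```python
# Illustrative only: a Python stand-in for the cp866 -> UTF-16 -> UTF-8 pipeline.
# The sample text is hypothetical; it does not come from the original post.
sample = "Привет мир"                      # 10 characters, like pic X(10)

cp866_bytes = sample.encode("cp866")       # single-byte source data
text = cp866_bytes.decode("cp866")         # step 1: code page 866 -> Unicode
utf16_bytes = text.encode("utf-16-le")     # 2 bytes per character (byte order assumed)
utf8_bytes = text.encode("utf-8")          # step 2: Unicode -> UTF-8 (code page 1208)

print(len(cp866_bytes), len(utf16_bytes), len(utf8_bytes))

# Cyrillic letters need 2 bytes in UTF-8, but code page 866 also contains
# box-drawing characters that need 3 bytes each.
box = bytes([0xC4]).decode("cp866")        # U+2500, a horizontal line
print(len(box.encode("utf-8")))
```

If the source data can contain such box-drawing characters, a UTF-8 target twice the size of the source field may be too small; three bytes per source character is the safer upper bound.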

Try setting the NSYMBOL"NATIONAL" compiler directive.

Dominique Sacre
Author
Rocketeer
April 20, 2011

This does not work.

Let me explain my problem with another example.

I have a program in a COBOL project that uses UTF-8 text file encoding. Using the code values from different code pages, I try to display the character 'Ф':

...
01 NATIONAL-Prova pic N(20) usage National.
01 UNICODE-Prova pic X(40).
01 UTF8-Prova pic X(40).
....
*Using the UTF-16 code number
move NX"0424" to NATIONAL-PROVA.

move function Display-of(NATIONAL-Prova,1208) to UTF8-Prova.
Display "UTF8-Prova(1208): ",UTF8-Prova.

*Using the UTF-8 code number
move X'D0A4' to UNICODE-PROVA.
move function national-of(UNICODE-Prova,1208) to NATIONAL-Prova.

move function Display-of(NATIONAL-Prova,1208) to UTF8-Prova.
Display "UTF8-Prova(1208): ",UTF8-Prova.

*Using the 866 code number
move X'94' to UNICODE-PROVA.
move function national-of(UNICODE-PROVA,866) to NATIONAL-PROVA.

move function Display-of(NATIONAL-Prova,1208) to UTF8-Prova.
Display "UTF8-Prova(1208): ",UTF8-Prova.
...

The character 'Ф' is displayed correctly only in the first and second cases; in the third case, the program displays the character '”'.

The problem is in the statement "move function national-of(UNICODE-PROVA,866) to NATIONAL-PROVA": the program does not use code page 866 but uses the ANSI code page to convert the string to national.
What should I do to make the program use code page 866 instead of the ANSI code page?

Thank you!
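
The diagnosis above can be checked outside COBOL. The following is a minimal sketch in Python, not Visual COBOL code, that decodes the single byte X'94' under two code pages. Which ANSI code page the original machine used is not stated in the thread; Windows-1252 is assumed here because it matches the '”' that was displayed.

```python
# Illustrative only: what byte 0x94 means under two single-byte code pages.
raw = bytes([0x94])

as_cp866 = raw.decode("cp866")      # DOS Cyrillic code page
as_cp1252 = raw.decode("cp1252")    # assumed "ANSI" code page (Windows-1252)

# Show each character with its UTF-8 bytes in hex.
print(as_cp866, as_cp866.encode("utf-8").hex())
print(as_cp1252, as_cp1252.encode("utf-8").hex())
```

Under code page 866 the byte is 'Ф' (U+0424, UTF-8 bytes D0 A4, the same value as the X'D0A4' literal in the second case). Under Windows-1252 it is the right double quotation mark, which is consistent with the source code page argument being ignored.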

Chris Glazier
Moderator
April 20, 2011

The documentation for the NATIONAL-OF intrinsic function states the following about using a code page as argument-2:

Must be an integer. Argument-2 identifies the source code page for the conversion.
Currently beyond the default codepage only 1208 is supported.

I believe that this is why it is failing for you.

Dominique Sacre
Author
Rocketeer
April 20, 2011

Hi,

In Visual COBOL, is there another way to convert a string encoded in code page 866 (or in another code page) to a string encoded in Unicode (UTF-16, UTF-8)?

Thank you

Chris Glazier
Moderator
April 21, 2011

You don't say whether you are creating a native project or a .NET managed code project.

The following example shows how this can be done in a .NET managed code program:


       program-id. Program1 as "testcodepage.Program1".
       data division.
       working-storage section.
       01 greektext      type Byte[] value x"AFACAE9E".
       01 wsEncoding     type System.Text.Encoding.
       01 codePageValues type Byte[].
       01 unicodeValues  string.
       01 b              type Byte.
       01 unicodeString  string.
       01 enumerator     type System.Globalization.TextElementEnumerator.
       01 s              string.
       01 i              binary-long.
       01 any-key        pic x.
       procedure division.
           invoke type System.IO.File::WriteAllBytes("greek.txt", greektext)
           
        *> Specify the code page to correctly interpret byte values
           set wsEncoding to type System.Text.Encoding::GetEncoding(737) *> (DOS) Greek code page
           set codePageValues to type System.IO.File::ReadAllBytes("greek.txt")

        *> Same content is now encoded as UTF-16
           set unicodeValues to wsencoding::GetString(codePageValues)

        *> Show that the text content is still intact in Unicode string
        *> (Add a reference to System.Windows.Forms.dll)
           invoke type System.Windows.Forms.MessageBox::Show(unicodeValues)

        *> Same content "ψυχη" is stored as UTF-8
           invoke type System.IO.File::WriteAllText("greek_unicode.txt", unicodeValues)

        *> Conversion is complete. Show the bytes to prove the conversion. 
           display "8-bit encoding byte values:"
           perform varying b thru codePageValues
              invoke type Console::Write("{0:X}-", b)
           end-perform
           display " "   
           display "Unicode values:"
        
           set unicodeString to type System.IO.File::ReadAllText("greek_unicode.txt")
           set enumerator to type System.Globalization.StringInfo::GetTextElementEnumerator(unicodeString)
           perform until exit
              if enumerator::MoveNext()
                 set s to enumerator::GetTextElement()
                 set i to type Char::ConvertToUtf32(s, 0)                
                 invoke type Console::Write("{0:X}-", i)
              else
                 exit perform
              end-if
           end-perform
                            
           display " "
           display "Press any key to exit."
           accept any-key
           goback.
           
       end program Program1.
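
To cross-check the values the program above should print, the same steps can be sketched in Python. This is not Visual COBOL code and it skips the intermediate files; it only assumes that Python's cp737 codec matches the encoding .NET returns for GetEncoding(737).

```python
# Illustrative only: the byte-level steps of the managed COBOL program.
greek_cp737 = bytes.fromhex("AFACAE9E")     # same bytes as greektext

text = greek_cp737.decode("cp737")          # code page 737 -> Unicode string
utf8_bytes = text.encode("utf-8")           # the bytes a UTF-8 file would hold

print("-".join(f"{b:X}" for b in greek_cp737))      # 8-bit byte values
print("-".join(f"{ord(ch):X}" for ch in text))      # Unicode code points
print(utf8_bytes.hex())                             # UTF-8 bytes in hex
```

The four source bytes map to the code points 3C8, 3C5, 3C7 and 3B7, and each of them takes two bytes in UTF-8.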

Dominique Sacre
Author
Rocketeer
April 21, 2011

Do you have an example for a native project or for a COBOL JVM project?
I'm working with Visual COBOL for Eclipse.

Thank you.