[Migrated content. Thread originally posted on 03 November 2011]

Hello,

I am trying to convert a Chinese string encoded in UTF-8 to Unicode.

To do that, I am working in a .NET managed code project. Here is my program:

       program-id. Program1 as "testcodepage.Program1".
       data division.
       working-storage section.
      * 01 greektext      type Byte[] value x"AFACAE9E".
       01 chinesetext    type Byte[] value x"E58F91E7A5A8".
       01 wsEncoding     type Encoding.
       01 codePageValues type Byte[].
       01 unicodeValues  string.
       01 b              type Byte.
       01 unicodeString  string.
       01 enumerator     type System.Globalization.TextElementEnumerator.
       01 s              string.
       01 i              binary-long.
       01 any-key        pic x.
       01 final-string   pic n(2).
       01 ind            pic 9.
       procedure division.
           invoke type System.IO.File::WriteAllBytes("chinese.txt", chinesetext)
           
        *> Specify the code page to correctly interpret byte values
      *     set wsEncoding to type Encoding::GetEncoding(737) *>(DOS) Greek code page
           set wsEncoding to type Encoding::UTF8
           set codePageValues to type System.IO.File::ReadAllBytes("chinese.txt")

        *> Same content is now encoded as UTF-16
           set unicodeValues to wsencoding::GetString(codePageValues)

        *> Show that the text content is still intact in Unicode string
        *> (Add a reference to System.Windows.Forms.dll)
           invoke type System.Windows.Forms.MessageBox::Show(unicodeValues)

        *> Same content is written back to a file as UTF-8
           invoke type System.IO.File::WriteAllText("chinese_unicode.txt", unicodeValues)

        *> Conversion is complete. Show the bytes to prove the conversion.
           display "8-bit encoding byte values:"
           perform varying b thru codePageValues
              invoke type Console::Write("{0:X}-", b)
           end-perform
           display " "   
           display "Unicode values:"
       
           set unicodeString to type System.IO.File::ReadAllText("chinese_unicode.txt")
           set enumerator to type System.Globalization.StringInfo::GetTextElementEnumerator(unicodeString)
           move 1 to ind
           perform until exit
              if enumerator::MoveNext()
                 set s to enumerator::GetTextElement()
                 set i to type Char::ConvertToUtf32(s, 0) 
                 set final-string(ind:1) to i *> How to perform the conversion to PIC(N)????
                 add 1 to ind
                 invoke type Console::Write("{0:X}-", i)
              else
                 exit perform
              end-if
           end-perform

        *> Show the chinese string converted
           invoke type System.Windows.Forms.MessageBox::Show(final-string)               
           display " "
           display "Press any key to exit."
           accept any-key
           goback.
           
       end program Program1.


I hope somebody can help me with this problem.

Thank you

I have found the (partial) solution:

       program-id. Program1 as "testcodepage.Program1".
       data division.
       working-storage section.
      * 01 greektext      type Byte[] value x"AFACAE9E".
       01 chinesetext    type Byte[] value x"E58F91E7A5A8".
       01 wsEncoding     type Encoding.
       01 codePageValues type Byte[].
       01 unicodeValues  string.
       01 b              type Byte.
       01 unicodeString  string.
       01 enumerator     type System.Globalization.TextElementEnumerator.
       01 s              string.
       01 i              binary-long.
       01 c              type Char.
       01 any-key        pic x.
       01 final-string   pic n(2).
       01 ind            pic 9.
       procedure division.
           invoke type System.IO.File::WriteAllBytes("chinese.txt", chinesetext)
           
        *> Specify the code page to correctly interpret byte values
      *     set wsEncoding to type Encoding::GetEncoding(737) *>(DOS) Greek code page
           set wsEncoding to type Encoding::UTF8
           set codePageValues to type System.IO.File::ReadAllBytes("chinese.txt")

        *> Same content is now encoded as UTF-16
           set unicodeValues to wsencoding::GetString(codePageValues)

        *> Show that the text content is still intact in Unicode string
        *> (Add a reference to System.Windows.Forms.dll)
           invoke type System.Windows.Forms.MessageBox::Show(unicodeValues)

        *> Same content is written back to a file as UTF-8
           invoke type System.IO.File::WriteAllText("chinese_unicode.txt", unicodeValues)

        *> Conversion is complete. Show the bytes to prove the conversion.
           display "8-bit encoding byte values:"
           perform varying b thru codePageValues
              invoke type Console::Write("{0:X}-", b)
           end-perform
           display " "   
           display "Unicode values:"
       
           set unicodeString to type System.IO.File::ReadAllText("chinese_unicode.txt")
           set enumerator to type System.Globalization.StringInfo::GetTextElementEnumerator(unicodeString)
           move 1 to ind
           perform until exit
              if enumerator::MoveNext()
                 set s to enumerator::GetTextElement()
                 set i to type Char::ConvertToUtf32(s, 0) 
                 *>set final-string(ind:1) to i as  *> How to perform the conversion to PIC(N)????
                 set c to i
                 set final-string(ind:1) to c
                 add 1 to ind
                 invoke type Console::Write("{0:X}-", i)
              else
                 exit perform
              end-if
           end-perform

        *> Show the chinese string converted
           invoke type System.Windows.Forms.MessageBox::Show(final-string)               
           display " "
           display "Press any key to exit."
           accept any-key
           goback.
           
       end program Program1.


The problem now is displaying these characters. For some reason, MessageBox and TextBox do not show them.

Any ideas?

I found a Unicode Chinese font, and it almost works. The problem is that I have to leave a space between characters. To do so, I changed these lines:
set final-string(ind:1) to c
add 1 to ind

... to the following:
set final-string(ind:1) to c
add 2 to ind

If I do not leave a space between characters, the TextBox shows only the last one...

The result is a line with too much space between characters. How would you suggest solving this problem?

Regards

I think that I just found out what your original problem was.
You are using PIC N characters but you are not setting the directive:

$set nsymbol"national"

The default is nsymbol"dbcs", which is not Unicode.

If you add this directive, the MessageBox displays the Chinese characters correctly.
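
For illustration, a minimal sketch of where the directive goes. It assumes fixed-format source (the `$` sits in column 7) and a project that references System.Windows.Forms.dll. The program name `NationalDemo` and the item names `utf8Bytes` and `decoded` are made up for this example; the byte literal and `final-string` come from the original post. It has not been checked against every compiler version, so treat it as a starting point rather than a verified fix:

```cobol
      $set nsymbol"national"
       program-id. NationalDemo as "testcodepage.NationalDemo".
       data division.
       working-storage section.
      *> UTF-8 bytes for U+53D1 U+7968 (literal from the original post)
       01 utf8Bytes     type System.Byte[] value x"E58F91E7A5A8".
       01 decoded       string.
      *> With nsymbol"national", PIC N holds UTF-16 rather than DBCS
       01 final-string  pic n(2).
       procedure division.
      *> Decode the UTF-8 bytes into a .NET string
           set decoded to
               type System.Text.Encoding::UTF8::GetString(utf8Bytes)
      *> Copy the two characters into the national item
           set final-string to decoded
           invoke type System.Windows.Forms.MessageBox::Show(
               final-string)
           goback.
       end program NationalDemo.
```

The directive could also be set in the project's compiler options instead of in the source file; either way it has to take effect before the first PIC N item is compiled.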

Phew... this will be difficult to explain!

The solution is simply to move the UTF-8 string to the TextBox. I didn't need to perform any conversion or use any directive.

It is as simple as this:
               move datos-869n(1:ind1) to final-string
               set self::txtChinese::Text to final-string

My mistake was thinking I needed to convert from UTF-8 to Unicode to display the data. (?_?)
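
A likely reason no per-character loop is needed: a .NET string is already UTF-16, so once text is in a string it can go straight to a control. The sketch below assumes the data starts out as raw UTF-8 bytes, as in the original program, and uses only the single `GetString` call that program already makes. `Utf8Demo`, `utf8Bytes`, and `decoded` are illustrative names, and a MessageBox stands in for the form's `txtChinese` TextBox so the example does not depend on a form class:

```cobol
       program-id. Utf8Demo as "testcodepage.Utf8Demo".
       data division.
       working-storage section.
      *> UTF-8 bytes for U+53D1 U+7968 (literal from the original post)
       01 utf8Bytes   type System.Byte[] value x"E58F91E7A5A8".
       01 decoded     string.
       procedure division.
      *> One decode step; the resulting string is UTF-16 internally
           set decoded to
               type System.Text.Encoding::UTF8::GetString(utf8Bytes)
      *> No TextElementEnumerator or ConvertToUtf32 loop is required.
      *> Inside a form, the equivalent would be
      *>     set self::txtChinese::Text to decoded
           invoke type System.Windows.Forms.MessageBox::Show(decoded)
           goback.
       end program Utf8Demo.
```

Whether the characters render then depends only on the control's font containing the glyphs, not on any further encoding step.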

So, my problem is completely solved.

Thanks anyway for your help.