[Migrated content. Thread originally posted on 03 November 2011]
Hello,
I am trying to convert a Chinese string encoded in UTF-8 to Unicode.
To do that, I am working in a .NET managed code project. Here is my program:
program-id. Program1 as "testcodepage.Program1".
data division.
working-storage section.
* 01 greektext type Byte[] value x"AFACAE9E".
01 chinesetext type Byte[] value x"E58F91E7A5A8".
01 wsEncoding type Encoding.
01 codePageValues type Byte[].
01 unicodeValues string.
01 b type Byte.
01 unicodeString string.
01 enumerator type System.Globalization.TextElementEnumerator.
01 s string.
01 i binary-long.
01 any-key pic x.
01 final-string pic n(2).
01 ind pic 9.
procedure division.
invoke type System.IO.File::WriteAllBytes("chinese.txt", chinesetext)
*> Specify the code page to correctly interpret byte values
* set wsEncoding to type Encoding::GetEncoding(737) *>(DOS) Greek code page
set wsEncoding to type Encoding::UTF8
set codePageValues to type System.IO.File::ReadAllBytes("chinese.txt")
*> Same content is now encoded as UTF-16
set unicodeValues to wsencoding::GetString(codePageValues)
*> Show that the text content is still intact in Unicode string
*> (Add a reference to System.Windows.Forms.dll)
invoke type System.Windows.Forms.MessageBox::Show(unicodeValues)
*> Same content is written back to disk as UTF-8 (the WriteAllText default)
invoke type System.IO.File::WriteAllText("chinese_unicode.txt", unicodeValues)
*> Conversion is complete. Show the bytes to prove the conversion.
display "8-bit encoding byte values:"
perform varying b thru codePageValues
invoke type Console::Write("{0:X}-", b)
end-perform
display " "
display "Unicode values:"
set unicodeString to type System.IO.File::ReadAllText("chinese_unicode.txt")
set enumerator to type System.Globalization.StringInfo::GetTextElementEnumerator(unicodeString)
move 1 to ind
perform until exit
if enumerator::MoveNext()
set s to enumerator::GetTextElement()
set i to type Char::ConvertToUtf32(s, 0)
set final-string(ind:1) to i *> How to perform the conversion to PIC(N)????
add 1 to ind
invoke type Console::Write("{0:X}-", i)
else
exit perform
end-if
end-perform
*> Show the chinese string converted
invoke type System.Windows.Forms.MessageBox::Show(final-string)
display " "
display "Press any key to exit."
accept any-key
goback.
end program Program1.
I hope somebody can help me with this problem.
Thank you.
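As a cross-check on what the program above is expected to produce, here is a minimal sketch in Python (not COBOL, so it says nothing about the COBOL runtime itself). It decodes the same six bytes used in chinesetext as UTF-8 and lists the resulting code points and UTF-16 bytes. The variable names are illustrative only.

```python
# Bytes copied from the chinesetext value in the program above.
utf8_bytes = bytes.fromhex("E58F91E7A5A8")

# Interpret the bytes as UTF-8 to obtain a Unicode string. This plays the
# same role as Encoding.UTF8.GetString(bytes) in .NET.
text = utf8_bytes.decode("utf-8")

# One code point per character, formatted as upper-case hex.
code_points = [f"{ord(ch):X}" for ch in text]

# The same text re-encoded as UTF-16 (little-endian, no byte order mark).
utf16_bytes = text.encode("utf-16-le")

print(text, code_points, utf16_bytes.hex().upper())
```

For these bytes the two code points are 53D1 and 7968, so the six UTF-8 bytes correspond to two characters, each fitting in one 16-bit unit.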
I have found a (partial) solution:
program-id. Program1 as "testcodepage.Program1".
data division.
working-storage section.
* 01 greektext type Byte[] value x"AFACAE9E".
01 chinesetext type Byte[] value x"E58F91E7A5A8".
01 wsEncoding type Encoding.
01 codePageValues type Byte[].
01 unicodeValues string.
01 b type Byte.
01 unicodeString string.
01 enumerator type System.Globalization.TextElementEnumerator.
01 s string.
01 i binary-long.
01 c type Char.
01 any-key pic x.
01 final-string pic n(2).
01 ind pic 9.
procedure division.
invoke type System.IO.File::WriteAllBytes("chinese.txt", chinesetext)
*> Specify the code page to correctly interpret byte values
* set wsEncoding to type Encoding::GetEncoding(737) *>(DOS) Greek code page
set wsEncoding to type Encoding::UTF8
set codePageValues to type System.IO.File::ReadAllBytes("chinese.txt")
*> Same content is now encoded as UTF-16
set unicodeValues to wsencoding::GetString(codePageValues)
*> Show that the text content is still intact in Unicode string
*> (Add a reference to System.Windows.Forms.dll)
invoke type System.Windows.Forms.MessageBox::Show(unicodeValues)
*> Same content is written back to disk as UTF-8 (the WriteAllText default)
invoke type System.IO.File::WriteAllText("chinese_unicode.txt", unicodeValues)
*> Conversion is complete. Show the bytes to prove the conversion.
display "8-bit encoding byte values:"
perform varying b thru codePageValues
invoke type Console::Write("{0:X}-", b)
end-perform
display " "
display "Unicode values:"
set unicodeString to type System.IO.File::ReadAllText("chinese_unicode.txt")
set enumerator to type System.Globalization.StringInfo::GetTextElementEnumerator(unicodeString)
move 1 to ind
perform until exit
if enumerator::MoveNext()
set s to enumerator::GetTextElement()
set i to type Char::ConvertToUtf32(s, 0)
*>set final-string(ind:1) to i as *> How to perform the conversion to PIC(N)????
set c to i
set final-string(ind:1) to c
add 1 to ind
invoke type Console::Write("{0:X}-", i)
else
exit perform
end-if
end-perform
*> Show the chinese string converted
invoke type System.Windows.Forms.MessageBox::Show(final-string)
display " "
display "Press any key to exit."
accept any-key
goback.
end program Program1.
The problem now is displaying these characters. For some reason, MessageBox and TextBox do not display them.
Any ideas?
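One caveat about assigning the UTF-32 value straight to a single Char, sketched below in Python rather than COBOL: a 16-bit unit can hold a code point only up to FFFF. Code points above that need two units (a surrogate pair). Both characters in this thread fit in one unit, so this is a limitation for other input rather than an explanation of the display problem. The function name utf16_units is hypothetical.

```python
def utf16_units(code_point: int) -> list:
    """Return the UTF-16 code units (as integers) for one Unicode code point."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {code_point:#x}")
    if code_point <= 0xFFFF:
        # Fits in a single 16-bit unit; a direct integer-to-char copy works.
        return [code_point]
    # Above FFFF: split the 20-bit offset into a high and a low surrogate.
    offset = code_point - 0x10000
    return [0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)]


bmp = utf16_units(0x53D1)             # first character from the thread
supplementary = utf16_units(0x20000)  # an example code point above FFFF
print([f"{u:X}" for u in bmp], [f"{u:X}" for u in supplementary])
```

A field sized as pic n(2) would therefore hold two characters of the first kind but only one of the second.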
I have found a Unicode Chinese font, and it almost works. The problem is that I must leave a space between characters. To do so, I changed these lines:
set final-string(ind:1) to c
add 1 to ind
... to the following:
set final-string(ind:1) to c
add 2 to ind
If I do not include spaces between characters, the text box only shows the last one...
The result is a line with too much space between characters. How would you suggest I solve this problem?
Regards
I think that I just found out what your original problem was.
You are using PIC N characters but you are not setting the directive:
$set nsymbol"national"
The default for nsymbol is $set nsymbol"dbcs", which is not Unicode.
If you add this directive, the MessageBox displays the Chinese characters correctly.
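To picture why the interpretation of the field matters, here is a rough Python sketch showing that the same bytes yield different text when read under a double-byte code page instead of UTF-16. It is an illustration only: the GBK codec is an assumed stand-in for a DBCS code page, and the sketch does not model how the COBOL runtime actually treats PIC N under either setting.

```python
# The two characters from the thread, obtained from the original UTF-8 bytes.
text = bytes.fromhex("E58F91E7A5A8").decode("utf-8")

# Their in-memory form as UTF-16 (little-endian), two bytes per character.
utf16_bytes = text.encode("utf-16-le")

# Read back as UTF-16: the original text is recovered.
as_utf16 = utf16_bytes.decode("utf-16-le")

# Read back as a double-byte code page (GBK is an assumption for this
# sketch): the bytes are grouped and mapped differently.
as_dbcs = utf16_bytes.decode("gbk", errors="replace")

print(as_utf16 == text, as_dbcs == text)
```

The point is only that the bytes themselves do not say which encoding they are in; whatever reads the field has to be told.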
uuuffff... this will be difficult to explain!
The solution is simply to move the UTF-8 string to the TextBox; I did not need to perform any conversion or use any directive.
As simple as doing this:
move datos-869n(1:ind1) to final-string
set self::txtChinese::Text to final-string
The point is that I thought I needed to convert from UTF-8 to Unicode to display the data. (?_?)
So, my problem is completely solved.
Thanks anyway for your help.