string behaviour vs pic x with multibyte languages

Question

Visual Studio 2022, Visual COBOL 11 Update 1.

When migrating old native COBOL code that contains a lot of reference modification to COBOL .NET, you sometimes have to work out something as simple as the length of a string.

In OO COBOL you had character arrays and could get either “size” or “sizeinbytes”. Many years ago, because of my very poor understanding of what a character array was, I came unstuck when doing reference modification on simplified Chinese data: “size” returns the number of simplified Chinese characters in the array, not the number of bytes, so reference modification based on it truncated the data. I should have used “sizeinbytes”.

So now that I’m moving to COBOL .NET, I’d better check how this behaves.
I have a simple C# WinForm that sets the culture:

Thread.CurrentThread.CurrentCulture = new CultureInfo("zh-CN");
Thread.CurrentThread.CurrentUICulture = new CultureInfo("zh-CN");

This is passed into a C# class and the RunUnit is started. The data is then passed from the class into two working-storage fields: one is a string, the other a pic x(200).

I can see the representation in the watch window.

But this is where I’m now confused, as the hexadecimal representations of the two fields are very different.

If I do a perform varying on the pic x(200) from the end, looking for the first non-space byte, I’m going to get a very different number from the one I get by calling GetByteCount on the string.

declare u8 as type Encoding = type Encoding::UTF8
declare iBC as binary-long = u8::GetByteCount(str-data-check)
set ws-myclassA::c_sharp_size to iBC

Can someone explain this to me in simple terms, please?

Gael Wilson · Answer

Neil,

A string in .NET is not UTF-8 (it is held as UTF-16 internally), so using your u8 encoding to get the number of bytes for the Chinese encoding is wrong. You will need to get the encoding for zh-CN and pass the string to the GetByteCount method of that encoding to get the correct value.

Gael
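
To make the difference concrete, here is a minimal C# sketch rather than a definitive answer for your setup. It assumes the pic x(200) field ends up holding the data in the simplified Chinese ANSI code page (936, GBK); check which code page your run-time actually uses. The sample text, the class name and the 200-byte buffer standing in for the pic x(200) are all made up for illustration.

```csharp
using System;
using System.Text;

class ByteCountDemo
{
    static void Main()
    {
        // On .NET Core / .NET 5+ the legacy code pages have to be registered
        // before GetEncoding(936) will work. On .NET Framework they are built
        // in and this line can normally be dropped.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // Hypothetical sample: two Chinese characters followed by "A".
        string text = "\u4E2D\u6587A";

        // Assumption: 936 is the code page behind the pic x field.
        Encoding gbk = Encoding.GetEncoding(936);

        Console.WriteLine(text.Length);                          // 3 (UTF-16 code units)
        Console.WriteLine(Encoding.Unicode.GetByteCount(text));  // 6 (UTF-16, 2 bytes each)
        Console.WriteLine(Encoding.UTF8.GetByteCount(text));     // 7 (3 + 3 + 1)
        Console.WriteLine(gbk.GetByteCount(text));               // 5 (2 + 2 + 1)

        // Stand-in for a pic x(200): 200 bytes, space filled, data at the front.
        byte[] field = new byte[200];
        for (int i = 0; i < field.Length; i++)
        {
            field[i] = (byte)' ';
        }
        byte[] encoded = gbk.GetBytes(text);
        Array.Copy(encoded, field, encoded.Length);

        // Equivalent of the perform varying from the end looking for the
        // first non-space byte.
        int used = field.Length;
        while (used > 0 && field[used - 1] == (byte)' ')
        {
            used--;
        }
        Console.WriteLine(used);                                 // 5
    }
}
```

Under that assumption, the backwards scan over the field agrees with GetByteCount for code page 936 (5 bytes) and not with the UTF-8 count (7 bytes), which would explain the mismatch you are seeing. If you would rather not hard-code 936, CultureInfo.GetCultureInfo("zh-CN").TextInfo.ANSICodePage should give you the code page number for the culture.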