Question

string behaviour vs pic x with multibyte languages

  • December 5, 2025

Neil Hayes

Visual Studio 2022, Visual COBOL 11 Update 1.

When migrating old native COBOL code that relies heavily on reference modification to COBOL .NET, you sometimes have to work out something as simple as the length of a string.

In OO COBOL you had character arrays, and you could get “size” or “sizeinbytes”. Many years ago, because of my very poor understanding of what a character array was, I came unstuck when doing reference modification on Simplified Chinese data: “size” returns the number of Simplified Chinese characters in the array, so reference modification truncated the data because I had not used “sizeinbytes”.

So now that I am moving to COBOL .NET, I’d better check how this behaves.
I have a simple C# WinForms application:

Thread.CurrentThread.CurrentCulture = new CultureInfo("zh-CN");
Thread.CurrentThread.CurrentUICulture = new CultureInfo("zh-CN");
 

 
This is passed into a C# class and the RunUnit is started. The data is then passed from the class into two working-storage fields: one is a string, the other a pic x(200).
 


I can see the representation in the watch window.
 


But this is where I’m now confused as the hexadecimal representation is very different.
 

 

If I do a perform varying on the pic x(200), working back from the end to find the first non-space character, I’m going to get a very different number from the one GetByteCount returns for the string.

           declare u8 as type Encoding = type Encoding::UTF8
           declare iBC as binary-long = u8::GetByteCount(str-data-check)
           set ws-myclassA::c_sharp_size to iBC


Can someone explain this to me in simple terms, please?

 

 

2 replies

Gael Wilson
  • Rocketeer
  • December 5, 2025

Neil, 

A string in .NET is not UTF-8 (it is held as UTF-16 internally), so using your u8 encoding to get the number of bytes for the Chinese encoding is wrong. You will need to get the encoding for zh-CN and pass the string to that encoding's GetByteCount method to get the correct value.

Gael


Gael Wilson
  • Rocketeer
  • December 5, 2025

Sorry, I jumped in a bit too quickly. What you have done is get the number of bytes for a UTF-8 encoding, but as far as I am aware the move from the string to the pic x(200) won’t be UTF-8; I think it is done using the current encoding. If it’s UTF-8 encoding you want, there are directives, i.e. SOURCE-ENCODING and RUNTIME-ENCODING, and intrinsic functions for UTF-8.

If it’s actually the Chinese encoding you’re after, then my previous comment about that encoding and its GetByteCount still applies.