Question

Left shift of fields in file when the previous field contains a UTF-8 character

  • February 3, 2026
  • 11 replies

M Carmen De Paz

Hi, we work with Micro Focus Visual COBOL 8.0 for Eclipse on Linux/UNIX.

We have a program with a file that we define as follows:

       file-control.
           select prevcob assign to wk-prevcob
               organization is line sequential.
       data division.
       file section.
       fd prevcob.
       01 reg-prevcob                pic x(185).

and a working-storage record that defines the file layout as follows:

       01 wk-reg-prevcob-pru.
          10 prevcob-nombre-pru      pic x(40)     value spaces.
          10 prevcob-impdivisa-pru   pic 9(16)v99  value 0.
          10 resto                   pic x(127).

We declare a cursor that fetches data from an Informix database with a UTF-8 locale and moves it into the working-storage fields (the host variables are defined as pic x(300) and pic s9(16)v9(2)):

move clinombr                    to prevcob-nombre-pru

move srvimpdivi                  to prevcob-impdivisa-pru

We write the file as follows:

write reg-prevcob from wk-reg-prevcob-pru

The problem is that when prevcob-nombre-pru contains a UTF-8 character, prevcob-impdivisa-pru appears shifted one position to the left and the record layout is broken. Here is an image of the file opened in the vi editor:

The file is in a directory on the UNIX server.

 

Is it possible to keep the fields in the file aligned?

 

 

 

11 replies

Chris Glazier

Line Sequential files can contain only printable ASCII characters unless you use the configuration option INSERTNULL. It may also be a problem with how vi interprets the character. Can you please attach a small file that demonstrates the problem so that I may review it?


M Carmen De Paz
  • Author
  • February 4, 2026

Hi Chris, It's great to talk to you again. I hope you can help me.

I think the problem is that bytes are being written, not characters. For example, the UTF-8 encoding of the letter Ñ is X"C391" (2 bytes), so a 40-byte field that contains one Ñ holds only 39 printable characters.

I've run an `od` (octal dump) on the file, and there are indeed 40 bytes in the field containing the Ñ.

Is there any way to keep 40 printable characters in the field even when it contains UTF-8 characters?

I've attached the file. We generate it without an extension, but I had to add one because otherwise the forum wouldn't let me attach it.
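The byte/character mismatch described above can be checked directly. The following is a minimal sketch (the program name countutf8 and the sample value "MUÑOZ" are hypothetical) that counts bytes and UTF-8 characters in a 40-byte PIC X field. It relies on the fact that every UTF-8 continuation byte lies in the range X"80" to X"BF", so every other byte starts a new character, and it assumes the native ASCII collating sequence for the byte comparisons.

```cobol
       identification division.
       program-id. countutf8.
      *> Hypothetical sketch: bytes vs. UTF-8 characters in a field.
      *> UTF-8 continuation bytes are always X"80"-X"BF", so each
      *> byte outside that range starts a new character.
       data division.
       working-storage section.
       01 ws-field       pic x(40).
       01 ws-idx         pic 9(4) comp-5.
       01 ws-bytes       pic 9(4).
       01 ws-chars       pic 9(4).
       procedure division.
      *> "MUÑOZ" in UTF-8 (Ñ = X"C391"), padded with spaces to 40 bytes
           move x"4D55C3914F5A" to ws-field
           move function length(ws-field) to ws-bytes
           move zero to ws-chars
           perform varying ws-idx from 1 by 1
                   until ws-idx > ws-bytes
               if ws-field(ws-idx:1) < x"80"
                  or ws-field(ws-idx:1) > x"BF"
                   add 1 to ws-chars
               end-if
           end-perform
           display "bytes: " ws-bytes
           display "chars: " ws-chars
           goback.
```

With one Ñ in the field the program displays "bytes: 0040" and "chars: 0039": the record layout is measured in bytes, so every 2-byte character pushes the following field one display column to the left.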


Chris Glazier

The UTF-8 representation of the character is always going to be 2 bytes in the file, because that is how UTF-8 encodes it.

If you want it stored as one byte, you have to convert it to ASCII/ANSI before writing it.

 

No guarantees, but here is an example that works for your case. It converts a string of UTF-8 characters, including yours, to ANSI and then writes the file.

       identification division.
       program-id. Program1.

       environment division.
       configuration section.
       input-output section.
       file-control.
           select test-file assign to "testfile.dat"
               organization is line sequential
               file status is file-status.
       data division.
       file section.
       fd test-file.
       01 test-rec                 pic x(15).
       working-storage section.
       01 file-status              pic x(2).
       01 my-char                  pic x(15) value spaces.
      *> "ÑÑÑÑ12" in UTF-8: each Ñ is the 2-byte sequence C3 91
       01 my-utf                   pic x(15) value x"C391C391C391C3913132".
       01 out-length               pic x(4) comp-x value 15.
       01 reserved                 pic x(4) comp-x value 0.
       01 status-code              pic x(4) comp-5.
       procedure division.
      *> convert the UTF-8 bytes in my-utf into my-char
           call "CBL_STRING_CONVERT" using by reference my-utf
                                           by value 15
                                           by value 0
                                           by reference my-char
                                           by reference out-length
                                           by value 3
                                           by value 0
                                           by reference reserved
                                     returning status-code
           open output test-file
           move my-char to test-rec
           write test-rec
           display file-status
           close test-file
           open input test-file
           read test-file
           close test-file
           goback.

       end program Program1.

 


M Carmen De Paz
  • Author
  • February 4, 2026
Thank you so much for your help Chris, but unfortunately it doesn't work for what I need.

I wrote this program:

       identification division.
       program-id. anook7.
       author. cdepaz.
       date-written. 04.02.2026.
      ******************************************************************
      * test program: file with UTF-8 characters
      ******************************************************************
      ******************************************************************
       environment division.
      ******************************************************************
      ******************************************************************
       configuration section.
      ******************************************************************
       special-names.
          decimal-point is comma.

       input-output section.
      ******************************************************************
       file-control.
       select test-file assign to "testfile.dat"
                       organization is line sequential
                       file status is file-status.
      ******************************************************************
       data division.
       file section.
       fd test-file.
       01 test-rec.
          05 test-rec-char pic x(15).
          05 test-rec-num  pic 9(05).

      ******************************************************************
       working-storage section.
      ******************************************************************
       01 sw-fin                       pic 9 value zero.
          88 si-fin                        value 1.
          88 no-fin                        value zero.

       01 file-status   pic x(2).
       01 my-char       pic x(15)  value spaces.
       01 my-utf        pic x(15)  value x"C391C391C391C3913132".
       01 out-length    pic x(4) comp-x value 15.
       01 reserved      pic x(4) comp-x value 0.
       01 status-code   pic x(4) comp-5.

      *-----------------------------------------------------------------
      ******************************************************************
       linkage section.
      *-----------------------------------------------------------------
       procedure division.
      ******************************************************************
      *-----------------------------------------------------------------
           call "CBL_STRING_CONVERT" using by reference my-utf
                                by value 15
                                by value 0
                                by reference my-char
                                by reference out-length
                                by value 3
                                by value 0
                                by reference reserved
                                returning status-code.
           open output test-file
           move 'ABCDEFG' to test-rec-char
           move 0       to test-rec-num
           write test-rec

           move my-char to test-rec-char
           move 0       to test-rec-num
           write test-rec
           close test-file

           open input test-file
           read test-file at end set si-fin to true
           end-read
           perform until si-fin
              display test-rec
              read test-file at end set si-fin to true
              end-read
           end-perform
           close test-file
           .
       fin.
           goback.
 

and the result is:

There is a left shift of one position for each UTF-8 character.

M Carmen De Paz
  • Author
  • February 4, 2026

The problem is that on our system the environment variable is LANG=es_ES.UTF-8, and output encoding 3 means "ASCII/MBCS characters (current locale)". The result depends on the locale, and since the locale is UTF-8, the function does nothing.


Chris Glazier

Is the problem with storing the accented UTF-8 characters in the file or in how they are displayed on a terminal? Is it ok to store the 2 bytes in the file as long as it displays correctly or do you want the extended ASCII equivalent stored in the file as one character?

I have found that if there is some kind of field delimiter (a comma or a tab) between the columns, I can get it to display correctly using the column command:

column -t -s $',' testfile.dat

This is when I define the file as:
       fd test-file.
       01 test-rec.
          05 test-rec-char  pic x(15).
          05 delim          pic x.        *> insert a comma
          05 test-rec-num   pic 9(5).
 

If I knew more about what you plan to do with these characters, I might be able to help more.

Thanks


M Carmen De Paz
  • Author
  • February 5, 2026

The problem is in the file, and I would like to store the extended ASCII equivalent in the file.

We need to generate a file with a fixed data structure to pass to another software provider. The data is expected in specific positions, which is why it's so important that there are no offsets. What I don't know is the encoding the third party uses, although I imagine it's not UTF-8; it's possibly ISO-8859-1.
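If the receiver does expect ISO-8859-1, the conversion can also be sketched in plain COBOL for text whose characters all fall in U+0000 to U+00FF (which covers Spanish letters such as Ñ, á, é). This is a hypothetical sketch, not a Micro Focus API: the program name utf8tolatin1 and the sample value are made up, it assumes valid UTF-8 input and the native ASCII collating sequence (so FUNCTION ORD returns byte value + 1 and FUNCTION CHAR is its inverse), and it writes "?" for any character outside Latin-1. Two-byte sequences starting with X"C2" map to their second byte; those starting with X"C3" map to the second byte plus X"40", so X"C391" becomes X"D1".

```cobol
       identification division.
       program-id. utf8tolatin1.
      *> Hypothetical sketch: UTF-8 -> ISO-8859-1 for U+0000..U+00FF.
      *> Assumes valid UTF-8; other characters become "?".
       data division.
       working-storage section.
       01 ws-utf8        pic x(80).
       01 ws-latin1      pic x(40) value spaces.
       01 ws-in          pic 9(4) comp-5.
       01 ws-in-len      pic 9(4) comp-5.
       01 ws-out         pic 9(4) comp-5.
       01 ws-code        pic 9(4) comp-5.
       01 ws-byte        pic x.
       01 ws-next        pic x.
       procedure division.
      *> "MUÑOZ" in UTF-8 (Ñ = X"C391"), space padded to 80 bytes
           move x"4D55C3914F5A" to ws-utf8
           move function length(ws-utf8) to ws-in-len
           move 1 to ws-in
           move 0 to ws-out
      *> stop when the input is used up or 40 output bytes are filled
           perform until ws-in > ws-in-len or ws-out >= 40
               move ws-utf8(ws-in:1) to ws-byte
               add 1 to ws-out
               evaluate true
                   when ws-byte < x"80"
      *>               ASCII byte: copy unchanged
                       move ws-byte to ws-latin1(ws-out:1)
                       add 1 to ws-in
                   when (ws-byte = x"C2" or ws-byte = x"C3")
                        and ws-in < ws-in-len
                       move ws-utf8(ws-in + 1:1) to ws-next
                       if ws-byte = x"C2"
      *>                   C2 xx -> xx (U+0080..U+00BF)
                           move ws-next to ws-latin1(ws-out:1)
                       else
      *>                   C3 xx -> xx + X"40" (U+00C0..U+00FF)
                           compute ws-code = function ord(ws-next) + 64
                           move function char(ws-code)
                             to ws-latin1(ws-out:1)
                       end-if
                       add 2 to ws-in
                   when other
      *>               not representable in Latin-1: emit "?" and
      *>               skip the lead byte plus its continuation bytes
                       move "?" to ws-latin1(ws-out:1)
                       add 1 to ws-in
                       perform until ws-in > ws-in-len
                           if ws-utf8(ws-in:1) < x"80"
                              or ws-utf8(ws-in:1) > x"BF"
                               exit perform
                           end-if
                           add 1 to ws-in
                       end-perform
               end-evaluate
           end-perform
      *> ws-latin1 is now 40 bytes with one byte per character
           if ws-latin1(1:5) = x"4D55D14F5A"
               display "converted OK"
           else
               display "unexpected result"
           end-if
           goback.
```

In the real program the 40-byte result would be moved to prevcob-nombre-pru, so each character occupies exactly one byte and prevcob-impdivisa-pru stays at its fixed position. A library converter such as iconv covers more code pages and reports errors, which this hand-written mapping does not.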

 

I'm now trying to call the C function iconv, similar to how we sometimes use the iconv command in the operating system for conversions, but I'm getting some runtime errors.


M Carmen De Paz
  • Author
  • February 5, 2026

I've been studying the cobutf8 utility I found in the Micro Focus documentation, which is based precisely on iconv. Do you think it could be useful to me? How can I use it?

 


Chris Glazier

I don’t think the cobutf8 utility will help but I am not an expert in it.

 

I was able to get the “iconv” function call API to work with COBOL though.

Here is the new example:

 

      $set sourceformat"variable"
       identification division.
       program-id. Program1.

       environment division.
       configuration section.
       input-output section.
       file-control.
           select test-file assign to "testfile.dat"
               organization is line sequential
               file status is file-status.
       data division.
       file section.
       fd test-file.
       01 test-rec.
          05 test-rec-char         pic x(15).
          05 test-rec-num          pic 9(5).
       working-storage section.
       01 file-status              pic x(2).
       01 my-char                  pic x(20) value spaces.
      *> "ÑÑÑÑ12" in UTF-8, padded with spaces to 15 bytes
       01 my-utf                   pic x(15)
                                   value x"C391C391C391C39131322020202020".
      *> size_t byte counters used by iconv (8 bytes on a 64-bit build)
       01 in-len                   pic x(8) comp-5 value 15.
       01 out-len                  pic x(8) comp-5 value 20.
       01 capacity                 pic x(8) comp-5 value zeroes.
      *> signed, so that the (size_t)-1 error return compares equal to -1
       01 status-code              pic s9(18) comp-5.
       01 cdesc                    pointer.
      *> same 8 bytes as a signed number (assumes a 64-bit build):
      *> iconv_open returns (iconv_t)-1, not NULL, on failure
       01 cdesc-value redefines cdesc pic s9(18) comp-5.
       01 to-charset               pic x(13) value z"WINDOWS-1252".
       01 from-charset             pic x(6)  value z"UTF-8".
       01 utf-point                pointer.
       01 char-point               pointer.
       procedure division.
      *> iconv advances these pointers and decrements the byte counters
           set utf-point  to address of my-utf
           set char-point to address of my-char
           move length of my-utf  to in-len
           move length of my-char to out-len capacity

           call "iconv_open" using to-charset, from-charset
                             returning cdesc
           if cdesc-value = -1
               display "error on iconv_open"
               stop run
           end-if

           call "iconv" using by value cdesc
                              by reference utf-point
                              by reference in-len
                              by reference char-point
                              by reference out-len
                        returning status-code
           if status-code = -1
               display "error on iconv"
               stop run
           end-if

           call "iconv_close" using by value cdesc

           open output test-file
           move "ABCDEFG" to test-rec-char
           move zeroes to test-rec-num
           write test-rec
           display file-status
           move spaces to test-rec-char
      *> capacity - out-len = number of converted bytes in my-char
           move my-char(1:capacity - out-len) to test-rec-char
           write test-rec
           close test-file

           open input test-file
           perform until exit
               read test-file
                   at end
                       exit perform
                   not at end
                       display "read ok"
               end-read
           end-perform

           close test-file

           goback.

 


M Carmen De Paz
  • Author
  • February 9, 2026

Thank you so much for your reply. I've implemented it and it works perfectly, but I'd like to delve deeper into the cobutf8 utility because I think it suits our needs. Is there anyone who knows it and could help me?

Or would it be better to start a new discussion?

Chris Glazier

I am happy that the iconv program worked for you!

Please start a new discussion, so that it doesn’t get lost in this one, and specify exactly what problem you are trying to solve using the cobutf8 utility.

Thanks