Dealing with UTF-16 LE databases.

Chris Glazier
Moderator
January 9, 2013

Hi all,

I'm using Net Express 5.1 on Windows Server 2008.

My job is to convert databases to get them ready for production. What I do is, when I have Gendercode, Firstname and Lastname in separate fields, I convert them into full names and salutations. For example: "M;Brian;Jones" becomes "Mr. Brian Jones" and "Dear mr Jones,".

It all goes well as long as the databases are 8 bit ANSI.

The problem I stumbled upon is this: a customer delivers Eastern European names, including their high characters, in an Excel database. As soon as I export the database to CSV-ANSI those characters are lost, because they need 16 bits instead of the standard 8 bits. I managed to export them as UTF-16 LE. I already understand that it is LE (Little Endian), because the BOM (Byte Order Mark) says x"FF FE".

I've read that Net Express can handle 16-bit LE databases quite well. The problem is that I'm a total newbie at this. So my question is: how differently do I have to set up the CBL program? Where do I start, and is there an example CBL I can take a peek at?

Kind Regards,

ZelosBV

Net Express uses PIC N fields of USAGE NATIONAL to provide support for UTF-16 characters.

If you set the directive NSYMBOL"NATIONAL" then you can use the PIC N without the USAGE NATIONAL clause.

In the Net Express help documentation look under Programming-->Internationalization Support-->Unicode Support.

There are also intrinsic functions for converting between ANSI and Unicode character strings.

In the Net Express help documentation look under Reference-->COBOL Language-->Program Definition-->Procedure Division-->Intrinsic Functions--> DISPLAY-OF and NATIONAL-OF functions.

Thanks.

Dominique Sacre
Author
Rocketeer
January 9, 2013

Hi Chris,

Thank you for the reply. I presume I have to put '$set NSYMBOL "NATIONAL" ' above the identification division? And NATIONAL will support the UTF-16 LE database?

About the intrinsic Functions: I came as far as Reference-->COBOL Language-->Program Definition-->Procedure Division-->Intrinsic Functions. Nothing about the DISPLAY-OF or NATIONAL-OF functions.

Thanks.

Chris Glazier
Moderator
January 9, 2013

You can put the NSYMBOL directive in a $set statement above the id division, in the project properties, or in a directives file; it doesn't really matter which.

By default the UNICODE directive is set to UNICODE"NATIVE", which means that bytes are stored in the native order of the OS under which you are running. For Windows that is little-endian.

I am not sure what you mean by UTF-16 LE database though.

There is nothing in PIC N that is tied to database functionality, other than that you can use PIC N fields to hold UTF-16 strings and these can be used with a database.

For the Intrinsic functions docs I missed the last segment which is Definition of Functions.

Under this you will find a list that includes DISPLAY-OF and the NATIONAL-OF functions.

Thanks.

Dominique Sacre
Author
Rocketeer
January 9, 2013

LE means that a character, a space for example, is stored as x"20 00", which is NATIVE if I understand it correctly. So NATIONAL would not be the right setting.

Thank you

Chris Glazier
Moderator
January 9, 2013

Correct, so under Windows you do not need to set the UNICODE directive, because NATIVE is the default.

Dominique Sacre
Author
Rocketeer
January 9, 2013

Thank you for the reply, Chris. I'll try it and get back to you.

Dominique Sacre
Author
Rocketeer
January 9, 2013

Hi Chris,

I guess there isn't a substitute for UNSTRING and MOVE?

Chris Glazier
Moderator
January 9, 2013

Both MOVE and UNSTRING will work with UTF-16 data, but the sending and receiving data types must match.

Example:

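      $set nsymbol"national"
       id division.
       program-id. testutf.
       data division.
       working-storage section.
      * NSYMBOL"NATIONAL" makes PIC N mean USAGE NATIONAL (UTF-16).
       01 utf-field  pic N(10) value N"TEST FIELD".
       01 utf-field2 pic N(10) value spaces.
       01 picx-field pic x(10) value spaces.
       procedure division.
      * National to national: sending and receiving types match.
           move utf-field to utf-field2
      * The delimiter must be a national literal as well.
           unstring utf-field delimited by N" " into utf-field2
      * Convert UTF-16 to ANSI for display, then back again.
           move function display-of(utf-field) to picx-field
           display picx-field
           move function national-of(picx-field) to utf-field
           display utf-field
           stop run.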
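
Applied to the salutation task from the original question, the same rule (national sending fields, national receiving fields, national delimiters) might look like the sketch below. It is only an illustration under assumptions: the program name and all data names are hypothetical, the field sizes are arbitrary, and the input record is hard-coded rather than read from a file.

```cobol
      $set nsymbol"national"
       identification division.
       program-id. salut.
       data division.
       working-storage section.
      * Hypothetical input record; a real one would come from a file.
       01 in-rec     pic n(40) value n"M;Brian;Jones".
       01 gender     pic n(1)  value spaces.
       01 first-name pic n(20) value spaces.
       01 last-name  pic n(20) value spaces.
       01 full-name  pic n(50) value spaces.
       procedure division.
      * Split on a national semicolon into national fields.
           unstring in-rec delimited by n";"
              into gender first-name last-name
           end-unstring
      * Build the full name; every operand is national.
           if gender = n"M"
              string n"Mr. "    delimited by size
                     first-name delimited by n" "
                     n" "       delimited by size
                     last-name  delimited by n" "
                 into full-name
              end-string
           end-if
      * Convert to the ANSI code page only for display.
           display function display-of(full-name)
           stop run.
```

Delimiting the name fields by a national space trims the trailing padding, but it would also cut off names that contain a space, so real data may need a different way to find the end of each field.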

Dominique Sacre
Author
Rocketeer
January 10, 2013

I see you've set NSYMBOL to NATIONAL. Is there a reason for that? I tried this with variable NATIVE data and it doesn't work.

Dominique Sacre
Author
Rocketeer
January 10, 2013

What I mean is that I HAVE to work with NATIVE UTF-16. So correct me if I'm wrong, but I think the function "NATIONAL-OF" won't work with NATIVE fields.

Chris Glazier
Moderator
January 10, 2013

I think that you may be confused between two different directives that I have introduced here.

NSYMBOL"NATIONAL" tells the compiler to treat PIC N fields as though USAGE NATIONAL has been specified. USAGE NATIONAL means Unicode. If you do not set NSYMBOL"NATIONAL" then you would have to explicitly use the USAGE NATIONAL clause on your PIC N data items.

The default for the NSYMBOL directive is DBCS which tells the compiler to treat PIC N as though USAGE DISPLAY-1 has been specified. USAGE DISPLAY-1 refers to non-Unicode double byte character set (Kanji).

The other directive is the UNICODE directive.

UNICODE"NATIVE" tells the compiler that the Unicode byte ordering will be in the native byte order used on the computer. On Intel based Windows this is little-endian. This is the default setting so you do not have to change it.

The alternative to UNICODE"NATIVE" is UNICODE"PORTABLE" which means that the byte ordering will always be in big-endian order regardless of the ordering actually used on the computer.

See the Net Express Help for NSYMBOL and UNICODE directives.
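
One way to see the effect of the UNICODE directive is to look at the bytes of a single national space. The following is a minimal sketch, assuming the directive spellings given above; the program name and data names are hypothetical. With UNICODE"NATIVE" on Intel-based Windows the first branch would be expected, and with UNICODE"PORTABLE" the second.

```cobol
      $set nsymbol"national" unicode"native"
       identification division.
       program-id. showorder.
       data division.
       working-storage section.
      * One national (UTF-16) space and a byte-level view of it.
       01 n-space   pic n value n" ".
       01 n-space-x redefines n-space pic x(2).
       procedure division.
           evaluate n-space-x
              when x"2000"
                 display "little-endian: space stored as 20 00"
              when x"0020"
                 display "big-endian: space stored as 00 20"
              when other
                 display "unexpected encoding"
           end-evaluate
           stop run.
```

The point is that NSYMBOL decides what PIC N means, while UNICODE decides only the byte order of those items, so the two settings do not conflict.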

Dominique Sacre
Author
Rocketeer
January 10, 2013

Hi Chris,

Sorry if I'm not clear on this. Like I said, I'm a newbie at this. Is it OK if I send you a zip file with data to explain what I really want?

Thanks

Chris Glazier
Moderator
January 10, 2013

Please send file to chris.glazier@microfocus.com and I will take a look at it.

Dominique Sacre
Author
Rocketeer
January 10, 2013

OK. Thank you.

Chris Glazier
Moderator
January 11, 2013

What ZelosBV is trying to do is read and process a text file stored in UTF-16 LE format.

The fields in each record are delimited by a UTF-16 tab character, and the records are delimited by UTF-16 CRLF characters.

Net Express does not support this type of file directly, so I wrote a sample program that reads it using the CBL_ byte stream file routines. It parses the individual records from the file and then parses each record into its tab-delimited fields, placing each field in a PIC N item for processing as UTF-16 data.

I have attached the example project and made the source code available here:

*------------------------------------------------------------------------------------------*
*                      CONVERTUTF16
*
* This example demonstrates how to read in a UTF-16 based text
* file which has records whose fields are delimited by tabs.
* There is no filetype in Net Express that is equivalent to a
* Unicode file so we must read the file using the CBL_ byte
* stream library routines.
*
* In this demo the entire file is read into a buffer and then
* the individual records are parsed, and for each record the
* individual fields are parsed and processed.
*
* Although the file is in UTF-16 the parsing is all done using
* standard PIC X data fields instead of PIC N for ease of use.
*-------------------------------------------------------------------------------------------*
id division.
program-id. convertutf16.
data division.
working-storage section.
01 filename     pic x(256)     value "UTF16-LE_TAB-delimited.txt".
01 access-mode pic x    comp-x value 1.
01 deny-mode    pic x    comp-x value 0.
01 device       pic x    comp-x value 0.
01 file-handle pic x(4) comp-5 value 0.
01 file-offset pic x(8) comp-x value 0.
01 byte-count   pic x(4) comp-x value 0.
01 flags        pic x    comp-x value 0.
01 buffer       pic X(100000)   value spaces.
01 rec-size     pic 9(5)        value zeroes.
01 start-pos    pic 9(5)        value 0.
01 end-pos      pic 9(5)        value 0.
01 rec-num      pic 9(5)        value 0.
01 current-record pic X(1000)   value spaces.
01 current-field pic x(100)     value spaces.
01 current-field-n pic n(100)   value spaces.
01 field-pointer pic 9(5)       value zeroes.
01 field-size    pic 9(5)       value zeroes.
01 crlf          pic x(4)       value X"0D000A00".
01 tab-char     pic x(2)        value X"0900".
01 last-record-flag pic x       value "N".
    88 last-record               value "Y"
           when set to false               "N".
procedure division.

     call "CBL_OPEN_FILE"
        using filename
              access-mode
              deny-mode
              device
              file-handle
     end-call
     if return-code not = 0
        display "error on open = " return-code
        stop run
     end-if
*> The following call with flags set to 128 will get the filesize
*> into the file-offset field. It is then used in the next read
*> to read the entire file at once. You must ensure that the
*> buffer size is large enough to hold your largest file.
     move 128 to flags
*> get filesize
     call "CBL_READ_FILE"
        using file-handle
              file-offset
              byte-count
              flags
              buffer
     end-call

     if return-code not = 0
        display "error on read rec length = " return-code
        stop run
     end-if

     move file-offset to byte-count
     move 0 to file-offset flags

     call "CBL_READ_FILE"
        using file-handle
              file-offset
              byte-count
              flags
              buffer
     end-call

     if return-code not = 0
        display "error on read = " return-code
        stop run
     end-if

*> Contents of file is now in buffer
*> We want to skip the first two bytes which are the UTF markers

     move 3 to start-pos

*> We will parse the buffer using unstring delimited by the CRLF
*> delimiters so that we process one record at a time. Since this
*> is UTF-16 the actual code is 0D000A00.
*> The size of the record is held in rec-size.

     perform until exit
        move spaces to current-record
        unstring buffer
           delimited by crlf
           into current-record
              count in rec-size
              with pointer start-pos
        end-unstring
*> For ease of parsing we add a tab character to the end of the
*> record so all fields are delimited
        move tab-char to current-record(rec-size + 1:2)
        add 2 to rec-size
*> This procedure will actually parse the individual fields
*> delimited by tab character and process them.
        perform 100-process-record
        if start-pos >= byte-count
           exit perform
        end-if
     end-perform
     display "recs processed = " rec-num
     stop run.

100-process-record.

    add 1 to rec-num.
    move 1 to field-pointer
    perform until exit
       move 0 to field-size
       unstring current-record
          delimited by tab-char
             into current-field
                count in field-size
             with pointer field-pointer
       end-unstring
*> space fill with UTF spaces
       move all X"2000" to current-field(field-size + 1:)
       move current-field to current-field-n
*> Now field is in actual PIC N data type and you can process it
*> however you wish
       display current-field-n(1:field-size / 2)
       if field-pointer >= rec-size
          exit perform
       end-if
    end-perform.

1 Attachments

123429_convertutf16.zip

Dominique Sacre
Author
Rocketeer
January 30, 2013

Thank you Chris! It helped me a lot.

It took me some time to understand it, but in the end I managed to expand the application into a program that can handle bigger Unicode databases and write the data into a new database.

Jan