Special Characters in text (non-unicode)

July 23, 2014

Anonymous

Using Extend 9.1.1 (or 9.2) on Windows. We want to import a text document which contains non-English characters such as ç, é, ã.

We can type those characters in Extend, and they display fine, but when we import from a file (created with Excel or even with a text editor), the characters end up being corrupted.

I know Extend doesn't support Unicode, and I'm sure the corruption has something to do with that - does anyone know of a way to import such a text file into Extend and have the characters retained?

Thanks

Tony

Stephen Hjerpe
Participating Frequently
July 23, 2014

It would be good to know the underlying data type these characters are stored in (in Excel) and what COBOL data type you are using. Have you tried exporting them to XML and then reading the XML? When you say import, are you using string or reference modification?

Anonymous
July 23, 2014

Everything we're working with is string data. Forgetting about Excel for a minute, I'm confused even by what Windows does with these special characters just using Notepad. For example, the ç character is obtained with the key sequence ALT 135, which works in both Extend and Notepad. If I take a text string entered in Extend, export it to a text file, and check it with a hex editor, the character is stored as hex 87 (decimal 135). In Notepad, however, that character displays differently. If I use the key sequence ALT 135 in Notepad, I get the ç character, but when I save the document and check it with a hex editor, the character is actually saved as hex E7 (decimal 231). There is clearly a conversion taking place somewhere (seemingly at the Windows level).
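
For anyone who wants to reproduce the two byte values outside Extend, here is a minimal Python sketch. It assumes the ALT-code side corresponds to an OEM code page (code page 850 is used here as a guess; the thread does not name it) and that Notepad saves in Windows-1252. Neither assumption comes from the Extend documentation.

```python
# Assumption: "cp850" stands in for the OEM code page behind ALT 135;
# "cp1252" is the Windows-1252 (ANSI) code page.
ch = "ç"

oem_bytes = ch.encode("cp850")    # byte value on the OEM side
ansi_bytes = ch.encode("cp1252")  # byte value in a Windows-1252 file

print("OEM  :", oem_bytes.hex())
print("ANSI :", ansi_bytes.hex())

# The same byte read under the other code page is a different character,
# which would explain why a file looks wrong in the other program.
misread = b"\x87".decode("cp1252")
print("hex 87 read as Windows-1252:", misread)
```

Under these assumptions the two encodings of ç are hex 87 and hex E7, matching what the hex editor shows.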

Dominique Sacre
Rocketeer
July 23, 2014

It was for this very reason that Unicode was developed.

I don't know the various details, but this has to do with code pages and OEM vs ANSI encoding. I suspect if you do enough Googling, you will find the answer.

Anonymous
July 24, 2014

Thanks for the suggestions. It turns out that Extend uses Extended ASCII codes, but Windows stores characters using Windows-1252 format. Even in Notepad, you can use ASCII codes (ALT code) to enter a character, but that code is converted to Windows-1252 format when the file is saved. For example, a ç character (ALT 135, hex 87) is displayed correctly in Notepad, but when saved, it is saved as hex E7. Converting Windows-1252 codes to extended ASCII during the import solves the problem.
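
One way to do that conversion as a preprocessing step before the import is sketched below in Python. This is not how Extend itself does anything; it is a standalone script under stated assumptions. The target code page is assumed to be 850 (it contains ç, é and ã), and the names `cp1252_to_oem` and `convert_file` are made up for this example. If your system uses a different OEM code page, pass its name instead.

```python
def cp1252_to_oem(data: bytes, oem_codepage: str = "cp850") -> bytes:
    """Re-encode Windows-1252 bytes into an OEM code page.

    Assumption: "cp850" is only a plausible default, not a documented
    Extend setting. Bytes or characters with no mapping become "?".
    """
    text = data.decode("cp1252", errors="replace")
    return text.encode(oem_codepage, errors="replace")


def convert_file(src_path: str, dst_path: str, oem_codepage: str = "cp850") -> None:
    """Read a Windows-1252 text file and write an OEM-encoded copy."""
    with open(src_path, "rb") as src:
        converted = cp1252_to_oem(src.read(), oem_codepage)
    with open(dst_path, "wb") as dst:
        dst.write(converted)


# Small in-memory check: "français, café, São Paulo" as Windows-1252 bytes.
sample = b"fran\xe7ais, caf\xe9, S\xe3o Paulo"
print(cp1252_to_oem(sample).hex(" "))
```

The converted copy is the file you would then import. Working on raw bytes and writing in binary mode avoids any extra newline or encoding translation by Python itself.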
