Replace unicode characters with useful ascii characters | Rocket Forum

Hello everybody,

I'm facing the challenge of (periodically) replacing Unicode characters from UTF-8 encoded files with useful ASCII characters in order to store them in an ISO database.

My first idea was to work with a database translation table (DBTT).

But a large DBTT results in poor performance. Additionally, I faced the problem that I should use a "W" packing code, and this doesn't work with a DBTT.

So we decided to do the replacement "manually" at runtime with a "small" Uniface proc after getting a $procerror of -300 from a validatefield.

I assume one or two of you have had the same problem before and were able to solve it successfully. Does any of you have a (ready and proven) translation file or script for this purpose and would like to share it with us?

I've searched the web a lot but couldn't find anything helpful.

Best regards and looking forward to a reply. Thanks in advance.

------------------------------
Michael Rösch
Abrechnungszentrum Emmendingen
------------------------------
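Before writing a translation file, it can help to know which characters in the input actually fail. The following is a minimal Python sketch, not a Uniface solution, that lists every character in a text that cannot be encoded in the target character set. It assumes the "ISO database" means ISO-8859-1; the function names are hypothetical.

```python
import unicodedata
from collections import Counter


def unencodable_chars(text: str, encoding: str = "iso-8859-1") -> Counter:
    """Count the characters of `text` that cannot be encoded in `encoding`."""
    counts = Counter()
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            counts[ch] += 1
    return counts


def report(text: str, encoding: str = "iso-8859-1") -> list[str]:
    """Return one line per offending character: code point, Unicode name, count."""
    lines = []
    for ch, n in unencodable_chars(text, encoding).most_common():
        name = unicodedata.name(ch, "<unnamed>")
        lines.append(f"U+{ord(ch):04X} {name} x{n}")
    return lines


if __name__ == "__main__":
    # The sample text is an assumption; in practice read the UTF-8 file instead.
    for line in report("Salihamidžić Brebrić €"):
        print(line)
```

Running this over a representative set of input files gives the list of characters a translation table would have to cover.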

Hello Michaël,

In our software we interfaced with the iconv library, which is the best tool we have found for converting between character sets, including replacing unsupported characters.

Note: It also exists as a Unix executable.

Bertrand Daene

CGM Lab molis

------------------------------
Bertrand Daene
Senior Developer
Cgm Lab International Gmbh (Lab Molis)
Barchon BE
------------------------------

Hello Bertrand,

Thank you very much for your reply. I did some testing with iconv and the //TRANSLIT option on Solaris.

But unfortunately nothing I've tested has met my needs. :-(

During my research I found the site https://www.baeldung.com/linux/utf-8-ascii-conversion. From there I tried the following call:

$ iconv -f UTF-8 -t ASCII//TRANSLIT input_utf8.txt -o output_ascii.txt

According to that page, the output is ASCII-encoded and includes the transliterated characters, for example all 'ç' characters changed to 'c'.

The results with my own test file were different. For example, I tried:

Salihamidžić Brebrić

and got

Salihamidzic' Brebric'

€ and £ were replaced with EUR and GBP. Because one character becomes three, this could cause a "string too long" error.

Do you know if it's possible to use "iconv" with our own transliteration rules?

Best regards

------------------------------
Michael Rösch
Abrechnungszentrum Emmendingen
------------------------------
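The two problems described above (a stray apostrophe after the base letter, and € growing to EUR) can be avoided with a one-to-one mapping. The following Python sketch is one possible approach outside iconv, under these assumptions: each input character must yield exactly one ASCII character, accented letters are reduced to their base letter via Unicode NFKD decomposition, and the entries in `CUSTOM_MAP` (including "E" for € and "L" for £) are purely illustrative choices, not proven rules.

```python
import unicodedata

# Hypothetical one-to-one replacements for characters without a usable
# Unicode decomposition. Adjust to your own rules.
CUSTOM_MAP = {"€": "E", "£": "L", "ß": "s", "ø": "o", "Ø": "O", "đ": "d", "Đ": "D"}


def to_ascii_same_length(text: str, fallback: str = "?") -> str:
    """Map each character of the NFC form of `text` to exactly one ASCII character."""
    # Compose first so that "z" + combining caron counts as one character.
    text = unicodedata.normalize("NFC", text)
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)  # already ASCII
        elif ch in CUSTOM_MAP:
            out.append(CUSTOM_MAP[ch])  # explicit rule wins
        else:
            # NFKD splits e.g. "ž" into "z" + combining caron;
            # keep only the first ASCII character of the decomposition.
            decomposed = unicodedata.normalize("NFKD", ch)
            base = [c for c in decomposed if ord(c) < 128]
            out.append(base[0] if base else fallback)
    return "".join(out)


if __name__ == "__main__":
    print(to_ascii_same_length("Salihamidžić Brebrić"))
```

Because every character maps to exactly one character, the result can never be longer than the input, at the cost of cruder replacements than EUR or GBP.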

Hello Michaël,

We mainly used it to convert between UTF-8 and LATIN1 (= ISO-8859-1), which needs fewer such transliteration expansions. In fact, we use our own prefilter to eliminate such unwanted characters. But transliteration often means extending the length of the string. I have no solution if you have to preserve the length.

Note that transliteration also depends on the LANG setting (or setlocale in C sources): the transliteration of ö differs between de_DE and fr_FR.

On our Red Hat Linux there is no possibility to define our own transliteration rules. But on some systems it is possible. See for example geniconvtbl - man pages section 1: User Commands (oracle.com). I don't know about Solaris.

Best regards

------------------------------
Bertrand Daene
Senior Developer
Cgm Lab International Gmbh (Lab Molis)
Barchon BE
------------------------------
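The prefilter idea mentioned above could look roughly like the following Python sketch. It is not the prefilter used at CGM, only an illustration under the assumption that every character outside ISO-8859-1 is replaced by a single placeholder before the conversion; the function names are hypothetical.

```python
def prefilter_for_latin1(text: str, replacement: str = "?") -> str:
    """Replace every character outside ISO-8859-1 with one replacement character."""
    # ISO-8859-1 covers exactly the code points U+0000 to U+00FF.
    return "".join(ch if ord(ch) <= 0xFF else replacement for ch in text)


def utf8_to_latin1(data: bytes, replacement: str = "?") -> bytes:
    """Decode UTF-8 bytes, prefilter them, and encode the result as ISO-8859-1."""
    text = data.decode("utf-8")
    return prefilter_for_latin1(text, replacement).encode("iso-8859-1")


if __name__ == "__main__":
    print(utf8_to_latin1("Rösch 5€".encode("utf-8")))
```

With the default placeholder this behaves like Python's built-in `encode("iso-8859-1", errors="replace")`; the explicit function only makes the replacement character configurable. The number of characters stays the same, so the length is preserved, but the unwanted characters are lost rather than transliterated.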

Hello Bertrand,

Thank you very much for your hints. We have already done some investigation on it. It looks promising.

Best regards, and have a nice weekend

------------------------------
Michael Rösch
Abrechnungszentrum Emmendingen
------------------------------