$encode("HEX",v_string)

Hi Freaks

Using $encode with "HEX" on a string returns the hex of its UTF-8 byte sequence:
$1 = $encode("HEX","Ä") ; $1= "C384"

What I need is the Unicode code point itself, for example:
$1 = $encode("HEX","Ä","UNICODE") ; $1= "00C4"

Is there any workaround?
Okay, I could shift a few bits in "C384", but those functions are missing as well.
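
For reference, the bit shifting mentioned above can be sketched as follows. This is Python, not Uniface Proc, and it only illustrates the arithmetic. The function name utf8_hex_to_codepoint_hex is made up for this example. It expects the hex of the UTF-8 bytes of a single character (the kind of value $encode("HEX",...) returns) and gives back the code point as hex. It does not reject overlong encodings or surrogate values.

```python
def utf8_hex_to_codepoint_hex(utf8_hex):
    """Turn the UTF-8 bytes of ONE character (hex string) into its code point (hex string)."""
    data = bytes.fromhex(utf8_hex)
    if not data:
        raise ValueError("empty input")
    first = data[0]
    # The lead byte tells how many bytes the character uses
    # and carries the highest payload bits.
    if first < 0x80:               # 0xxxxxxx: 1 byte (ASCII)
        length, cp = 1, first
    elif first >> 5 == 0b110:      # 110xxxxx: 2 bytes
        length, cp = 2, first & 0x1F
    elif first >> 4 == 0b1110:     # 1110xxxx: 3 bytes
        length, cp = 3, first & 0x0F
    elif first >> 3 == 0b11110:    # 11110xxx: 4 bytes
        length, cp = 4, first & 0x07
    else:
        raise ValueError("invalid UTF-8 lead byte")
    if len(data) != length:
        raise ValueError("input must hold exactly one character")
    # Each continuation byte (10xxxxxx) contributes 6 more bits.
    for b in data[1:]:
        if b >> 6 != 0b10:
            raise ValueError("invalid UTF-8 continuation byte")
        cp = (cp << 6) | (b & 0x3F)
    return f"{cp:04X}"


print(utf8_hex_to_codepoint_hex("C384"))  # 00C4
```

For "C384": the lead byte C3 keeps its low 5 bits (00011), the continuation byte 84 keeps its low 6 bits (000100), and together they give 00011000100 = hex C4.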

The other workaround is to build a list:

FOR v_I = 0 to 65535
  putitem/id v_list,$string($concat("&#",v_I,";")),v_I
ENDFOR

and then do a lookup in this list.
But this is time-consuming with UnifAce functions.
And the UnifAce kernel already holds all the necessary information ...

BTW: The documentation for $encode says only "Hexadecimal encoding scheme."
Not a word about which code page is used. You have to read between the lines that UnifAce uses UTF-8 internally.

Hi Ingo,

Small question... What is Unicode?

When you say "What I need is the unicode itself", I take it you mean the 16-bit encoding of Unicode, UTF-16.

As far as I know, both UTF-8 and UTF-16 are Unicode.

So, I guess, the C384 returned by Uniface is perfectly valid Unicode in the UTF-8 encoding.
I am not sure whether a platform based on UTF-8 is able to convert to UTF-16 or UCS-2. And why not go directly to UTF-32?

Interesting manifesto about Unicode: http://utf8everywhere.org/

Hi Peter

Unicode is a code page (character set) in which each character is assigned a code point. This code point is a numeric value, written either in decimal or in hexadecimal.

UTF-8 is a transformation format which converts code points into a sequence of bytes.
UTF = Unicode (UCS) Transformation Format.

So Uniface returns the UTF-8 encoded code point and not the code point itself.
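
To make the difference between a code point and its encodings concrete, here is a small sketch in Python (not Proc; the helper name encodings_hex is invented for this example). It prints the code point next to the UTF-8, UTF-16 and UTF-32 bytes of a character. For characters up to U+FFFF, such as "Ä", the UTF-16 value happens to equal the code point; above that, UTF-16 needs a surrogate pair and only UTF-32 matches the code point.

```python
def encodings_hex(ch):
    """Return the code point and the big-endian encoded bytes of one character, as hex."""
    return {
        "codepoint": f"{ord(ch):04X}",
        "utf-8": ch.encode("utf-8").hex().upper(),
        "utf-16": ch.encode("utf-16-be").hex().upper(),
        "utf-32": ch.encode("utf-32-be").hex().upper(),
    }


# "\u00C4" is Ä, "\u20AC" is the euro sign, "\U0001F600" lies outside the 16-bit range.
for ch in ("\u00C4", "\u20AC", "\U0001F600"):
    print(encodings_hex(ch))
```

For "Ä" this gives code point 00C4, UTF-8 C384, UTF-16 00C4 and UTF-32 000000C4. For U+1F600 the UTF-16 result is the surrogate pair D83DDE00, which UCS-2 cannot represent.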

Ingo

Hi Ingo,

Just two small addenda to this discussion:

- Uniface's use of UTF-8 as the kernel's basic code page was the BIG change in the migration from U8.4 to U9.0 at the end of 2007; before U9, Uniface used the code page chosen during installation.

- I was going to write what Peter already wrote; anyhow, AFAIK the Uniface kernel includes the ICU library, which sits at the center of all functionality related to character set conversion/evaluation.

About "nice to have" functions in Proc code: I have raised this issue several times already; they are "small things immediately available for ALL Uniface developers".
The only path for us in the field is to open wishes and wait.

Best Regards,
Gianni
