
Hi Freaks

Using $encode with "HEX" on a string returns the UTF-8 byte sequence:
$1 = $encode("HEX","Ä")   ; $1= "C384" 

What I need is the Unicode code point itself, something like:
$1 = $encode("HEX","Ä","UNICODE")   ; $1= "00C4"

Is there any workaround?
Okay, I could shift a few bits in "C384", but these functions are also missing.
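
To make the "shift a few bits" idea concrete, here is a minimal sketch in Python (not Proc) of how the hex string returned by $encode could be turned into the code point. The function name utf8_hex_to_codepoint_hex is made up for illustration, and the sketch assumes the input holds exactly one UTF-8 encoded character; it does not reject overlong or surrogate sequences.

```python
def utf8_hex_to_codepoint_hex(utf8_hex: str) -> str:
    """Decode the hex of ONE UTF-8 encoded character to its code point (hex)."""
    data = bytes.fromhex(utf8_hex)
    first = data[0]

    # The lead byte tells how many bytes the sequence has and carries
    # the highest bits of the code point.
    if first < 0x80:                 # 0xxxxxxx: plain ASCII
        length, value = 1, first
    elif first >> 5 == 0b110:        # 110xxxxx: 2-byte sequence
        length, value = 2, first & 0x1F
    elif first >> 4 == 0b1110:       # 1110xxxx: 3-byte sequence
        length, value = 3, first & 0x0F
    elif first >> 3 == 0b11110:      # 11110xxx: 4-byte sequence
        length, value = 4, first & 0x07
    else:
        raise ValueError("invalid UTF-8 lead byte")

    if len(data) != length:
        raise ValueError("unexpected number of bytes")

    # Every continuation byte (10xxxxxx) contributes 6 more bits.
    for b in data[1:]:
        if b >> 6 != 0b10:
            raise ValueError("invalid continuation byte")
        value = (value << 6) | (b & 0x3F)

    return format(value, "04X")


print(utf8_hex_to_codepoint_hex("C384"))  # 00C4
```

Without shift operators the same result comes from plain arithmetic, since shifting left by 6 bits is a multiplication by 64: (0xC3 - 0xC0) * 64 + (0x84 - 0x80) = 3 * 64 + 4 = 196 = 0xC4.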

The other workaround is to build a list:

FOR v_I = 0 to 65535
  putitem/id v_list,$string($concat("&#",v_I,";")),v_I
ENDFOR

and then do a lookup in this list.
But this is time-consuming with UnifAce functions.
And the UnifAce kernel already holds all the necessary information ....


BTW: The documentation for $encode says only "Hexadecimal encoding scheme."
Not a word about which codepage is in use. You have to read between the lines that UnifAce internally uses UTF-8.

Hi Ingo,

Small question... What is Unicode? 

When you mention "What I need is the unicode itself", you mean the 16-bit Unicode encoding, UTF-16.

As far as I know, both UTF-8 and UTF-16 are Unicode encodings.

So, I guess, the C384 returned by Uniface is perfectly valid Unicode in the UTF-8 encoding.
I am not sure whether a platform based on UTF-8 is able to convert to UTF-16 or UCS-2. And why not go directly to UTF-32?
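
To illustrate the point, a small Python sketch (outside Uniface) that prints the same character in the three encodings mentioned. The helper name hex_in is made up for this example; it only wraps Python's standard str.encode.

```python
def hex_in(text: str, encoding: str) -> str:
    """Return the bytes of `text` in the given encoding as an uppercase hex string."""
    return text.encode(encoding).hex().upper()


char = "\u00C4"  # LATIN CAPITAL LETTER A WITH DIAERESIS (Ä)

for enc in ("utf-8", "utf-16-be", "utf-32-be"):
    print(enc, hex_in(char, enc))

# The code point itself, independent of any encoding:
print("code point", format(ord(char), "04X"))
```

For characters in the Basic Multilingual Plane the big-endian UTF-16 bytes happen to equal the code point, which is why "00C4" looks like "the Unicode itself". Above U+FFFF, UTF-16 uses surrogate pairs and only UTF-32 matches the code point directly.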

Interesting manifesto about Unicode: http://utf8everywhere.org/


Hi Peter

Unicode is a character set in which each character is assigned a code point. A code point is a numeric value, which can be written in decimal or hexadecimal.

UTF-8 is a transformation format which converts these code points into a sequence of bytes.
UTF = Unicode Transformation Format.

So Uniface returns a UTF-8 encoded code point and not the code point itself.
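
The transformation described here can be sketched in Python as follows. This is only an illustration of the UTF-8 rules, not Uniface code; the function name codepoint_to_utf8_hex is made up, and the sketch does not reject surrogate code points.

```python
def codepoint_to_utf8_hex(cp: int) -> str:
    """Encode one Unicode code point as UTF-8 and return the bytes as hex."""
    if cp < 0:
        raise ValueError("code point must not be negative")
    if cp < 0x80:            # 1 byte:  0xxxxxxx
        data = [cp]
    elif cp < 0x800:         # 2 bytes: 110xxxxx 10xxxxxx
        data = [0xC0 | (cp >> 6),
                0x80 | (cp & 0x3F)]
    elif cp < 0x10000:       # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        data = [0xE0 | (cp >> 12),
                0x80 | ((cp >> 6) & 0x3F),
                0x80 | (cp & 0x3F)]
    elif cp <= 0x10FFFF:     # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        data = [0xF0 | (cp >> 18),
                0x80 | ((cp >> 12) & 0x3F),
                0x80 | ((cp >> 6) & 0x3F),
                0x80 | (cp & 0x3F)]
    else:
        raise ValueError("code point out of Unicode range")
    return bytes(data).hex().upper()


print(codepoint_to_utf8_hex(0x00C4))  # C384
```

Code point 00C4 (decimal 196) falls into the two-byte range, so its bits are split into 00011 and 000100 and prefixed with 110 and 10, giving C3 and 84.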

Ingo




Hi Ingo,

just two small addenda to this discussion:

- Uniface's use of UTF-8 as the kernel's basic codepage was the BIG change in the migration from U8.4 to U9.0 at the end of 2007; before U9, Uniface used the codepage chosen during installation.

- I was going to write you what Peter already wrote; anyhow, AFAIK the Uniface kernel includes the ICU library, which sits in the middle of all the functionality related to character set conversion/evaluation.

About "nice to have" functions in Proc code: I have already raised this issue several times; they are "small things immediately available for ALL Uniface developers".
The only path for us in the field is to open wishes and wait.

Best Regards,
Gianni