How to deal with special character Ć (C accent aigu) in my app

  • May 17, 2021
  • 4 replies
  • 0 views

Hi,

I need to add a special character to my app's set of special characters: Ć (C with acute accent). Unlike the other special characters used in my app, this one is not in the ASCII table. Using the Windows Character Map to insert the Ć, for example into the generated XML report file, causes trouble. It looks like this:


How can I handle this special character in my app?

4 replies

Ingo Stiller
  • Participating Frequently
  • May 17, 2021

Hi Jozef,

A short explanation:
"UTF-8" and "UTF-16" are both compact ("ZIP"-like) encodings for Unicode: they try to use as few bytes per character as possible.
Both can represent every Unicode character, so they do not restrict Unicode to a codepage.
"CP1250", on the other hand, is a codepage of 256 characters (suitable for Central Europe).
"ASCII" is a codepage of only 128 characters: 96 characters plus 32 control characters (suitable for US/UK/...).

Your character (if I interpret the glyph correctly) is "LATIN CAPITAL LETTER C WITH ACUTE", Unicode U+0106.
It is in codepage CP1250 at position 0xC6, so you can also use it within "Central European" text.
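
To make the byte values above concrete, here is a small sketch in Python (my choice for illustration only; the thread itself is about UnifAce, and the variable names are made up). It encodes U+0106 in each of the encodings mentioned and shows that plain ASCII has no slot for it.

```python
# Encode U+0106 (LATIN CAPITAL LETTER C WITH ACUTE) in several encodings.
char = "\u0106"  # Ć

utf8_bytes = char.encode("utf-8")        # two bytes in UTF-8
utf16_bytes = char.encode("utf-16-be")   # one 16-bit code unit, big-endian, no BOM
cp1250_bytes = char.encode("cp1250")     # a single byte in the Central European codepage

# ASCII cannot represent this character, so encoding raises an error.
try:
    char.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

print(utf8_bytes.hex(), utf16_bytes.hex(), cp1250_bytes.hex(), ascii_ok)
# prints: c486 0106 c6 False
```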

UnifAce is Unicode-enabled, so you can use any Unicode character inside UnifAce.
The problems are the input/output interfaces, as they are often only capable of handling a limited range of characters (codepages).
If you create the XML yourself and then write it with lfiledump, you can set the encoding, e.g. UTF-8.
But always declare the encoding:
<?xml version="1.0" encoding="utf-8"?>
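
As an illustration outside UnifAce, the following Python sketch writes a hand-built XML string to disk as UTF-8 with a matching declaration, then reads it back. The element names `report` and `name` and the temporary file path are hypothetical; the point is only that the declared encoding and the bytes actually written must agree.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Hand-built XML text; the declaration names the encoding used when writing.
xml_text = (
    '<?xml version="1.0" encoding="utf-8"?>\n'
    "<report><name>\u0106</name></report>\n"
)

path = os.path.join(tempfile.mkdtemp(), "report.xml")

# Write the text with the same encoding that the declaration announces.
with open(path, "w", encoding="utf-8", newline="\n") as f:
    f.write(xml_text)

# Inspect the raw bytes and let an XML parser read the file back.
with open(path, "rb") as f:
    raw = f.read()
parsed_name = ET.parse(path).getroot().findtext("name")

print(raw)
print(parsed_name == "\u0106")
```

If the file were written in a codepage such as CP1250 while the declaration still said UTF-8, a conforming parser would either reject the file or show the wrong glyph.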

If you see a few cryptic glyphs in the XML, this could be due to three things:

  1. You use the wrong Unicode code point
  2. You don't declare the encoding, so the other program doesn't know how to interpret the bytes
  3. The other program ignores the declared encoding

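To make a character readable for a text editor that knows nothing about the encoding, you can use XML character references:

So U+0106 becomes "&#x0106;". A named entity such as "&Cacute;" reads even better, but it is predefined only in HTML; in XML it works only if the DTD declares it.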
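
A short Python sketch of the character-reference approach, using only the standard library (the `name` element is a made-up example). It builds the hexadecimal and decimal references for U+0106, parses one back, and checks how a plain XML parser without a DTD treats the named form.

```python
import xml.etree.ElementTree as ET

char = "\u0106"  # Ć

# Hexadecimal and decimal numeric character references for the same code point.
hex_ref = "&#x{:04X};".format(ord(char))
dec_ref = char.encode("ascii", "xmlcharrefreplace").decode("ascii")

# A pure-ASCII document that still carries the character.
parsed = ET.fromstring("<name>{}</name>".format(hex_ref)).text

# Without a DTD that declares it, a named entity is rejected by the parser.
try:
    ET.fromstring("<name>&Cacute;</name>")
    named_entity_ok = True
except ET.ParseError:
    named_entity_ok = False

print(hex_ref, dec_ref, parsed == char, named_entity_ok)
# prints: &#x0106; &#262; True False
```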

Ingo


  • Author
  • Participating Frequently
  • May 18, 2021

Hi Ingo, thanks for your answer. But I now see another issue. When I enter this C with acute accent via the app, it is displayed correctly on the screen, but when I save it, it is stored in the DB as the weird character Æ. How can I fix this so that it is also stored in the DB as C with acute accent?


Ingo Stiller
  • Participating Frequently
  • May 18, 2021

Hi Jozef,

Is the character still displayed correctly after storing it and retrieving it back to the screen?

If so, there is no need to worry about the strange glyphs in the DB 🙂

If the character/glyph has changed after reading it back, then it is an issue with the database or the database driver.

Which interface type does this field have in UnifAce (C*, W*, R*, ...)?
Which DBMS are you using? "MSS" or something else?
Which COLLATION (MS-SQL) are you using?


An idea for checking whether the character is stored correctly despite the strange glyphs:
Use an SQL statement like this:
SELECT CAST(my_field AS varbinary(100)) FROM my_table
Then check the hex code(s) for the letter.
If it is 0xC486, then the Unicode code point U+0106 has been correctly encoded as UTF-8 and your UnifAce driver uses UTF-8.

Check whether the COLLATION ends with "_UTF8".
If not, everything is correct, as the UTF-8 byte sequence is treated as a normal "ASCII"/"CPnnnn" sequence.
If the COLLATION is a UTF8 one, you have to blame Microsoft 🙂

If it is neither 0xC486 nor 0xC6, then your UnifAce driver is using some strange codepage (neither UTF-8 nor CP1250).
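
The decision rule above can be sketched as a tiny Python helper. The function name `classify_stored_bytes` is hypothetical; it takes the hex text returned by the varbinary query and reports which encoding of U+0106 it matches, if any.

```python
TARGET = "\u0106"  # Ć, the character we expect to find in the field


def classify_stored_bytes(hex_text: str) -> str:
    """Map the hex dump of one stored character to the encoding it matches."""
    cleaned = hex_text.strip()
    if cleaned.lower().startswith("0x"):
        cleaned = cleaned[2:]
    raw = bytes.fromhex(cleaned)

    if raw == TARGET.encode("utf-8"):    # bytes C4 86
        return "utf-8"
    if raw == TARGET.encode("cp1250"):   # byte C6
        return "cp1250"
    return "unknown"


for dump in ("0xC486", "0xC6", "0x3F"):
    print(dump, "->", classify_stored_bytes(dump))
```

A result of "unknown" corresponds to the last case above: the driver is using some other codepage, or the character was already lost before it was stored.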

Ingo

PS:
Character: One distinct letter/sign/..., independent of how it is displayed or stored
Glyph: The visual representation of a character
Byte sequence: The representation of a character/code point/something else in bits and bytes
Codepage: A selection of characters together with their byte sequences
Codepoint: A character or some special item in Unicode
UTF:  UCS Transformation Format      (Not necessarily Unicode!)
UCS: Universal Coded Character Set  (Not necessarily Unicode!)

UTF-8 on Unicode:
All code points <= 0x007F are converted to one byte, which is 1:1 ASCII. So in most cases the length of a document doesn't change, or only very little. Characters from 0x0080 upward take two bytes or more.
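
A quick way to see how many bytes UTF-8 spends per code point is to encode a few boundary values. This is an illustrative Python sketch; the sample labels are my own.

```python
# Code points around the UTF-8 length boundaries.
samples = {
    "U+0041 (A)": "\u0041",
    "U+007F (last one-byte)": "\u007f",
    "U+0080 (first two-byte)": "\u0080",
    "U+0106 (Ć)": "\u0106",
    "U+0800 (first three-byte)": "\u0800",
    "U+1F642 (emoji)": "\U0001F642",
}

# Number of bytes each sample occupies in UTF-8.
lengths = {label: len(ch.encode("utf-8")) for label, ch in samples.items()}

for label, n in lengths.items():
    print(label, n)
```

The byte counts come out as 1, 1, 2, 2, 3 and 4, so a text that is mostly ASCII grows only by the few characters outside that range.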

All code points can be transformed into a byte sequence by e.g. UTF-8 or UTF-16.
Not all code points can be displayed. This depends (among other things) on the fonts installed.
A codepage (normally) holds 32+96+128 characters:
   32 control characters
   96 characters close to the ASCII default
 128 extended characters
Far from all code points can be mapped to a codepage.
When only byte sequences are passed from one application to another, it's not clear which transformation/codepage/code unit is behind the bytes, so this can lead to very strange glyphs 🙂
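
To illustrate that last point, this Python sketch decodes the same bytes under three different assumptions. The helper `decode_all` is hypothetical, and CP1252 is included only as an example of a Western European codepage that a viewing tool might assume; whether that is what happens in this particular setup is a guess.

```python
def decode_all(raw: bytes) -> dict:
    """Decode the same bytes as UTF-8, CP1250 and CP1252; None if invalid."""
    results = {}
    for encoding in ("utf-8", "cp1250", "cp1252"):
        try:
            results[encoding] = raw.decode(encoding)
        except UnicodeDecodeError:
            results[encoding] = None
    return results


two_bytes = decode_all(b"\xc4\x86")  # Ć encoded as UTF-8
one_byte = decode_all(b"\xc6")       # Ć encoded as CP1250

print(two_bytes)
print(one_byte)
```

The single byte 0xC6 reads as Ć under CP1250 but as Æ under CP1252, so a tool that assumes the wrong codepage can show Æ even though the stored byte is fine.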



Gianni Sandigliano

Hi Jozef,

AFAIK Ingo's description was complete.

Could it be that:
1) you are saving a Unicode character in a field which is NOT enabled for it?
  or
2) you are looking at that saved field in the DB with a tool which is not correctly Unicode-enabled?

Uniface can save the full Unicode charset when:
a) your (R)DBMS is fully Unicode-enabled as its default charset
b) your (R)DBMS is NOT fully Unicode-enabled as its default charset, but each single field can be specifically enabled/mapped. Uniface acts this way when a field is mapped to a W packing code (Wx, VWx, W*).

I am currently using option b) a lot on Oracle 12.2 with success.

Hope it helps.

Gianni