Uniface User Forum

  • 1.  Convert from CP-1252 to UTF-8

    PARTNER
    Posted 24 days ago

    I was wondering if anyone else has run into this before. When someone copies characters from a Word document and pastes them into an editbox, Uniface sometimes returns an error message or stores them as some weird character. I have seen this countless times over the decades and have always just handled it with $format and $replace in the formatFromDisplay trigger. Has anyone else run into the same thing and, if so, did you come up with a way to convert any CP-1252 character to UTF-8?

    Here's a quick example of how I currently resolve this on a case-by-case basis. These are the "smart" quotes that Word puts in automatically while you type. I just replace them with the standard ASCII quotes.

    trigger formatFromDisplay
    $format = $replace($replace($format, 1, "“", $tometa(34), -1), 1, "”", $tometa(34), -1)
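
    For comparison, the same case-by-case replacement can be driven by a lookup table so that new characters are added in one place. This is a minimal sketch in Python, outside Uniface; the table contents and the name normalize_word_punctuation are illustrative assumptions, not part of the trigger above.

```python
# Map the typographic characters Word tends to insert to plain ASCII stand-ins.
# The table is illustrative, not exhaustive (an assumption for this sketch).
WORD_PUNCTUATION = {
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark
    "\u2013": "-",    # en dash
    "\u2026": "...",  # horizontal ellipsis
}

# str.maketrans accepts single-character keys and replacement strings of any length.
_TABLE = str.maketrans(WORD_PUNCTUATION)


def normalize_word_punctuation(text: str) -> str:
    """Return text with the mapped typographic characters replaced by ASCII."""
    return text.translate(_TABLE)


if __name__ == "__main__":
    print(normalize_word_punctuation("\u201cHello\u201d \u2013 it\u2019s fine\u2026"))
```

    Characters that are not in the table pass through unchanged, so this only tidies the known cases; it does not decide what to do with everything else above 127.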

    Here's a similar approach in C# (link below). It converts the contents of a whole file, whereas I would be happy with converting the contents of a single field:

    Steve McGill - Ramblings about Sitecore, EXM, and more

    If anyone has any ideas on this, I'd love to hear them.

    Best,

    Larry



    ------------------------------
    Larry Adkins
    Proware
    Cincinnati OH US
    ------------------------------


  • 2.  RE: Convert from CP-1252 to UTF-8

    Posted 20 days ago

    Hi Larry

    First of all, UTF-8 is actually not a code page but only an encoding of Unicode, a kind of ZIP procedure for reducing the size of Unicode strings, optimized for the characters 0-127 (good old ASCII) :-)
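
    To make the difference concrete, here is a minimal sketch in Python (outside Uniface) of what "converting CP-1252 to UTF-8" means at the byte level: decode the single-byte CP-1252 data to Unicode code points, then encode those code points as UTF-8. The function name cp1252_to_utf8 is a hypothetical helper for illustration.

```python
def cp1252_to_utf8(raw: bytes) -> bytes:
    """Decode CP-1252 bytes to Unicode text, then re-encode that text as UTF-8."""
    return raw.decode("cp1252").encode("utf-8")


if __name__ == "__main__":
    # 0x93 and 0x94 are the curly double quotes in CP-1252.
    word_bytes = b"\x93Hi\x94"
    print(cp1252_to_utf8(word_bytes).hex(" "))

    # ASCII stays one byte in UTF-8; other characters take two or more bytes.
    for ch in ["A", "\u00e9", "\u20ac"]:
        print(f"U+{ord(ch):04X}", ch.encode("utf-8").hex(" "))
```

    The ASCII letters keep their single-byte values, while each curly quote becomes a three-byte sequence, which is the "optimized for 0-127" behaviour described above.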
    The "meta" code pages are a specialty of Uniface, where font 0 again corresponds to ASCII (7-bit).
    A long time ago I played around with code pages in Uniface and programmed a few forms that display the fonts/code pages and also offer a few other operations. Compile and call PS0891Z.
    As I said, it's very old, but at least it's better than nothing :-)
    Ingo



    ------------------------------
    Ingo Stiller
    Aareon Deutschland GmbH
    ------------------------------

    Attachment(s)

    ps0891_ff.zip   20 KB 1 version


  • 3.  RE: Convert from CP-1252 to UTF-8

    PARTNER
    Posted 20 days ago

    Hi Ingo,

    This is interesting stuff... makes me glad I don't have to deal with extended character sets on a regular basis. I believe I can use some of what you have in that component to come up with a character translator that will fit my needs. In the end, I'll still have to decide what to do with characters that fall outside the range 0 - 127. There's also the possibility that I will save the data to be merged in Word to a plain text file, let Word do the conversion for me, and then take that result into the final merge process. My comfort level with Word COM calls is higher than with converting individual characters.

    I do appreciate the suggestion and if I choose to use something I gleaned from your code, I will post my solution here.



    ------------------------------
    Larry Adkins
    Proware
    Cincinnati OH US
    ------------------------------



  • 4.  RE: Convert from CP-1252 to UTF-8

    Posted 20 days ago

    Hi Larry

    Characters outside the range 0..127 may be in one of the other (Uniface) "fonts":
    https://www3.rocketsoftware.com/rocketd3/support/documentation/Uniface/104/uniface/platformSupport/characterSets/UNIFACE_character_sets.htm

    By the way:

    $tometa also accepts values > 127  :-)

    Or rather, internally a character holds the meta-font number and the number of the character within that meta-font,
    plus the attributes (if I'm not completely wrong, or Uniface is).
    This strange coding is due to the fact that Uniface has existed longer than Unicode and had to deal with non-ASCII characters long before.

    $tometa((192+v_FONT_NBR)*256+CHAR_NBR)
    or 
    $concat($tometa(192+v_FONT_NBR),$tometa(CHAR_NBR))

    So if you need to write the Euro character, the statement should look like this
    $tometa(50558)
    or
    $concat($tometa(197),$tometa(126))
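
    The arithmetic behind those two forms can be checked with a small Python sketch (outside Uniface). It only reproduces the formula (192+v_FONT_NBR)*256+CHAR_NBR from above; the function names are hypothetical, and font number 5 for the Euro is inferred from 197 = 192 + 5 rather than taken from the Uniface documentation.

```python
from typing import Tuple


def meta_code(font_nbr: int, char_nbr: int) -> int:
    """Combine a meta-font number and a character number into one value."""
    return (192 + font_nbr) * 256 + char_nbr


def split_meta_code(code: int) -> Tuple[int, int]:
    """Split a combined value back into (font_nbr, char_nbr)."""
    lead, char_nbr = divmod(code, 256)
    return lead - 192, char_nbr


if __name__ == "__main__":
    print(meta_code(5, 126))       # combined value for font 5, character 126
    print(split_meta_code(50558))  # back to (font_nbr, char_nbr)
```

    So $tometa(50558) and $concat($tometa(197),$tometa(126)) describe the same pair: lead value 197 and character number 126.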

    One more thing 
    If you use CP1252 and data type "C" on the database, you cannot write all Unicode characters.
    For those you have to use data type "W" (for wide?), which can write all Unicode characters.
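
    As a rough analogy for that limitation, this Python sketch (outside Uniface) tests whether a string survives encoding to CP-1252 at all. It says nothing about how Uniface or a database driver actually checks the data; fits_cp1252 is a hypothetical helper.

```python
def fits_cp1252(text: str) -> bool:
    """Return True if every character in text has a CP-1252 byte value."""
    try:
        text.encode("cp1252")
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    for sample in ["caf\u00e9", "\u20ac 10", "\u03a9", "\u4e2d"]:
        print(repr(sample), fits_cp1252(sample))
```

    Text that fails this kind of check is the sort of data that would need the "W" data type rather than "C".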

    Ingo



    ------------------------------
    Ingo Stiller
    Aareon Deutschland GmbH
    ------------------------------