Working on a routine to take HTML input and convert HTML entities like &lt; and &gt; to their character equivalent.  However, HTML entities are case insensitive, so the input string might be &lt;, &LT;, &lT; or &Lt;.  Is there a clean way to search for &lt; in the string without case sensitivity so the value can be replaced with "<"?

Does sed 's/&lt;/</gI' /input/path.txt work?


M Arcus1, it could work, but it'd be limited to Linux and we'd have to call out to sed repeatedly, once for each of the HTML entities we want to convert. That could be massively inefficient.


Kevin,

There is not a clean way of doing this with a UniBasic function. The simplest way I can currently think of is to use Python.

This is how you do it with Python in UniData:

:python
python> import re
python> phrase = "&LT;&lt;&lT;&Lt"
python> phrase = re.sub("&LT","<",phrase,flags=re.IGNORECASE)
python> print(phrase)
<;<;<;<
python>
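
The session above leaves the semicolons in place because the pattern is "&LT" rather than "&LT;". As a rough sketch only, several entities could also be handled in a single re.sub pass instead of one call per entity. The ENTITIES table and the decode_entities name are made up for illustration, and treating the trailing semicolon as optional is an assumption based on the "&Lt" sample in the question (it also means a longer name that merely starts with "lt" would be matched).

```python
import re

# Hypothetical lookup table for illustration; extend it with the entities you need.
ENTITIES = {"lt": "<", "gt": ">", "amp": "&", "quot": '"'}

# One alternation covers every name; the trailing semicolon is consumed when present.
_ENTITY_RE = re.compile(
    r"&(" + "|".join(map(re.escape, ENTITIES)) + r");?",
    re.IGNORECASE,
)

def decode_entities(text):
    """Replace each known entity, in any letter case, in a single pass."""
    # The matched name is lowercased before the lookup, so &LT; and &lT; both map to "<".
    return _ENTITY_RE.sub(lambda m: ENTITIES[m.group(1).lower()], text)

print(decode_entities("&LT;b&Gt; &lT;&Lt"))
```

Because the text is scanned once, something like "&amp;lt;" comes out as "&lt;" rather than being decoded a second time.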

I don't know whether you have used Python before. If not, just reply and I'll take you through using it from BASIC.

Regards,


What I do in UniVerse is set up UPPER = "ABCD..." and LOWER="abcd..." then CONVERT LOWER TO UPPER in my variable, then I have all upper case to match against.

Dale


Kevin,

Here is how to do it in UniBasic using Python.

My Python function:

AE PP replace.py
Top of "replace.py" in "PP", 4 lines, 132 characters.
*--: P
001: import re
002: def caseinsswap(origst,stfrom,stto):
003:     phrase = re.sub( stfrom , stto , origst , flags=re.IGNORECASE )
004:     return phrase

In UniBasic

AE PBP CASEINSSWAP
Top of "CASEINSSWAP" in "PBP", 15 lines, 420 characters.
*--: P
001: ORIGSTRING = "&LT;&lt;&lT;&Lt"
002: STFROM = "&LT"
003: STTO = "<"
004: ModuleName="replace"
005: FuncName="caseinsswap"
006: pyresult = PyCallFunction(ModuleName, FuncName, ORIGSTRING, STFROM, STTO)
007: IF @PYEXCEPTIONTYPE = '' THEN
008:   CRT "Python RESULT: "
009:   CRT "NEWST = " : pyresult
010: END ELSE
011:   CRT "EXCEPTION TYPE IS " :@PYEXCEPTIONTYPE
012:   CRT "EXCEPTION MESSAGE IS " :@PYEXCEPTIONMSG
013:   CRT "EXCEPTIONTRACEBACK IS " :@PYEXCEPTIONTRACEBACK
014: END
015: END

 :RUN PBP CASEINSSWAP

Python RESULT:
NEWST = <;<;<;<
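
One thing to watch: re.sub treats the search string as a regular expression and the replacement as a template, so entities or replacements containing characters such as "+", "." or a backslash could behave unexpectedly. A minimal sketch of a literal-matching variant follows. The name caseinsswap_literal is hypothetical and not part of the code above.

```python
import re

def caseinsswap_literal(origst, stfrom, stto):
    # re.escape makes stfrom match character for character;
    # the lambda stops stto from being parsed as a replacement template.
    return re.sub(re.escape(stfrom), lambda _m: stto, origst, flags=re.IGNORECASE)

print(caseinsswap_literal("&LT;&lt;&lT;&Lt;", "&lt;", "<"))
```

It should be callable from UniBasic the same way as the original, by changing FuncName.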

Regards,


Jonathan, it's an interesting perspective, but I need to be compatible with some older UniData versions that were pre-Python. And on some of these older AIX systems, Python isn't available.


If you want it to remain compatible with older pre-Python releases (i.e. before 2017), you will need to write some of your own code to do it. The UniData code itself appears to call only the C functions that are case sensitive. If you want to make it portable and cannot use Python, you will need to write your own UniBasic function, or you could of course write your own C code and call it through the CALLC functionality of UniData.

If you wanted to do it in UniBasic, the following code should work:

$BASICTYPE "U"
FUNCTION REPLACE.CASEINS(ORIG.STR, STR.TO.FIND, STR.TO.REPLACE)
*
* Case Insensitive Replace
*
NEW.STR = ""
ALL.LOWER.ORIG.STR = OCONV(ORIG.STR, "MCL")
ALL.LOWER.STR.TO.FIND = OCONV(STR.TO.FIND, "MCL")
LEN.LOWER.STR.TO.FIND = LEN(ALL.LOWER.STR.TO.FIND)
* First Time Check
IPOS = INDEX(ALL.LOWER.ORIG.STR, ALL.LOWER.STR.TO.FIND, 1)
IF NOT(IPOS) THEN
   NEW.STR = ORIG.STR
END ELSE
   LOOP
      NPOS = INDEX(ALL.LOWER.ORIG.STR, ALL.LOWER.STR.TO.FIND, 1)
   UNTIL NOT(NPOS) DO
      NEW.STR := ORIG.STR[1, NPOS - 1] : STR.TO.REPLACE
      ALL.LOWER.ORIG.STR = ALL.LOWER.ORIG.STR[NPOS + LEN.LOWER.STR.TO.FIND,9999]
      ORIG.STR = ORIG.STR[NPOS + LEN.LOWER.STR.TO.FIND,9999]
   REPEAT
   NEW.STR := ORIG.STR
END
RETURN NEW.STR
END

AE BP TEST.REPLACE.CASEINS
Top of "TEST.REPLACE.CASEINS" in "BP", 10 lines, 285 characters.
*--: P
001: OLD.STR = "&lt;&LT;&lT;&Lt"
002: FIND.STR = "&LT"
003: NEW.STR = "<"
004: DEFFUN REPLACE.CASEINS(OLD.STR,FIND.STR,NEW.STR)
005: CRT REPLACE.CASEINS(OLD.STR,FIND.STR,NEW.STR)
006: OLD.STR = "I am using RUBY to make RuBY for sure"
007: FIND.STR = "RUBY"
008: NEW.STR = "*"
009: CRT REPLACE.CASEINS(OLD.STR,FIND.STR,NEW.STR)
010: END
Bottom.
*--: FIBR
Filed "TEST.REPLACE.CASEINS" in file "BP" unchanged.

Compiling Unibasic: BPTEST.REPLACE.CASEINS in mode 'u'.
compilation finished
<;<;<;<
I am using * to make * for sure
:

Thanks,


There we go.  That's what I thought we'd have to do.  A bit involved, but it gets there.  Thank you!


@Kevin King, I found a problem in the BASIC function: it changed the caller's variables when it passed them back. Here is the fixed version for UniBasic:

$BASICTYPE "U"
FUNCTION REPLACE.IGNORECASE.BASIC(ORIG.STR, STR.TO.FIND, STR.TO.REPLACE)
*
* Function to REPLACE a string but ignoring case using UniBasic
* Jonathan Smith - August 2023
*
* ORIG.STR = The Original String
* STR.TO.FIND = The string to search for in ORIG.STR
* STR.TO.REPLACE = The string to replace STR.TO.FIND with
*
NEW.STR = ""
ALL.LOWER.ORIG.STR = OCONV(ORIG.STR, "MCL")
ALL.LOWER.STR.TO.FIND = OCONV(STR.TO.FIND, "MCL")
LEN.LOWER.STR.TO.FIND = LEN(ALL.LOWER.STR.TO.FIND)
WORK.ORIG.STR = ORIG.STR
LOOP
   NPOS = INDEX(ALL.LOWER.ORIG.STR, ALL.LOWER.STR.TO.FIND, 1)
UNTIL NOT(NPOS) DO
   NEW.STR := WORK.ORIG.STR[1, NPOS - 1] : STR.TO.REPLACE
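   ALL.LOWER.ORIG.STR = ALL.LOWER.ORIG.STR[NPOS + LEN.LOWER.STR.TO.FIND, LEN(ALL.LOWER.ORIG.STR)]
   WORK.ORIG.STR = WORK.ORIG.STR[NPOS + LEN.LOWER.STR.TO.FIND, LEN(WORK.ORIG.STR)]
REPEAT
* Append the unmatched remainder in its original case, not the lowercased copy
NEW.STR := WORK.ORIG.STR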
RETURN NEW.STR
END
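
For anyone who wants to cross-check the UniBasic function on a system that does have Python, here is a small port of the same idea: search a lowercased copy, but cut the pieces out of the original string. It is only a sketch. The name replace_ignorecase is made up, and it assumes plain ASCII input so that lowercasing never changes the string length.

```python
def replace_ignorecase(orig, find, repl):
    """Case-insensitive literal replace that searches a lowercased copy."""
    if not find:
        return orig  # nothing to search for
    lowered, needle = orig.lower(), find.lower()
    pieces, pos = [], 0
    while True:
        hit = lowered.find(needle, pos)
        if hit < 0:
            break
        # Take the text before the match from the original, then the replacement.
        pieces.append(orig[pos:hit])
        pieces.append(repl)
        pos = hit + len(needle)
    # Keep the unmatched tail in its original case.
    pieces.append(orig[pos:])
    return "".join(pieces)

print(replace_ignorecase("I am using RUBY to make RuBY For Sure", "ruby", "*"))
```

Running both versions against the same mixed-case inputs, including text after the last match, is a quick way to confirm they agree.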