Question

Unidata 8.2: Is there a function to do a case-insensitive find-and-replace (i.e. CHANGE) of a string?

  • July 31, 2023
  • 9 replies
  • 0 views

Kevin King

I'm working on a routine that takes HTML input and converts HTML entities like &lt; and &gt; to their character equivalents.  However, HTML entities are case insensitive, so the input string might be &lt;, &LT;, &lT; or &Lt;.  Is there a clean way to search for &lt; in the string without case sensitivity so the value can be replaced with "<"?

9 replies

  • Participating Frequently
  • July 31, 2023

Does sed 's/&lt;/</gI' /input/path.txt work?


Kevin King
  • Author
  • Participating Frequently
  • July 31, 2023

M Arcus1, it could work, but it would be limited to Linux and we'd have to call out to sed many times to cover all of the HTML entities we want to convert.  That could be massively inefficient.


Jonathan Smith

Kevin,

There is no clean way of doing this with a built-in UniBasic function. The simplest way I can currently think of is to use Python.

This is how you do it with Python in UniData:

:python
python> import re
python> phrase = "&LT;&lt;&lT;&Lt"
python> phrase = re.sub("&LT","<",phrase,flags=re.IGNORECASE)
python> print(phrase)
<;<;<;<
python>
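
One caveat with the session above: re.sub treats its first argument as a regular expression, so a search string containing characters such as "." or "$" would not be matched literally. A minimal sketch of a literal, case-insensitive replace follows; the function name replace_ignore_case is made up for illustration and is not part of UniData or the re module.

```python
import re

def replace_ignore_case(text, find, repl):
    """Replace every occurrence of `find` in `text`, ignoring case.

    Hypothetical helper for illustration only.
    """
    # re.escape turns `find` into a literal pattern, so "." matches only ".".
    # Passing a function as the replacement stops backslashes in `repl`
    # from being read as group references.
    return re.sub(re.escape(find), lambda match: repl, text, flags=re.IGNORECASE)

print(replace_ignore_case("&LT;&lt;&lT;&Lt", "&LT", "<"))  # <;<;<;<
print(replace_ignore_case("5.0 or 5x0", "5.0", "N"))       # N or 5x0
```

Without re.escape, the second call would also rewrite "5x0", because an unescaped "." matches any character.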

I don't know whether you have used Python before. If not, just reply and I'll take you through using it from BASIC.

Regards,


Dale Kelley
  • Participating Frequently
  • August 1, 2023

What I do in UniVerse is set up UPPER = "ABCD..." and LOWER="abcd..." then CONVERT LOWER TO UPPER in my variable, then I have all upper case to match against.

Dale
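
As an illustration of the trade-off in this approach, here is a rough Python sketch (Python rather than UniBasic, and both function names are made up). Folding the string to one case is enough to detect a match, but if the folded copy is also the one that gets edited, the surrounding text loses its original case.

```python
def contains_ignore_case(text, find):
    # Fold both sides to upper case, then do an ordinary case-sensitive test.
    return find.upper() in text.upper()

def upper_then_replace(text, find, repl):
    # Replacing inside the folded copy also upper-cases everything else.
    return text.upper().replace(find.upper(), repl)

print(contains_ignore_case("Hello &Lt; World", "&lt;"))     # True
print(upper_then_replace("Hello &Lt; World", "&lt;", "<"))  # HELLO < WORLD
```

So the folded copy works well for locating a match, provided the edit itself is applied to the untouched original.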


Jonathan Smith

Kevin,

Here is how to do it from UniBasic using Python.

My Python function:

AE PP replace.py
Top of "replace.py" in "PP", 4 lines, 132 characters.
*--: P
001: import re
002: def caseinsswap(origst,stfrom,stto):
003:     phrase = re.sub( stfrom , stto , origst , flags=re.IGNORECASE )
004:     return phrase

In UniBasic

AE PBP CASEINSSWAP
Top of "CASEINSSWAP" in "PBP", 15 lines, 420 characters.
*--: P
001: ORIGSTRING = "&LT;&lt;&lT;&Lt"
002: STFROM = "&LT"
003: STTO = "<"
004: ModuleName="replace"
005: FuncName="caseinsswap"
006: pyresult = PyCallFunction(ModuleName, FuncName, ORIGSTRING, STFROM, STTO)
007: IF @PYEXCEPTIONTYPE = '' THEN
008:   CRT "Python RESULT: "
009:   CRT "NEWST = " : pyresult
010: END ELSE
011:   CRT "EXCEPTION TYPE IS " :@PYEXCEPTIONTYPE
012:   CRT "EXCEPTION MESSAGE IS " :@PYEXCEPTIONMSG
013:   CRT "EXCEPTIONTRACEBACK IS " :@PYEXCEPTIONTRACEBACK
014: END
015: END

 :RUN PBP CASEINSSWAP

Python RESULT:
NEWST = <;<;<;<
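
If the Python route is available, several entities could be handled in one call rather than one call per entity. The following is only a sketch under that assumption: the ENTITIES table and the decode_entities name are illustrative, and the table covers just four entities.

```python
import re

# Illustrative table only; extend it with whatever entities are needed.
ENTITIES = {"&lt;": "<", "&gt;": ">", "&amp;": "&", "&quot;": '"'}

# One alternation of the escaped entity names, matched without regard to case.
_ENTITY_RE = re.compile("|".join(re.escape(name) for name in ENTITIES),
                        flags=re.IGNORECASE)

def decode_entities(text):
    # Look up each match by its lower-cased form; the string is scanned once,
    # so text produced by a replacement is not decoded a second time.
    return _ENTITY_RE.sub(lambda m: ENTITIES[m.group(0).lower()], text)

print(decode_entities('&LT;b&Gt; &amp;lt; &QUOT;x&quot;'))  # <b> &lt; "x"
```

Such a function could be called from UniBasic in the same way as caseinsswap above, passing only the input string.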

Regards,


Kevin King
  • Author
  • Participating Frequently
  • August 1, 2023

Jonathan, it's an interesting perspective, but I need to stay compatible with some older UniData versions that predate Python support.  And on some of these older AIX systems, Python isn't available.


Jonathan Smith

If you want it to remain compatible with older pre-Python releases (i.e. before 2017), you will need to write some of your own code. The UniData code itself appears to call only C functions that are case sensitive. If you want it to be portable and cannot use Python, you will need to write your own UniBasic function, or you could write your own C code and call it through UniData's CALLC functionality.

If you want to do it in UniBasic, the following code should work:

$BASICTYPE "U"
FUNCTION REPLACE.CASEINS(ORIG.STR, STR.TO.FIND, STR.TO.REPLACE)
*
* Case Insensitive Replace
*
NEW.STR = ""
ALL.LOWER.ORIG.STR = OCONV(ORIG.STR, "MCL")
ALL.LOWER.STR.TO.FIND = OCONV(STR.TO.FIND, "MCL")
LEN.LOWER.STR.TO.FIND = LEN(ALL.LOWER.STR.TO.FIND)
* First Time Check
IPOS = INDEX(ALL.LOWER.ORIG.STR, ALL.LOWER.STR.TO.FIND, 1)
IF NOT(IPOS) THEN
   NEW.STR = ORIG.STR
END ELSE
   LOOP
      NPOS = INDEX(ALL.LOWER.ORIG.STR, ALL.LOWER.STR.TO.FIND, 1)
   UNTIL NOT(NPOS) DO
      NEW.STR := ORIG.STR[1, NPOS - 1] : STR.TO.REPLACE
      ALL.LOWER.ORIG.STR = ALL.LOWER.ORIG.STR[NPOS + LEN.LOWER.STR.TO.FIND, LEN(ALL.LOWER.ORIG.STR)]
      ORIG.STR = ORIG.STR[NPOS + LEN.LOWER.STR.TO.FIND, LEN(ORIG.STR)]
   REPEAT
   * Append the remainder in its original case, not the lowercased copy
   NEW.STR := ORIG.STR
END
RETURN NEW.STR
END

AE BP TEST.REPLACE.CASEINS
Top of "TEST.REPLACE.CASEINS" in "BP", 10 lines, 285 characters.
*--: P
001: OLD.STR = "&lt;&LT;&lT;&Lt"
002: FIND.STR = "&LT"
003: NEW.STR = "<"
004: DEFFUN REPLACE.CASEINS(OLD.STR,FIND.STR,NEW.STR)
005: CRT REPLACE.CASEINS(OLD.STR,FIND.STR,NEW.STR)
006: OLD.STR = "I am using RUBY to make RuBY for sure"
007: FIND.STR = "RUBY"
008: NEW.STR = "*"
009: CRT REPLACE.CASEINS(OLD.STR,FIND.STR,NEW.STR)
010: END
Bottom.
*--: FIBR
Filed "TEST.REPLACE.CASEINS" in file "BP" unchanged.

Compiling Unibasic: BPTEST.REPLACE.CASEINS in mode 'u'.
compilation finished
<;<;<;<
I am using * to make * for sure
:

Thanks,


Kevin King
  • Author
  • Participating Frequently
  • August 1, 2023

There we go.  That's what I thought we'd have to do.  A bit involved, but it gets there.  Thank you!


Jonathan Smith

@Kevin King  I found a problem in the BASIC function: it modified ORIG.STR, and because arguments are passed by reference, the caller's variable was changed on return. Here is the fixed version for UniBasic:

$BASICTYPE "U"
FUNCTION REPLACE.IGNORECASE.BASIC(ORIG.STR, STR.TO.FIND, STR.TO.REPLACE)
*
* Function to REPLACE a string but ignoring case using UniBasic
* Jonathan Smith - August 2023
*
* ORIG.STR = The Original String
* STR.TO.FIND = The string to search for in ORIG.STR
* STR.TO.REPLACE = The string to replace STR.TO.FIND with
*
NEW.STR = ""
ALL.LOWER.ORIG.STR = OCONV(ORIG.STR, "MCL")
ALL.LOWER.STR.TO.FIND = OCONV(STR.TO.FIND, "MCL")
LEN.LOWER.STR.TO.FIND = LEN(ALL.LOWER.STR.TO.FIND)
WORK.ORIG.STR = ORIG.STR
LOOP
   NPOS = INDEX(ALL.LOWER.ORIG.STR, ALL.LOWER.STR.TO.FIND, 1)
UNTIL NOT(NPOS) DO
   NEW.STR := WORK.ORIG.STR[1, NPOS - 1] : STR.TO.REPLACE
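   ALL.LOWER.ORIG.STR = ALL.LOWER.ORIG.STR[NPOS + LEN.LOWER.STR.TO.FIND, LEN(ALL.LOWER.ORIG.STR)]
   WORK.ORIG.STR = WORK.ORIG.STR[NPOS + LEN.LOWER.STR.TO.FIND, LEN(WORK.ORIG.STR)]
REPEAT
* Append the remainder from the original-case working copy, not the lowercased one
NEW.STR := WORK.ORIG.STR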
RETURN NEW.STR
END
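
For cross-checking the UniBasic function on a machine where Python is available, the same search-the-lowercased-copy, slice-the-original idea can be sketched in Python. This is an illustrative port, not the author's code: the function name is made up, it assumes ASCII input (so lower-casing never changes the string length), and it returns the input unchanged when the search string is empty.

```python
def replace_ignore_case_basic(orig, find, repl):
    """Case-insensitive replace that preserves the case of unmatched text."""
    if not find:
        return orig  # assumption: an empty search string means "no change"
    lowered = orig.lower()   # used only to locate matches
    needle = find.lower()
    pieces = []
    start = 0
    while True:
        hit = lowered.find(needle, start)
        if hit == -1:
            break
        pieces.append(orig[start:hit])  # unmatched text, original case
        pieces.append(repl)
        start = hit + len(needle)
    pieces.append(orig[start:])         # remainder, original case
    return "".join(pieces)

print(replace_ignore_case_basic("&lt;&LT;&lT;&Lt", "&LT", "<"))
# <;<;<;<
print(replace_ignore_case_basic("I am using RUBY to make RuBY For Sure", "ruby", "*"))
# I am using * to make * For Sure
```

The second call has mixed-case text after the last match, which makes it a useful check that the remainder is taken from the original string and not from the lowercased copy.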