Parsing and matching a string against a pattern

John Jenkins
March 22, 2023

Hello everyone,

I am trying to parse a string against a pattern and return a delimited list of tokens. OpenQM has a Parse() function, but there is no equivalent in UniVerse that I can see. Has anyone done something similar?

For instance, if I have something like below:

string = "Give me 97 bananas"

tokens = Parse(string, "0X0N0X", Char(254))

I would like tokens variable to contain "Give me ":Char(254):"97":Char(254):" bananas"

Any help or suggestions would be appreciated.

------------------------------
Joseph von Arx
Software Developer
Data Management Associates Inc DMA
Cincinnati OH US
------------------------------

Joseph,

Please take a look at the MATCHES clause, e.g.:

The pattern matching operator, the MATCH operator, and its synonym, the MATCHES operator, compares a string expression to a pattern. The syntax for a pattern match expression is: string MATCH[ES] pattern

The pattern is a general description of the format of the string. It can consist of text or the special characters X, A, and N preceded by an integer used as a repeating factor. X stands for any characters, A stands for any alphabetic characters, and N stands for any numeric characters. For example, 3N is the pattern for character strings made up of three numeric characters. If the repeating factor is zero, any number of characters will match the string. For example, 0A is the pattern for any number of alphabetic characters, including none. If an NLS locale is defined, its associated definitions of alphabetic and numeric determine the pattern matching.

An empty string matches the following patterns: "0A", "0X", "0N", "...", "", '', or \\.

If you are using NLS, there are additional considerations because MATCHES is a byte-level operation; please see the NLS guide for details.
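To experiment with these rules outside UniVerse, they can be approximated by translating a pattern into a regular expression. The sketch below is Python rather than UniVerse BASIC so it can be run anywhere; `matches` is a hypothetical helper, it handles only nX/nA/nN, "..." and quoted or backslash-delimited literals, and it treats letters and digits as ASCII, so it ignores NLS locale definitions.

```python
import re

# One pattern element: "...", a literal in "..." / '...' / \...\, or nX / nA / nN.
ELEMENT = re.compile(r"""\.\.\.|"([^"]*)"|'([^']*)'|\\([^\\]*)\\|(\d+)([XAN])""")
CLASSES = {"X": ".", "A": "[A-Za-z]", "N": "[0-9]"}  # ASCII only, no NLS rules


def matches(string: str, pattern: str) -> bool:
    """Rough stand-in for `string MATCHES pattern` (hypothetical helper)."""
    regex, pos = [], 0
    while pos < len(pattern):
        m = ELEMENT.match(pattern, pos)
        if m is None:
            raise ValueError(f"unsupported pattern text: {pattern[pos:]!r}")
        if m.group(0) == "...":
            regex.append(".*")  # same as 0X
        elif m.group(5):
            count = int(m.group(4))
            # A repeat factor of 0 means "any number, including none".
            regex.append(CLASSES[m.group(5)] + ("*" if count == 0 else "{%d}" % count))
        else:
            # Exactly one of the three literal groups matched (possibly empty).
            literal = next(g for g in m.groups()[:3] if g is not None)
            regex.append(re.escape(literal))
        pos = m.end()
    # MATCHES tests the whole string, not just a prefix.
    return re.fullmatch("".join(regex), string, re.DOTALL) is not None
```

For example, `matches("", "0N")` and `matches("123", "3N")` are both true, while `matches("12", "3N")` is false.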

Regards

JJ

------------------------------
John Jenkins
Thame, Oxfordshire
------------------------------

Andrew Milne
March 23, 2023

Hi Joseph,

This is not "pattern matching", but here is something simple that works with your string, either in an I-descriptor or in a program.

It may give you a start to play with.

F1 = "Give Me 97 Bananas"

FIELD(OCONV(F1,"MC/N")," ",1,2):@FM:OCONV(F1,"MCN"):@FM:FIELD(OCONV(F1,"MC/N")," ",4,1)

where OCONV with MC/N keeps only the non-numeric characters (the text and spaces) and OCONV with MCN keeps only the numbers.
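The expression relies on MC/N leaving a double space where "97" used to be, which is why the last word is field 4 rather than field 3. The Python sketch below mimics the expression to make that visible; `oconv_mc` and `field` are hypothetical stand-ins, assuming (as described above) that MC/N drops the digits and MCN keeps only the digits, and that FIELD returns `count` delimited fields starting at a 1-based position.

```python
import re


def oconv_mc(value: str, code: str) -> str:
    """Hypothetical stand-in for OCONV(value, "MCN") / OCONV(value, "MC/N")."""
    if code == "MCN":   # keep only the digits
        return re.sub(r"[^0-9]", "", value)
    if code == "MC/N":  # keep everything except the digits
        return re.sub(r"[0-9]", "", value)
    raise ValueError(f"unsupported conversion code: {code}")


def field(value: str, delim: str, start: int, count: int = 1) -> str:
    """Stand-in for FIELD(): `count` fields starting at field `start` (1-based)."""
    parts = value.split(delim)
    return delim.join(parts[start - 1:start - 1 + count])


FM = chr(254)  # @FM, the field mark
f1 = "Give Me 97 Bananas"
text = oconv_mc(f1, "MC/N")  # "Give Me  Bananas" (note the double space)
result = field(text, " ", 1, 2) + FM + oconv_mc(f1, "MCN") + FM + field(text, " ", 4, 1)
print(result.split(FM))  # ['Give Me', '97', 'Bananas']
```

Because the field positions are hard-coded, this only splits strings shaped like the example; a different number of words before the number would need different FIELD arguments.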

I hope this helps

Thanks

Andy

------------------------------
Andrew Milne
Business Systems Manager
Potter and Moore Innovations
Peterborough, Cambs GB
------------------------------

Joseph von Arx
Author
March 23, 2023

That's not exactly going to help, because I want to use this functionality in two different programs. One is a file search routine that matches by pattern (per line), and the other is an editor replacement that searches by pattern. Essentially, I need users to be able to enter one or more of the patterns below, find instances that match, and replace a portion of the found string.

... Zero or more characters of any type

0X Zero or more characters of any type

nX Exactly n characters of any type

n-mX Between n and m characters of any type

0A Zero or more alphabetic characters

nA Exactly n alphabetic characters

n-mA Between n and m alphabetic characters

0N Zero or more numeric characters

nN Exactly n numeric characters

n-mN Between n and m numeric characters

"string" A literal string which must match exactly.

For example, I use a search-and-replace command to find every quoted string that contains the word "display" and change only that word, ignoring other instances of "display" that are not quoted. I would use a pattern like 0X'"'0X'display'0X'"'0X or something similar. The replace part of my command would then say to replace element 4 of the string with "foobar". This is like the CM (Change Match) command in the AE editor, CM/0X'"'0X'display'0X'"'0X/4/foobar, so it combines pattern matching and tokenization.
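As a rough model of the behaviour described here (a Parse()-style tokenizer plus CM-style "replace element n"), the Python sketch below translates each pattern element into one regex capturing group. It is an illustration under stated assumptions, not a UniVerse or OpenQM API: `to_regex`, `match_elements`, `parse` and `change_match` are hypothetical names; 0X, n-mX and "..." take the shortest text that still lets the rest of the pattern match; only quoted literals are supported, and they match case-sensitively; letters and digits are ASCII. Because 0N can legally match nothing, "0X0N0X" has several valid splits, so the sketch first tries 0A/0N as "one or more" and falls back to "zero or more" only if that fails. That tie-breaking rule is a design choice, not documented MATCHES behaviour.

```python
import re

# One pattern element: "...", a quoted literal, or nX / n-mX (likewise A and N).
ELEMENT = re.compile(r"""\.\.\.|"([^"]*)"|'([^']*)'|(\d+)(?:-(\d+))?([XAN])""")
CLASSES = {"X": ".", "A": "[A-Za-z]", "N": "[0-9]"}  # ASCII only, no NLS rules


def to_regex(pattern: str, prefer_nonempty: bool) -> str:
    """Translate a pattern into a regex with one capturing group per element."""
    groups, pos = [], 0
    while pos < len(pattern):
        m = ELEMENT.match(pattern, pos)
        if m is None:
            raise ValueError(f"unsupported pattern text: {pattern[pos:]!r}")
        if m.group(0) == "...":
            groups.append("(.*?)")  # same as 0X
        elif m.group(5):
            lo, hi, code = int(m.group(3)), m.group(4), m.group(5)
            if hi is not None:
                quant = "{%d,%s}" % (lo, hi)  # n-m: between n and m characters
            elif lo == 0:
                # 0A/0N may first be tried as "one or more" (see match_elements).
                quant = "+" if prefer_nonempty and code != "X" else "*"
            else:
                quant = "{%d}" % lo  # n: exactly n characters
            if code == "X" and (hi is not None or lo == 0):
                quant += "?"  # X takes the shortest text that still fits
            groups.append("(%s%s)" % (CLASSES[code], quant))
        else:
            literal = m.group(1) if m.group(1) is not None else m.group(2)
            groups.append("(%s)" % re.escape(literal))
        pos = m.end()
    return "".join(groups)


def match_elements(string: str, pattern: str):
    """Return the text matched by each element as a list, or None if no match."""
    # Heuristic: prefer a split where 0A/0N capture something, else allow empty.
    for prefer_nonempty in (True, False):
        m = re.fullmatch(to_regex(pattern, prefer_nonempty), string, re.DOTALL)
        if m:
            return list(m.groups())
    return None


def parse(string: str, pattern: str, delim: str = chr(254)):
    """Hypothetical Parse(): the matched elements joined with delim (default @FM)."""
    elements = match_elements(string, pattern)
    return None if elements is None else delim.join(elements)


def change_match(string: str, pattern: str, element: int, new_text: str) -> str:
    """Replace one matched element, in the spirit of AE's CM/pattern/element/text."""
    elements = match_elements(string, pattern)
    if elements is None:
        return string  # no match: leave the line alone
    if not 1 <= element <= len(elements):
        raise IndexError(f"element {element} is out of bounds (1-{len(elements)})")
    elements[element - 1] = new_text
    return "".join(elements)
```

With these rules, `parse("Give me 97 bananas", "0X0N0X")` gives "Give me ", "97" and " bananas" separated by @FM, and `change_match` on the line `CRT "please display this" ; display X` with the pattern above and element 4 replaces only the quoted "display". Requesting element 8 raises an out-of-bounds error, since that pattern has 7 elements.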

------------------------------
Joseph von Arx
Software Developer
Data Management Associates Inc DMA
Cincinnati OH US
------------------------------

AndyBaum
March 23, 2023

Not quite the same, but you should be able to write a function using MATCHFIELD to parse the string.

Joseph von Arx
Author
March 29, 2023

John,

I am already familiar with the MATCHES clause and use it all the time. I am trying to take a user-entered pattern, match it against a string, and parse out the part of the string matched by each element of the pattern. So, if I have a string like "* found on lines 12345-12353 - Some modification made" and a pattern "0X0N'-'0N0X", I want a return variable containing 5 elements:

1 = "* found on lines "

2 = "12345"

3 = "-"

4 = "12353"

5 = " - Some modification made"

Another example would be finding and replacing part of a version number within a source program.

I can't believe UniVerse doesn't already have something like this. Tokenizing the string first tells me how many elements there are (5 in this case), so if my user asks to replace element 8, I can immediately give an out-of-bounds error message. There are other reasons I want to tokenize the string based on the pattern, such as syntax highlighting of program source within an editor and advanced find/replace functionality.

I can use the MATCHFIELD function as Andy Baum suggested, but by itself it still does not meet all of my needs. I have a basic subroutine that works for simple patterns, but I'm having difficulty with the 0X pattern element when it follows another element, as in 0N0X.

------------------------------
Joseph von Arx
Software Developer
Data Management Associates Inc DMA
Cincinnati OH US
------------------------------

Jonathan Smith
March 29, 2023

Joseph,

I wrote the following program in UniData, and it appeared to do what you need; its output follows the code.

ORIG.STRING = "* found on lines 12345-12353 - Some modification made"
IF ORIG.STRING MATCHES "0X0N'-'0N0X" THEN
CRT "First Match Test Passed"
GOSUB L100.BREAK.STRING
END ELSE
CRT "First Match Test Failed"
END
STOP
L100.BREAK.STRING:
CRT
FOR PARSE.NXT = 1 TO 5
P1 = MATCHFIELD(ORIG.STRING,"0X0N'-'0N0X",PARSE.NXT)
CRT PARSE.NXT : "=" :
CRT DQUOTE(P1)
NEXT PARSE.NXT
CRT
RETURN
END

First Match Test Passed

1="* found on lines "
2="12345"
3="-"
4="12353"
5=" - Some modification made"

This is exactly what you need (I think).

When I moved the code over to UniVerse, it did not return the same result. It produced:

1="* found on lines 12345-12353 "
2=""
3="-"
4=""
5=" Some modification made"

I then added $OPTIONS PIOPEN.MATCHFIELD at the top of the code in UniVerse, and it produced:

1="* found on lines "
2="12345"
3="-"
4=""
5="12353 - Some modification made"

This is better, but still not the same as UniData, and still not what you want.
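The three results above line up with three regex readings of "0X0N'-'0N0X" that differ only in whether 0X and 0N grab as much text as possible (greedy) or as little as possible (lazy). The Python sketch below illustrates that observation; it is not a description of how UniVerse or UniData actually implement MATCHFIELD. Greedy 0X and 0N reproduce the first UniVerse result, lazy 0X with greedy 0N reproduces the UniData result, and lazy 0X and 0N reproduce the $OPTIONS PIOPEN.MATCHFIELD result.

```python
import re

s = "* found on lines 12345-12353 - Some modification made"

# Three hand-written translations of "0X0N'-'0N0X" that differ only in
# whether 0X (.*) and 0N ([0-9]*) are greedy or lazy (*?).
variants = {
    "greedy X, greedy N": r"(.*)([0-9]*)(-)([0-9]*)(.*)",
    "lazy X, greedy N": r"(.*?)([0-9]*)(-)([0-9]*)(.*?)",
    "lazy X, lazy N": r"(.*?)([0-9]*?)(-)([0-9]*?)(.*?)",
}

for name, regex in variants.items():
    elements = re.fullmatch(regex, s).groups()
    print(name)
    for i, element in enumerate(elements, start=1):
        print(f'  {i}="{element}"')  # same layout as the MATCHFIELD test output
```

All three splits concatenate back to the original string, so each is a valid match; they differ only in which split the matcher prefers.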

If I come up with a simple way for UniVerse I will let you know.

------------------------------
Jonathan Smith
UniData ATS
Rocket Support
------------------------------

Joseph von Arx
Author
March 29, 2023

Jonathan,

Yes, thanks. I had run a program very similar to this and found that issue on UniVerse. I was not aware of the PIOPEN.MATCHFIELD option, so I appreciate your message. You are correct that it is still not what I am looking for, but I could get it to work if I change the pattern a little. I'll look into the PIOPEN.MATCHFIELD option further. Thanks for the suggestion.

------------------------------
Joseph von Arx
Software Developer
Data Management Associates Inc DMA
Cincinnati OH US
------------------------------

Jonathan Smith
March 31, 2023

Hi Joseph,

I retested on UV 11.3.5, and my program works:

$OPTIONS PIOPEN.MATCHFIELD
ORIG.STRING = "* found on lines 12345-12353 - Some modification made"
IF ORIG.STRING MATCHES "0X0N'-'0N0X" THEN
CRT "First Match Test Passed"
GOSUB L100.BREAK.STRING
END ELSE
CRT "First Match Test Failed"
END
STOP
L100.BREAK.STRING:
CRT
FOR PARSE.NXT = 1 TO 5
P1 = MATCHFIELD(ORIG.STRING,"0X0N'-'0N0X",PARSE.NXT)
CRT PARSE.NXT : "=" :
CRT DQUOTE(P1)
NEXT PARSE.NXT
CRT
RETURN
END

*--: FIBR
Filed "PATTERN.TEST" in file "BP" unchanged.
Compiling: Source = 'BP/PATTERN.TEST', Object = 'BP.O/PATTERN.TEST'
**

Compilation Complete.
First Match Test Passed

1="* found on lines "
2="12345"
3="-"
4="12353"
5=" - Some modification made"

>

------------------------------
Jonathan Smith
UniData ATS
Rocket Support
------------------------------

Joseph von Arx
Author
March 31, 2023

I do not get the same result on UniVerse 11.3.4, so there appears to be a discrepancy or bug in this version of UniVerse:

First Match Test Passed

1="* found on lines "
2="12345"
3="-"
4=""
5="12353 - Some modification made"

------------------------------
Joseph von Arx
Software Developer
Data Management Associates Inc DMA
Cincinnati OH US
------------------------------

Jonathan Smith
March 31, 2023

Joseph,

Neil and I have got to the bottom of this. For your example to work, PI_MATCHFIELD must be set to 1 in the uvconfig file. When PI_MATCHFIELD is set to 0, $OPTIONS PIOPEN.MATCHFIELD makes a difference, but the result is still incorrect. When we looked at the test machines we were using, one had it turned on and the other did not. Once it was set, the program behaved as expected.

So the answer is to set PI_MATCHFIELD to 1.

Thanks,

------------------------------
Jonathan Smith
UniData ATS
Rocket Support
------------------------------

Joseph von Arx
Author
March 31, 2023

Thanks for that information. It makes me consider writing my own MATCHES function so that I know exactly how it works on every system, since I may not be able to change config settings.

------------------------------
Joseph von Arx
Software Developer
Data Management Associates Inc DMA
Cincinnati OH US
------------------------------