How to gauge progress of a large SELECT statement in BASIC pgm

Thomas Ludwig
Participating Frequently
Forum|Forum|1 year ago
October 2, 2024

Hello all,

I have a program that analyzes data from a 6.8M record file that does not have an index.

Is there any way to report on the progress of the EXECUTE, as it may take 20mins or more to run?

Nelson

------------------------------
Nelson Schroth
president
C3CompleteShop LLC
Harrison OH US
------------------------------

Hi Nelson,

The first idea that I have for this is the possibility to select a file-variable in basic instead of the file and process the items by yourself.
By selecting the file-variable you can immediately start with processing.

in this case you can print the update-statements by yourself.

What I'm not sure about is if there is a huge performance difference if you process it in this way.

best regards

------------------------------
Thomas Ludwig
System Builder Developer
Rocket Forum Shared Account
------------------------------

Like

+2

Manu Fernandes
Inspiring
Forum|Forum|1 year ago
October 2, 2024

Hi Nelson,

The first idea that I have for this is the possibility to select a file-variable in basic instead of the file and process the items by yourself.
By selecting the file-variable you can immediately start with processing.

in this case you can print the update-statements by yourself.

What I'm not sure about is if there is a huge performance difference if you process it in this way.

best regards

------------------------------
Thomas Ludwig
System Builder Developer
Rocket Forum Shared Account
------------------------------

Hi,

The system does not provide it.

I suggest to write a dummy DICT I-Type 'COUNTER' subroutine which update a common var by increment of 1 and write it somewhere by increment of 100/1000, ... and return 1
then you can have a progress-count but not a progress-percent because none show the total number of record to process.

Use : SELECT file WITH COUNTER = 1 AND ...

SUBROUTINE COUNTER(result)
COMMON INCREMENT,F
result = 1
IF NOT(ASSIGNED(INCREMENT)) THEN INCREMENT =  0; OPEN '&SAVEDLISTS&' TO F ELSE RETURN 
INCREMENT += 1 
IF MOD(INCREMENT,1000) = 0 THEN WRITE INCREMENT ON F,'CURRENTSELECT':@USERNO 
RETURN

by outside (other session) You can known the progress ... CT &SAVEDLSITS& CURRENTSELECTxxx

I hope this help.

------------------------------
Manu Fernandes
------------------------------

Like

Ryan Ladd
Participating Frequently
Forum|Forum|1 year ago
October 2, 2024

Hello all,

I have a program that analyzes data from a 6.8M record file that does not have an index.

Is there any way to report on the progress of the EXECUTE, as it may take 20mins or more to run?

Nelson

------------------------------
Nelson Schroth
president
C3CompleteShop LLC
Harrison OH US
------------------------------

You can do this but it gets a bit messy. I'll try to explain 3 approaches I have used for this type of work.

1. Use LIST.READU EVERY (or smat -g) to try to find the RD locks that the user process is using to access each group of the file that is being processed. you can then use the hex address of the recent RD lock, convert it to DEC, divide by the group size and subtract 1 to know what group in the file that RD lock is for. Then you copare it against the current modulo to get a rough percent. This is hit or miss for me whether I end up catching RD locks showing up or not with this method as it may take multiple captures to find an RD lock. you also need to know exactly what you are looking for if you are on a busy system with lots of users and files.

2. Inspect results in UVTEMP location. If your process is generating a list of records and isn't entirely in memory, the UVTEMP will have relatively recent data in it while it is storing the results in files named select<<pid>>aa, select<<pid>>ab, etc. From there, you can find one of the select<<pid>>... files that has changed recently and look at the record IDs at the end of that list using "last". You can then use the "RECORD" command to find out where in the file that record id lives and compare it with the current modulo of the file to get a rough % complete. If your select has a SORT statement in it, then the results would be stored in files named sort<<pid>>... and you would need to be more careful in which of those files you looked at because it keeps things in sorted order so the last entry in any one file would contain the @ID (along with the value of what is being sorted) of the "largest" sorted item.

3. You can use strace or truss to inspect the uv process from the command line and look at the various kreads that are happening. You can take the hex address listed and just like with the LIST.READU method, convert it to decimal, divide by the group size and subtract 1 to understand what group is being processed at that time. This one is the ugliest solution, but does work and isn't too difficult once you get used to looking at strace/truss output.

Of these 3, I am guessing the 1st one has the best chance at being implemneted by a remote process (if that was what you were thinking) to try to find the status but it doesn't always yield results.

------------------------------
Ryan Ladd
------------------------------

Like

Chuck Stevenson
Participating Frequently
Forum|Forum|1 year ago
October 2, 2024

Hello all,

I have a program that analyzes data from a 6.8M record file that does not have an index.

Is there any way to report on the progress of the EXECUTE, as it may take 20mins or more to run?

Nelson

------------------------------
Nelson Schroth
president
C3CompleteShop LLC
Harrison OH US
------------------------------

You can watch its progress realtime in UVTEMP (see UV CONFIG parameter), especially if the resultant list is large, or you have significant sorting.
Counter subroutine idea: you can write the progress counter to the screen or &PH& log directly. Include a timestamp &/or records/sec rate.
Basic select filevar: I have always been flabbergasted by how quickly EXECUTE 'SELECT ...' runs compared to doing it myself. Even when I need to do selection criteria involving TRANS() I-descriptors, SELECT performance usually wins over BASIC. (How do they do that???)
Basic select filevar does have the advantage of being able to process immediately, rather than waiting for the entire list to finish, then looping through & re-reading the records selected. An alternative to 2 passes on the file is the "SAVING " keyword. If there are only limited info, even calculated data, needed from the file being selected, save that data into the select list as you go:

EXECUTE "SELECT [file] . . . WITH HARD2 . . . SAVING @ID EASY1 HARD2"

LOOP

WHILE READNEXT ID

WHILE READNEXT EASY1

WHILE READNEXT HARD2

. . . process ID, EASY1, HARD2, without rereading the record . . . assuming you're not writing back to it.

REPEAT

That doesn't answer your question about a progress report, but it can significantly reduce processing time. If HARD2 is a complicated calculation needed for the selection or sort criteria (e.g, 2 or 3 layers of TRANS() embedded & or called subroutine), it means you don't have to repeat it in the calling program.

You didn't say UV or UD, but it is much the same in both.

------------------------------
Chuck Stevenson
DBA / SW Developer
Pomeroy
US
------------------------------

Like

+1

Doug Averch
Participating Frequently
Forum|Forum|1 year ago
October 2, 2024

You can watch its progress realtime in UVTEMP (see UV CONFIG parameter), especially if the resultant list is large, or you have significant sorting.
Counter subroutine idea: you can write the progress counter to the screen or &PH& log directly. Include a timestamp &/or records/sec rate.
Basic select filevar: I have always been flabbergasted by how quickly EXECUTE 'SELECT ...' runs compared to doing it myself. Even when I need to do selection criteria involving TRANS() I-descriptors, SELECT performance usually wins over BASIC. (How do they do that???)
Basic select filevar does have the advantage of being able to process immediately, rather than waiting for the entire list to finish, then looping through & re-reading the records selected. An alternative to 2 passes on the file is the "SAVING " keyword. If there are only limited info, even calculated data, needed from the file being selected, save that data into the select list as you go:

EXECUTE "SELECT [file] . . . WITH HARD2 . . . SAVING @ID EASY1 HARD2"

LOOP

WHILE READNEXT ID

WHILE READNEXT EASY1

WHILE READNEXT HARD2

. . . process ID, EASY1, HARD2, without rereading the record . . . assuming you're not writing back to it.

REPEAT

That doesn't answer your question about a progress report, but it can significantly reduce processing time. If HARD2 is a complicated calculation needed for the selection or sort criteria (e.g, 2 or 3 layers of TRANS() embedded & or called subroutine), it means you don't have to repeat it in the calling program.

You didn't say UV or UD, but it is much the same in both.

------------------------------
Chuck Stevenson
DBA / SW Developer
Pomeroy
US
------------------------------

I wrote a Subroutine program to write out a log file as it goes along. So if the file is 100,000 records every 1K the log file gets updated. The log file can be seen on a web monitor or at TCL/ECL by listing the file.

Last auto-saved at: 10-02-2024 10:50

------------------------------
Doug Averch
Owner
U2 Logic
------------------------------

Doug Averch Owner https://www.u2logic.com/

Like

Mark Baldridge
Participating Frequently
Forum|Forum|1 year ago
October 2, 2024

Hello all,

I have a program that analyzes data from a 6.8M record file that does not have an index.

Is there any way to report on the progress of the EXECUTE, as it may take 20mins or more to run?

Nelson

------------------------------
Nelson Schroth
president
C3CompleteShop LLC
Harrison OH US
------------------------------

Take the modulo times the group size and convert to hex. This is the last group address of the file. Find the inode of the file with 'SH -c "ls -li" ' or the STATUS statement in field 10. You know the WHO number of the SELECT process. LIST.READU EVERY will return Device, Inode, Group Address, and Who with the RD group lock.

The reported hexadecimal group address divided by the calculated last group address will provide the percent of the file selected and answer the remaining amount needed to do.

------------------------------
Mark A Baldridge
Principal Consultant
Thought Mirror
Nacogdoches, Texas United States
------------------------------

Like

+2

Manu Fernandes
Inspiring
Forum|Forum|1 year ago
October 2, 2024

Take the modulo times the group size and convert to hex. This is the last group address of the file. Find the inode of the file with 'SH -c "ls -li" ' or the STATUS statement in field 10. You know the WHO number of the SELECT process. LIST.READU EVERY will return Device, Inode, Group Address, and Who with the RD group lock.

The reported hexadecimal group address divided by the calculated last group address will provide the percent of the file selected and answer the remaining amount needed to do.

------------------------------
Mark A Baldridge
Principal Consultant
Thought Mirror
Nacogdoches, Texas United States
------------------------------

Hi mark

Careefull, This 'll not work on uv 12+,group info are no more into LIST.READU.

------------------------------
Manu Fernandes
------------------------------

Like

Chuck Stevenson
Participating Frequently
Forum|Forum|1 year ago
October 2, 2024

Take the modulo times the group size and convert to hex. This is the last group address of the file. Find the inode of the file with 'SH -c "ls -li" ' or the STATUS statement in field 10. You know the WHO number of the SELECT process. LIST.READU EVERY will return Device, Inode, Group Address, and Who with the RD group lock.

The reported hexadecimal group address divided by the calculated last group address will provide the percent of the file selected and answer the remaining amount needed to do.

------------------------------
Mark A Baldridge
Principal Consultant
Thought Mirror
Nacogdoches, Texas United States
------------------------------

Here's how I convert dev & Inode to filename in LIST.READU.

I regularly run ACCOUNT.FILE.STATS which writes to STAT.FILE where I put an index on a concatenation <7> & <8>: FILEDEV: '.': FILEINODE[10].

Then a wrapper around LIST.READU converts the device & Inode columns to a filename.

LOOP WHILE MORE
LINE = REMOVE( LRDU, MORE )
TRIMMED = TRIM( LINE )
DEV = TRIMMED[ ' ', 1, 1 ]
INODE = TRIMMED[ ' ', 2, 1 ]
IF LEN( DEV ) ! LEN( INODE ) THEN
SELECTINDEX 'DEV.INODE', (DEV: '.': INODE[10]) FROM F.STAT TO 9
READNEXT STAT.ID FROM 9 THEN
READV FILENAME FROM F.STAT, STAT.ID, 1 THEN
LINE[1,31] = FILENAME '31.L#31'

E.g.:

1922279711 1125899906975007 0 -19 61 RU 3376 CALIX

becomes

QM.CONTROL..................... 0 -19 61 RU 3376 CALIX

(I don't care about account or path, but those are in STAT.FILE record, too.

UV11.4. It will need tweaking at 12.2.

I was going to mention FILE.STAT has the modulo, etc., so Mark's calculation could be rolled into this, too, but that isn't going to work at uv12+)

------------------------------
Chuck Stevenson
DBA / SW Developer
Pomeroy
US
------------------------------

Like

Brian Leach
Participating Frequently
Forum|Forum|1 year ago
October 3, 2024

Hello all,

I have a program that analyzes data from a 6.8M record file that does not have an index.

Is there any way to report on the progress of the EXECUTE, as it may take 20mins or more to run?

Nelson

------------------------------
Nelson Schroth
president
C3CompleteShop LLC
Harrison OH US
------------------------------

Extending Ryan and Chuck's idea for a BASIC select though it's still very clunky:

The RECORD command tells you what group a given ID hashes to. A BASIC select will iterate through group by group based on the primary group (and traverse its overflow) so if you know the current modulus (e.g. from STATUS) then by running RECORD every n'th id (and choose a suitable number to avoid the overheads) would give you a pretty good idea of how far you are. The trick would be choosing a suitable interval but you can get clever with that - if you get two checks with the same percentage double the interval etc..

------------------------------
Brian Leach
Director
Brian Leach Consulting
Chipping Norton GB
------------------------------

Like

M

Mark Mevissen
New Participant
Forum|Forum|1 year ago
October 4, 2024

Extending Ryan and Chuck's idea for a BASIC select though it's still very clunky:

The RECORD command tells you what group a given ID hashes to. A BASIC select will iterate through group by group based on the primary group (and traverse its overflow) so if you know the current modulus (e.g. from STATUS) then by running RECORD every n'th id (and choose a suitable number to avoid the overheads) would give you a pretty good idea of how far you are. The trick would be choosing a suitable interval but you can get clever with that - if you get two checks with the same percentage double the interval etc..

------------------------------
Brian Leach
Director
Brian Leach Consulting
Chipping Norton GB
------------------------------

Brian, your solution as to a progress monitor for a BASIC select may be clunky but it was also clever.

------------------------------
Mark Mevissen
IT Manager
Griffiths Corp
Minneapolis MN US
------------------------------

Like

+4

Gregor Scott
Participating Frequently
Forum|Forum|1 year ago
October 6, 2024

Extending Ryan and Chuck's idea for a BASIC select though it's still very clunky:

The RECORD command tells you what group a given ID hashes to. A BASIC select will iterate through group by group based on the primary group (and traverse its overflow) so if you know the current modulus (e.g. from STATUS) then by running RECORD every n'th id (and choose a suitable number to avoid the overheads) would give you a pretty good idea of how far you are. The trick would be choosing a suitable interval but you can get clever with that - if you get two checks with the same percentage double the interval etc..

------------------------------
Brian Leach
Director
Brian Leach Consulting
Chipping Norton GB
------------------------------

On UniVerse the @SELECTED variable, when combined with the BASIC SELECT verb, reports the number of items in each group.

So it is possible to know how many groups of the file have been processed using the BASIC SELECT & READNEXT statements.

In the absence of a total item count, the iterations through the groups will give a reasonable guide to processing progress.

prompt ":"
OPEN "VOC" TO FV.VOC ELSE STOPM "VOC IS BAD"
CRT "VOC modulo = ":fileinfo(FV.VOC,5)
GRP.QTY = ''
GROUP.NO = 0
CRT "GROUP.NO""L#10":" ":"@SELECTED"
b.CONTINUE=@TRUE
SELECT FV.VOC TO LV.VOC
LOOP
GROUP.NO += 1
GRP.QTY = @SELECTED
crt GROUP.NO "R#10": " ": GRP.QTY "R#10"
for GRP.REC = 1 to GRP.QTY
READNEXT ID FROM LV.VOC ELSE
b.CONTINUE = @FALSE
EXIT
end
if GRP.REC > 1 then crt ",":
crt ID:
next GRP.REC
if GRP.QTY then crt
input g1
while b.CONTINUE do
if mod(GROUP.NO,@CRTHIGH-1) = 0 then input fred
REPEAT