Question

Need faster way to search record list other than LOCATE

  • January 1, 2026
  • 5 replies
  • 86 views

Nelson Schroth

Creating a purge program.  First, make a pass on a stat file that has cust# and date to select the appropriate records to purge into a “to purge” array (INSERT “AR”).

Then, read through the main file, using a LOCATE on the array to determine whether each record is one of those selected.  If so, delete it.

The issue is that the main file has 1-5 records per customer, whose IDs begin with the cust#.  This file does not have a date that can be used for purging.  The main file has almost 5M records.  The “to purge” array has 50k-150k cust#s in it.  The first pass, which creates the array from the stat file, only takes 30 seconds even with 3M+ records.  However, reading through the main file using a LOCATE (“AR”) is really slow.

I would think that locating in the array would be faster than doing one pass on the main file and 4M+ reads against the stat file.  So, I was wondering if anyone had a better idea that would speed up the process.
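
In rough outline, the program does this (file and variable names here are illustrative, not the real ones):

     * Pass 1: build the "to purge" list from the stat file
     OPEN "", "STAT.FILE" TO F.STAT ELSE STOP "Cannot open STAT.FILE"
     OPEN "", "MAIN.FILE" TO F.MAIN ELSE STOP "Cannot open MAIN.FILE"
     PURGE.DATE = ICONV("06/30/22", "D2/")   ;* cutoff date, internal format
     PURGE.LIST = ""
     SELECT F.STAT
     LOOP
        READNEXT CUST.ID ELSE EXIT
        READ STAT.REC FROM F.STAT, CUST.ID THEN
           IF STAT.REC<2> LT PURGE.DATE THEN   ;* assuming date is in field 2
              LOCATE CUST.ID IN PURGE.LIST BY "AR" SETTING POS ELSE
                 INS CUST.ID BEFORE PURGE.LIST<POS>
              END
           END
        END
     REPEAT
     * Pass 2: scan the main file; the per-record LOCATE is the slow part
     SELECT F.MAIN
     LOOP
        READNEXT ID ELSE EXIT
        CUST.NO = FIELD(ID, "*", 1)   ;* IDs begin with the cust#
        LOCATE CUST.NO IN PURGE.LIST BY "AR" SETTING POS THEN
           DELETE F.MAIN, ID
        END
     REPEAT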

UniVerse v11.3.1.6022 running on AIX 7.2

5 replies

Brian Speirs
  • Participating Frequently
  • January 1, 2026

Hi,

I am not sure that I fully understand your description … but here goes.

When you say the first pass on the stat file creates an array, I assume that you mean you are creating a dynamic array - which is effectively just a list.

I don’t understand why you then need to LOCATE customer ids within this list. I would have thought that you could process this list faster using a LOOP … REMOVE … REPEAT construct.
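
Something along these lines (a sketch only; variable names invented):

     * Step through the list with REMOVE, which keeps a pointer into the
     * string rather than rescanning from the start on each extraction
     LOOP
        REMOVE CUST.ID FROM PURGE.LIST SETTING DELIM
        IF CUST.ID # "" THEN
           * ... process this customer id ...
        END
     WHILE DELIM DO REPEAT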

If the list is not giving you the full @ID to the main file, then have you indexed the main file on the customer id? If so, your initial list can give you the customer id, and you can use the index to return the @IDs in the main file and process those.
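
For example, assuming an index called CUST.NO exists on the main file (perhaps on an I-descriptor that extracts the cust# prefix from @ID), something like:

     * Pull only the @IDs for one customer via the secondary index,
     * instead of scanning all 5M records
     OPEN "", "MAIN.FILE" TO F.MAIN ELSE STOP "Cannot open MAIN.FILE"
     SELECTINDEX "CUST.NO", CUST.ID FROM F.MAIN   ;* CUST.ID from your list
     LOOP
        READNEXT ID ELSE EXIT
        DELETE F.MAIN, ID
     REPEAT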

Finally, deleting records from the main file is a slow process - especially when you are deleting 150K items.

Are you sure that your performance hit is due to your LOCATE process and isn’t just a function of the deletion speed?

Cheers,

Brian


Nelson Schroth
  • Author
  • Participating Frequently
  • January 3, 2026

Hi Brian,

I have commented out my delete statements, so I am positive that the slowness is caused by the locates.

We avoid using indexes whenever possible as they complicate file relocation, especially on large files.

The main file contains copies of records from multiple files that were purged, with the customer number prefixed on some of them.  Example:

     50017033
     50017033*ARLIST_17327_41785
     50017033*XORD-CUST_17327_41785

The stat file is a list of the customer numbers that were purged and the date they were purged.  Example:

     CNMRG.MERGED.XREF  BASE CUST  DATE MERGED
     54038996           54057265   06/04/22
     48749736           46246549   06/10/22
     11989379           48093206   06/14/22
     50600977           49952624   06/24/22

So the only way to select records purged before a certain date is by using the stat file.  Those ids are what is loaded into the LOCATE array.

Then the main file has to be read all the way through, since there may be one or more records in it for each purged customer.

Hope this helps with the details.

Nelson


Brian Speirs
  • Participating Frequently
  • January 4, 2026

OK. I get the picture now.

I can see two approaches (and no doubt there are more). The first is to index the main file on customer number - which you have already said you don’t want to do. It may still be worthwhile to check how long it takes to index the file just for this purpose - and then you can delete the index after running the process.
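
If you do test it, the temporary index is only a few TCL commands (CUST.NO is an invented dictionary field name - it would need to exist, e.g. as an I-descriptor extracting the cust# prefix):

     CREATE.INDEX MAIN.FILE CUST.NO
     BUILD.INDEX MAIN.FILE CUST.NO

then run the purge, and afterwards:

     DELETE.INDEX MAIN.FILE CUST.NO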

The second approach is to use shorter lists of customer ids in which to LOCATE the customer. Take the last digit of the customer number as the “list number”. That will create 10 lists, each of which should be one tenth the size of the list you are currently using. (Or use the last two digits for 100 lists.) The lists could be stored in either a dimensioned array (e.g. CUSTLISTS(listno)<1, x>) or a dynamic array (e.g. CUSTLISTS<listno, x>). A downside here is that the lists become @VM-delimited, which (I believe) is slower to search than @AM-delimited lists.
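
A rough sketch of the dimensioned-array version (names invented). Note that appending with <-1> keeps each bucket @AM-delimited, which sidesteps the @VM concern:

     DIM CUSTLISTS(10)
     MAT CUSTLISTS = ""
     * While building the purge list: bucket on the last digit
     BUCKET = CUST.ID[1] + 1            ;* last digit 0-9 -> element 1-10
     CUSTLISTS(BUCKET)<-1> = CUST.ID
     * While scanning the main file: search only the matching bucket
     CUST.NO = FIELD(ID, "*", 1)
     BUCKET = CUST.NO[1] + 1
     LOCATE CUST.NO IN CUSTLISTS(BUCKET) SETTING POS THEN
        DELETE F.MAIN, ID
     END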

Cheers,

Brian


John Greeen
  • Participating Frequently
  • January 4, 2026

Using a secondary index would certainly be the most scalable solution. I suspect the file relocation issue might be resolved with the “AT RELATIVE.PATH” clause of the CREATE.INDEX command.

Some time ago I did tests that showed that a linear LOCATE is faster than an ordered LOCATE, especially on large arrays. It’s not intuitive, and I’d love someone to prove me wrong, but that’s what my testing showed.

You might try building the array using <-1>, which will be faster than INSERT “AR”, and then using a plain LOCATE instead of LOCATE “AR”.
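
That is, roughly:

     * Build: append with <-1> rather than a sorted insert
     PURGE.LIST<-1> = CUST.ID
     * Search: a plain LOCATE does a linear scan - no BY "AR"
     LOCATE CUST.NO IN PURGE.LIST SETTING POS THEN
        DELETE F.MAIN, ID
     END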

 


Nelson Schroth
  • Author
  • Participating Frequently
  • January 5, 2026

I am using a dynamic array with 10 attributes.

The whole process ran in under 2 minutes.  I will try this on a larger production data set and provide more feedback if anything changes.

Thanks for all the feedback.