Rocket U2 | UniVerse & UniData

 PORT.STATUS performance impacts

Ian McGowan posted 08-01-2024 01:22

For a conversion project at a customer with 4000 custom programs written over 30 years, we're trying to get a handle on which programs are actively used.  One idea that's been floated is running a monitoring program as root from the demo account that does the pseudocode below.

By definition we'll be running this in production for a month or three, and there's some justified nervousness about potential performance impacts.  I don't think the monitoring program will be a performance hog itself, but I'm not sure whether running PORT.STATUS against the probed program has any effect - does it "pause" the probed process so that the stack it collects is consistent, for example?  We're not expecting to get 100% coverage from this, but even 80% would be a huge help.

Anyone ever done something like this for real?  Any war stories or hard-won words of wisdom to share?  Thanks for reading!

LOOP
   EXECUTE 'LISTUSER' CAPTURING USERS   ;* one @FM-delimited line per user
   FOR I = 1 TO DCOUNT(USERS, @FM)
      PID = FIELD(TRIM(USERS<I>), ' ', 2)   ;* adjust to your LISTUSER column layout
      IF PID NE '' AND NUM(PID) THEN
         EXECUTE 'PORT.STATUS PID ':PID:' CALL.STACK' CAPTURING STACK
         * parse STACK, extract program names,
         * write any new names to a log file
      END
   NEXT I
   SLEEP 5   ;* throttle the polling loop
REPEAT
Manu Fernandes

Hi

It's certainly heavy for the system: it sets locks and semaphores, shares private memory access, and so on.

And it's not the best way to do what you need.

First, I suggest you GLOBAL CATALOG your programs/subroutines and use MAKE.MAP.FILE; then you can LIST &MAP& to see how much each piece of code is used.

Then, if you need more detail, I suggest writing a little subroutine and adding a call to it at the beginning of each program/subroutine. A small refactoring program can insert the call just after the SUBROUTINE/PROGRAM statement.

Remember, SYSTEM(9001) returns the current call stack.
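
A minimal sketch of such a subroutine (assumptions: a PROGRAM.USAGE log file already exists, and SYSTEM(9001) returns the stack with this subroutine as its first entry - verify both on your release):

SUBROUTINE LOG.PROG.USAGE
* Sketch only: count one hit per calling program.
   STACK = SYSTEM(9001)   ;* call stack (assumption: entry 2 is our caller)
   CALLER = STACK<2>
   OPEN '', 'PROGRAM.USAGE' TO F.LOG THEN
      READVU CNT FROM F.LOG, CALLER, 1 THEN CNT = CNT + 1 ELSE CNT = 1
      WRITEV CNT ON F.LOG, CALLER, 1   ;* the write releases the update lock
   END
   RETURN
END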

I hope this helps.

Manu

Manu Fernandes

Another suggestion is to create one file (eventFile) per @USERNO to store the call/execute events.

Then you merge all the eventFiles via a distributed file (DEFINE.DF).
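
The writing side might look something like this (a sketch: the EVENT.nnn part-file naming is hypothetical, and the part files must already exist):

* Append one event record to this user's own part file
FNAME = 'EVENT.':FMT(@USERNO, 'R%3')   ;* e.g. EVENT.042 for user 42
OPEN '', FNAME TO F.EVT THEN
   WRITE @SENTENCE ON F.EVT, DATE():'*':TIME():'*':@USERNO
END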

0.02€

Manu

John Jenkins

Ian,

As well as the MAP file (as already suggested), if you want to track usage of programs at ECL you could make the VOC entries Remote Verbs and insert a Security Subroutine as a precursor to each ECL invocation for logging purposes. The same one small security subroutine could be re-used for all programs with a VOC entry.

A Security Subroutine can do whatever it wants - e.g. look at @SENTENCE - and then just return a code that allows continued execution. I'd suggest updating a log file with the verb/program name and a counter.
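
A sketch of what that could look like (the argument convention and the meaning of the return code are assumptions - check the UniVerse documentation on security subroutines):

SUBROUTINE VERB.AUDIT(RESULT)
* Hypothetical logging gate: count each verb invocation, then allow it.
   VERB = FIELD(TRIMF(@SENTENCE), ' ', 1)
   OPEN '', 'VERB.LOG' TO F.LOG THEN
      READVU CNT FROM F.LOG, VERB, 1 THEN CNT = CNT + 1 ELSE CNT = 1
      WRITEV CNT ON F.LOG, VERB, 1
   END
   RESULT = 0   ;* assumption: zero tells the shell to continue execution
   RETURN
END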

Regards

JJ

Brian Paige
Hi Ian!
 
Some of my suggestions would be system and configuration specific.  For example, we run UV on AIX using Ideal flavored accounts, and when we catalog programs they're written to (and run from) GLOBAL.CATDIR ($uvhome/catdir).  At the Unix level, the date stamp on the object code in catdir is the last time it was used.  We sometimes use that to tell when/if a program is being used, but it only works for BASIC programs (not PROCs or Paragraphs or other types of processes).
 
As a warning, be VERY careful running PORT.STATUS repeatedly.  I haven’t checked if it’s still an issue in more recent versions of UV, but it used to be that if you ran PORT.STATUS 10 times against a process that wasn’t moving (like a user just sitting in a screen, but not doing anything), it would terminate that process.
 
Brian
Mike Young

I presume the customer is running the programs from a menu and not directly from ECL.

Would it not be easiest to modify the menu routine to log the program prior to execution?
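
Something like this just before the menu's EXECUTE would do it (a sketch; F.USAGE and PROG.NAME are hypothetical names from the menu driver):

* In the menu driver, just before running the user's selection:
LOG.ID = PROG.NAME:'*':DATE():'*':TIME()
WRITE @LOGNAME ON F.USAGE, LOG.ID   ;* who ran what, and when
EXECUTE PROG.NAME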

Ian McGowan
Thanks for all the helpful responses!

Manu, "locks, semaphores and private access" and being heavy for the system were exactly what I was worried about. Thanks for the confirmation.  And also the suggestion to use a separate file for each user - totally makes sense, though distributed files are new to me.

John/Brian, unfortunately we're not able to make sweeping changes to either the code or how it's cataloged. This is a production application, and minimal changes are allowed.  Brian, I appreciate the note of caution about PORT.STATUS terminating processes - that's exactly the kind of thing I was worried about.

Mike, yep - most programs are run from a menu or a home-grown job scheduler, both of which leave an audit trail. But those programs call/execute other programs, which don't have the same trail (and there are lots of dead paths in the 30-year-old code so static analysis is approximate at best). There are other programs that are run from the command line, or via ObjectCall (which I'm aware is deprecated in Unidata 8.1+) and there's no easy trail for those.

On a positive note, the custom programs we're converting for the new system will all be recompiled and tested individually, so we're planning to add "$INCLUDE BP PROGRAM.LOG.START" to the top of every program to a) pull in some common things that every program will need, and b) log the start of every program, with information about the user running it, the call stack etc. Since a subroutine might be called thousands of times a minute, there's some thought to making that logging as lightweight as possible, but still useful for auditing.
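
As a sketch, that include might start out something like this (the PROGRAM.LOG file and the one-record-per-program-per-user-per-day scheme are assumptions, not the final design):

* BP PROGRAM.LOG.START - compiled inline at the top of every program
STACK = SYSTEM(9001)   ;* call stack; assumes entry 1 is the current program
LOG.ID = STACK<1>:'*':@LOGNAME:'*':DATE()
OPEN '', 'PROGRAM.LOG' TO F.PLOG THEN
   READVU CNT FROM F.PLOG, LOG.ID, 1 THEN CNT = CNT + 1 ELSE CNT = 1
   WRITEV CNT ON F.PLOG, LOG.ID, 1
END
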
Joe Goldthwaite

I also wanted to track which object code is being used, to help find out what code is in use and what is not.  This only works if you're using Linux as your host OS.

My first attempt was to use the last access time on the object code files. That looked reasonable to me, but when I sent out a list of object code files that hadn't been accessed in the last few years, some of the other developers swore that those programs were actually in constant use. Also, access time isn't maintained on all file system types, and lots of systems disable updating it for performance reasons.

The method I finally took was to use Linux's auditctl service to create log entries whenever one of the object code programs is accessed. 

In the /etc/audit/rules.d folder I added a file called objectcode_run.rules.  The file looks like this:

root:/etc/audit/rules.d$ cat objectcode_run.rules
-w /objectcode -p r -F exe=/usr/uv/bin/uvsh -k "objectcode_run"
-w /objectcode -p r -F exe=/usr/uv/bin/uvdlssh -k "objectcode_run"

Next, you reload the rules with "augenrules --load".

This rule creates an entry in the audit log whenever uvsh or uvdlssh accesses a file in the /objectcode folder or any of its sub-folders. On my system this is all I needed to do, because I've put all the object code files in sub-folders of /objectcode.  If you have multiple files in different accounts, you could add separate rules for each one. You could also add any PROC or paragraph files to the audit rules if you wanted to track those.

I don't think I need the uvdlssh rule.  That's the executable you run to activate shared licensing, but I think it just runs uvsh. I don't think I saw any log entries with the uvdlssh file.

You end up with a bunch of log entries. You can use "ausearch -k objectcode_run -i -ts mm/dd/yy hh:mm:ss" to list all the log entries after a specific date and time. You get a bunch of lines that look like this:

----
type=PROCTITLE msg=audit(08/01/2024 16:15:02.221:737983) : proctitle=uvsh EDI.SAP.DRIVER
type=PATH msg=audit(08/01/2024 16:15:02.221:737983) : item=0 name=/objectcode/EDI.SAP.BP.O/EDI.READ.947.SUB inode=272169 dev=fd:04 mode=file,660 uid=jgold ogid=bars rdev=00:00 obj
type=NORMAL cap_fp=none cap_fi=none cap_fe=0 cap_fver=0
type=CWD msg=audit(08/01/2024 16:15:02.221:737983) :  cwd=/accts/SAP.WHS
type=SYSCALL msg=audit(08/01/2024 16:15:02.221:737983) : arch=x86_64 syscall=open success=yes exit=9 a0=0x7ffdc5f96f70 a1=O_RDONLY a2=0x1b6 a3=0x24 items=1 ppid=56894 pid=56909 auid=root uid=root gid=root euid=root suid=root fsuid=root egid=root sgid=root fsgid=root tty=(none) ses=79446 comm=uvsh exe=/usr/uv/bin/uvsh key="objectcode_run"

Next I wrote a Python program that executes the ausearch command and parses the results into an SQLite database file. I keep track of the last date processed so I can list just the entries that happen after that date. The log can be huge, but specifying a starting date speeds it up a lot.

It's been running on our production system since July 23rd. So far it shows 376 programs have been accessed. There are 6,895 compiled programs in that folder, so most of them are not being used on a regular basis.  I think it might take a year or so before we get a complete picture.

I chose Python to parse the ausearch results because it's good at it. I chose SQLite to store the results because it's included with Python and it makes it easy to view the results. You could put the results in an MV database file if you want. I didn't want to use a UniVerse license, and I'm pretty good with SQL, so I just went with SQLite.

This adds more entries to the audit logs. They're set up to rotate to new files when they reach a certain size, and only a certain number of log files are kept. The result is that you're not keeping history as long unless you bump up how large the files can get and how many are kept. I haven't had any performance problems, but our system doesn't seem to suffer from that anyway.

I can send you my notes and the python program if you're interested. I'm not recommending this as the best way to do it. It's just what worked for me.

Ian McGowan

Joe, that's a super-interesting idea - use OS-level audit tools to track when object files (_*) are opened by a process.  It's a coincidence - we also have about 4000 programs to review, but we think the active list is somewhere in the 500-800 range, based on what we can find from menus, job schedulers and other sources.

The client is on AIX now, moving to Linux, but AIX also has excellent audit tools (as you'd expect from IBM ;-).  A quick Q&A with ChatGPT turned up information on the audit subsystem [1] and suggestions on how to track just file_open for files that match _*.  Thanks so much for the idea!

[1] https://www.ibm.com/support/pages/aix-audit-audit-subsystem-aix

Joe Goldthwaite

You're welcome. Let me know if you come up with something.

Mark Copp

We have been monitoring our object files for a number of years now using the UNIX access time. Reporting each object file that has been accessed within the previous hour gives us great insight into not only what's actively being used, but also when: working day, overnight batch, weekly, monthly, yearly, etc.

One word of caution: anything accessing/reading the object files (copying, backing up, etc.) will update the UNIX access time. This does not happen that often with us, as we back up off the production servers using snapshots, and we can normally identify these events because all files within the directory get accessed within the monitored period, so we discount them (which could mean missing a rarely accessed object file!).

Felix Stellman

We have noticed several undocumented behaviors with PORT.STATUS that you need to be aware of, especially when running it repeatedly.  If PORT.STATUS encounters a "stuck" process, it can cause that process to move by pushing it past the internal lock that is set - basically skipping past the set internal lock.  The conditions that allow this to happen are very rare; the lock isn't a normal UniVerse lock that you would be setting - it has something to do with Unix code pending on a wait condition.

Be aware that if you are running PORT.STATUS and a second copy starts, it will possibly run into the first PORT.STATUS - UniVerse doesn't allow two to run at the same time.  I would run a single PORT.STATUS and save the output to a file, then parse the file.  I'm not sure how to do that with the "CALL.STACK" option (LAYER.STACK), but what you are trying to do is avoid the repetitive PORT.STATUS executions.  Interesting problem.  We only have 1,200 programs in production.  We do have a logging process we run, but it's not for every program (subroutines are excluded).
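
A single captured pass might look something like this (a sketch; whether PORT.STATUS accepts the stack option without a PID, and its exact output layout, vary by release, so verify before relying on it):

* One PORT.STATUS pass, captured instead of polled (sketch)
EXECUTE 'PORT.STATUS CALL.STACK' CAPTURING OUT
FOR I = 1 TO DCOUNT(OUT, @FM)
   LINE = TRIM(OUT<I>)
   * pick out program names here and write any new ones to a log file
NEXT I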

Ian McGowan

Felix, thanks for the further confirmation about PORT.STATUS.  It seems like a generally bad idea to be misusing it to aggressively poll running programs.  I'm still trying to get the Unix team to enable AIX file logging on files that match _*, per Joe's excellent suggestion.  It seems like the perfect solution to the problem - use the OS auditing tools that are built to do this kind of thing.

John Jenkins

All,

If you need to aggressively block concurrent use of PORT.STATUS, consider making it a remote verb with a 'security subroutine' that uses a 'soft lock' on a control file based on user number or PID. The soft lock can be manually cleared by deleting or editing the lock record(s) in the file if needed, and it allows a controlled 'gateway' to the command with optional timeouts.
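
A minimal sketch of that gate (the file and record names are hypothetical; note the small read-then-write window, which a READU would close):

* Soft-lock gateway (sketch): one control record guards PORT.STATUS
OPEN '', 'CONTROL' TO F.CTL THEN
   READ LOCK.REC FROM F.CTL, 'PORT.STATUS.LOCK' THEN
      ALLOW = 0   ;* record exists: another user is already running it
   END ELSE
      WRITE @USERNO:'*':DATE():'*':TIME() ON F.CTL, 'PORT.STATUS.LOCK'
      ALLOW = 1   ;* caller deletes the record when done
      * a stale record can be cleared by hand - that's the 'soft' part
   END
END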

Regards

JJ

Chuck Stevenson

Basic Profiling was introduced at UV11.4 & UV12.2.

IIRC UD and PI/open had/have something similar.  I think it can tell you which lines of code are being executed and how often.  The UV one claims to have "minimal performance impact."  I find that hard to swallow, but with today's hardware, and being able to bump performance on virtual machines, it might not be dangerous.

Here's the quick blurb from the Release Notes:

"BASIC Profiling
With applications that have millions of lines of BASIC code, it is challenging to add features while
maintaining performance. UV BASIC Profiling provides application performance analysis and insight to
help you quickly determine bottlenecks and performance improvement areas."

It's described in the BASIC User Guide, Chapter 7, pages 75-87.

We recently went to UV11.4 and I haven't had a chance/necessity to try it yet.

Chuck Stevenson

Tangential:

I turned on the performance monitor (a uvconfig param, xAdmin Performance Monitor, and *RECORDPERFMONDATA) and run it in the background, saving 15-minute snapshots.

Then I can get a baseline of what's normal and watch for abnormalities.

That's not program-specific.  It looks system-wide at writes/sec, locks/sec, that sort of thing.