Phantom process terminating unexpectedly - how to log error?


Jonathan Smith
Rocketeer
Forum|Forum|2 years ago
August 16, 2023

Hello,

My company uses a phantom process that runs using the PHANTOM BRIEF. It's a continuous program that compares the current epoch with the epoch +180 (seconds) to determine when the next iteration will kick off. After 8 or more months of running every 3 minutes, the phantom unexpectedly self-terminates. It's a business-critical process that needs to be running at all times. It stopped unexpectedly over the weekend and resulted in a lot of issues. I've been scouring the forums for a few hours to see if I can figure out some way to prevent it from happening, but I can't identify the root cause.

The program has had read/write logging set up for several years. Any error opening files or writing to them is logged to a custom flat file on a network drive, but no logs have been written there the past 3 times the phantom terminated. No logs were written to PH--though I'm not sure if that needs manual setup. The errlog file only displays the current day's user errors, preventing me from seeing the error on Friday.

How would I log the actual termination of the phantom process, and possibly the stack trace to determine which data it was accessing when it terminated? If possible, I'd like to write logs to the network drive to mitigate risk when dealing directly with the Linux production server. And how would I detect that the process terminated and then automatically restart it? We're on UV 11.2 on RHEL 7.9 and use U2 BDK 4.3 for writing programs. TIA

Josh,

In terms of detecting whether the process has terminated, one approach would be to get the phantom process to write its PID to a control file on startup.

You could then have a crontab job on Linux run every 15 minutes, or whatever interval you decide, to check the status of the process using the PID stored in the control file, even going into UniVerse to do so if you wish.

For example, if you know the PID you can use PORT.STATUS to see whether the process is still active, or you could use the ps command to check the PID. PORT.STATUS needs to be available in the account you want to run it from, which is why ps may be a better alternative (as long as the PID hasn't been recycled before the crontab job checks it). If you have never used crontab, check the man pages for it; it's pretty simple to use.

>PORT.STATUS PID 17629236

There are currently 1 uniVerse sessions; 0 interactive, 1 phantom

Pid.... User name. Who. Last command processed............................
17629236 root -2 RUN BP TEST23 [ TEST23 @ 0x0 ]

If PORT.STATUS does not return what you expect then use the crontab process to respawn the phantom.
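
For anyone who would rather script the check than parse ps output, here is a minimal sketch of the control-file test in Python. It assumes Python 3 is available on the server and that the phantom writes nothing but its PID to the control file; the function names are made up for illustration. It does not guard against PID recycling, so the same caveat as with ps applies.

```python
import os


def pid_is_alive(pid):
    """Return True if a process with this PID currently exists."""
    if pid <= 0:
        return False  # 0 and negative values address process groups, not one process
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence; nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def read_pid(control_file):
    """Return the PID stored in the control file, or None if it is missing or invalid."""
    try:
        with open(control_file) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return None


def phantom_needs_restart(control_file):
    """True when the control file is missing, unreadable, or names a dead PID."""
    pid = read_pid(control_file)
    return pid is None or not pid_is_alive(pid)
```

A cron-driven wrapper could call phantom_needs_restart() with the control file path and start the phantom again only when it returns True.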

In terms of getting the stack trace, unless the process produces a core file, this is not something you are going to get without sending the appropriate signal to the process to force it to generate a core file and stop.

Regards,



Brian Paige
Participating Frequently
August 16, 2023


Josh,

We had a similar issue years ago when we wrote what we call our phantom scheduler. After weeks/months of running, it would silently abort. We decided on a workaround where we have the process self-terminate just before midnight every day (conditional exit in the program), then a cron job restarts it just after midnight. Once it was never running for more than 24 hours, it stopped aborting.

We figured the root cause was probably some kind of memory leak or workspace resource limitation that it would hit after building up over time, but after the workaround was implemented, we didn't dig into it any further.

I'm sure someone has a better answer for you, but this worked for us.
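
The conditional exit described above would live in the UniVerse BASIC program itself; the Python below only sketches the decision logic. The 23:55 cutoff is a made-up value, and the 180-second interval is taken from the original question.

```python
from datetime import datetime, time

CUTOFF = time(23, 55)  # hypothetical cutoff: stop starting new iterations at 23:55


def should_exit_for_daily_restart(now, cutoff=CUTOFF):
    """True once the local time of day has reached the cutoff."""
    return now.time() >= cutoff


def seconds_until_next_run(now_epoch, last_run_epoch, interval=180):
    """Seconds left before the next iteration is due (0 if it is overdue)."""
    return max(0, last_run_epoch + interval - now_epoch)
```

With a 3-minute cycle, a cutoff a few minutes before midnight means at least one iteration lands inside the exit window, so the process is gone before cron starts the fresh copy.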



John Jenkins
Participating Frequently
August 16, 2023


In terms of checking a process's health, Jonathan's method of tracking a PID would work, though watch that the same PID does not get re-used by another process in between checks. An alternative methodology is to use the BASIC statement LOCK n. For example:

Choose a LOCK for the process - I will choose lock 1 (one) for this example.

The PHANTOM is spawned from a crontab shell script at regular intervals (e.g. hourly), and once running it tries to gain lock one - viz: LOCK 1

If it fails then lock one is in use by an instance of the PHANTOM already running, so just exit gracefully, optionally appending a log record to a sequential log file.
If it succeeds then it is the sole instance of the PHANTOM that is running and again, optionally append a log record to a sequential log file.
As long as the PHANTOM continues to run then lock one remains set.
If the PHANTOM exits then lock one is released, and the next attempt to fire up a PHANTOM using crontab will succeed.

NOTE: You might want to build some checks into the cron script you use so you can choose to launch - or not to launch - using an O.S. level flag file. If the file exists then the PHANTOM runs. If the file does not exist then the PHANTOM does not run. Again, optionally append a log record to a sequential log file.
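
The same single-instance pattern can be sketched at the O.S. level. To be clear, this is not the BASIC LOCK statement: it is a Python analogue that uses an advisory flock on a lock file, plus the flag-file check from the note above. The paths and function names are placeholders chosen for illustration.

```python
import fcntl
import os


def launch_allowed(flag_file):
    """The launcher proceeds only while the flag file exists."""
    return os.path.exists(flag_file)


def try_acquire_instance_lock(lock_path):
    """Return an open handle holding the lock, or None if another instance holds it."""
    handle = open(lock_path, "a")
    try:
        # Non-blocking exclusive lock: fails at once if it is already held.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle  # keep this handle open for the life of the process
```

As with LOCK n, the lock goes away when the holding process exits for any reason, so the next scheduled launch attempt succeeds without any cleanup step.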

As a suggestion for trying to trap the type of failure, start multiple desktops on the system console (note - NOT on a network-connected terminal, which can get disconnected, causing a process to terminate). For diagnostic purposes, run the process as a foreground process from a shell script on one of the desktops, which you will dedicate to this purpose. Switch over to another desktop for routine administration and housekeeping work. If/when the process fails you should see at least some indication on the backpages of the dedicated desktop, though make sure you have a fair number of backpages available.

Additional: You could also monitor the process at intervals checking the size of the process and the number of file handles in use (as examples of potential metrics) to see if there is any sign of a leak.
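
As one possible way to collect those two metrics on Linux, the sketch below reads them from /proc. It assumes the monitoring job runs as a user permitted to read the phantom's /proc entries; the function name is made up for illustration.

```python
import os


def sample_process(pid):
    """Return (resident_kb, open_fd_count) for a PID, or None if it cannot be read."""
    proc_dir = "/proc/%d" % pid
    try:
        resident_kb = None
        with open(os.path.join(proc_dir, "status")) as status:
            for line in status:
                if line.startswith("VmRSS:"):
                    resident_kb = int(line.split()[1])  # value is reported in kB
                    break
        open_fds = len(os.listdir(os.path.join(proc_dir, "fd")))
    except OSError:
        return None
    return resident_kb, open_fds
```

Appending a timestamped sample to a log every few minutes should make a slow, steady climb in either number easy to spot over weeks of running.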

Hopefully this helps.

Regards,

JJ


Mark Baldridge
Participating Frequently
August 16, 2023


A different version of the "LOCK n" approach is to establish a "well known" record ID, and have the phantom issue a READU lock on this record. Should the phantom terminate, UniVerse will release the READU lock automatically.

After initiating the phantom, launch another program that gets a current time stamp and issues a READU lock on the same record. This will eventually time out, so this must be handled appropriately (THEN/ELSE clause). One option would be to grab a time stamp, reissue the READU lock request and see if the next successful lock is nearly the same time stamp as the previous one, or the timeout value of 20 minutes. Or, more simply, when the monitor program's READU returns, the RECORDLOCKED function can simply determine the presence of the lock or its absence.

If the program recognizes a terminated phantom, it can then capture the errlog file, as well as relaunch the phantom with little downtime. Also, the monitor program can issue some notification.

If the phantom has some way of determining a time cycle, the record with the READU lock can be updated with a last posting time and a next expected posting time using a WRITEU. All it has to do is wake up within the next posting time and post a new entry. If the phantom waits on a socket, named pipe, or presence of some request in a file, a NOOP request can be dropped into the queue at an interval shorter than the "expected" wakeup time. When I have done this, I stored the this/next times as both internal and external time values. This simplifies yet another program that displays the two times: with no conversion needed between, say, UV time and Java time, the strings can simply be displayed. When the READU lock vanishes, the background color can change, and notification can occur.
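
The last/next posting times lend themselves to a very small staleness test on the monitor side. The sketch below shows only that comparison, in Python, with plain epoch seconds and a made-up 30-second grace period; reading the record and checking the READU lock would still happen in UniVerse.

```python
def make_heartbeat(now_epoch, interval_seconds):
    """Build the 'last posted / next expected' pair the phantom would write."""
    return {"last_post": now_epoch, "next_expected": now_epoch + interval_seconds}


def heartbeat_is_stale(heartbeat, now_epoch, grace_seconds=30):
    """True when the phantom has missed its next expected posting time."""
    return now_epoch > heartbeat["next_expected"] + grace_seconds
```

The grace period keeps the monitor from raising an alarm when an iteration simply runs a little long.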


Mark Copp
Participating Frequently
August 18, 2023


There are a few things I’d personally try:

Remove BRIEF from the command that starts the PHANTOM process and check the &PH& file following a failure. I’d also set up a cron task to purge the files using ‘find’ with the -mtime argument to prevent a large build-up of files within &PH&; just be careful what you’re purging.
Check the directory that the process runs in for core files. If you find any, you can examine the backtrace using gdb. Examining the UniVerse stack is a bit more challenging, but can be done with a gdb function.
Depending on how many system calls the process makes, start the process under ‘strace’, perhaps using the -e trace= option to home in on specific system calls. I’d probably start with write(), looking for error messages being sent to the stdout or stderr channels.
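
If the purge needs a dry-run mode or logging, it can also be scripted rather than done with find. The sketch below is a rough Python counterpart to a find -mtime purge, not a replacement for it: it compares each file's modification time against an exact cutoff, so its boundaries differ slightly from find's whole-day rounding. The function name and the default of dry_run=True are choices made for this example.

```python
import os
import time


def purge_old_files(directory, max_age_days, dry_run=True, now=None):
    """Return regular files older than max_age_days; delete them unless dry_run."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    selected = []
    for entry in os.scandir(directory):
        # Only plain files are considered; symlinks and subdirectories are skipped.
        if entry.is_file(follow_symlinks=False):
            if entry.stat(follow_symlinks=False).st_mtime < cutoff:
                selected.append(entry.path)
    for path in selected:
        if not dry_run:
            os.remove(path)
    return sorted(selected)
```

Running it first with dry_run left on and reviewing the returned list is one way to be careful about what gets purged.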

best of luck

Mark


Jon Card
Participating Frequently
August 21, 2023


To expand on Mark's point about &PH&: I have created a small VOC script.

Create three dictionary items (DATE, ID, TIME) on &PH&:

1: A
2: 0
3: DATE
4:
5:
6:
7: D2/
8: G2_1
9: R
10: 8

1: A
2: 0
3: ID
4:
5:
6:
7:
8: G0_1
9: L
10: 8

1: A
2: 0
3: TIME
4:
5:
6:
7: MT
8: G1_1
9: R
10: 8

VOC:

PA
SELECT &PH& BY.DSND ID BY.DSND DATE BY.DSND TIME
ED &PH&

Every time you start a phantom, a new file is created in &PH&. When you run the VOC paragraph, the newest phantom is on top. I just use ED (the Editor) to view the file(s). If there is more than one copy, the others belong to old phantoms and can be deleted if not needed. Just remember that if the phantom is still running, you can crash it if you are in its log file when it tries to write.
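
If the &PH& directory is being inspected from the Linux side instead, the same ordering can be sketched in Python. This assumes the record IDs have the NAME_time_date layout that the G0_1, G1_1 and G2_1 correlatives above imply, with the time as seconds since midnight and the date as a UniVerse internal day number. Unlike G0_1, the sketch splits from the right, so a program name containing underscores stays in one piece.

```python
from datetime import datetime, timedelta

UV_DAY_ZERO = datetime(1967, 12, 31)  # UniVerse internal dates count days from this date


def parse_ph_id(record_id):
    """Split 'NAME_time_date' into (name, start datetime)."""
    name, internal_time, internal_date = record_id.rsplit("_", 2)
    started = UV_DAY_ZERO + timedelta(days=int(internal_date),
                                      seconds=int(internal_time))
    return name, started


def newest_first(record_ids):
    """Order &PH& record IDs by start date and time, most recent first."""
    return sorted(record_ids, key=lambda rid: parse_ph_id(rid)[1], reverse=True)
```

An ID that does not end in two numeric fields raises ValueError, which is a reasonable signal that the assumed layout does not match the files on your system.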