I received a ticket from a client asking why a phantom job keeps blowing up after an indeterminate amount of time. The _PH_ log shows this:
In /usr/udthome/sys/CTLG/c/someProgram at line 493 Phantom run basic error, exit 3.
Phantom run basic error, exit 3.
Line 493 of someProgram is literally this:
EXECUTE "a command"
So it seems like whatever that command happens to be, the TCL level that is running with the EXECUTE is blowing up for some reason. Running out of memory seems like a likely choice, but there may be other reasons as well, of course. The question is, when there's an issue like this, is anything logged anywhere?
Kevin,
What this suggests is that the EXECUTE is blowing up before it has had a chance to emit an error message. I'm presuming the EXECUTE does not have a CAPTURING clause, as you haven't stated that it does. If it did, it would hide any message.
So it's unlikely that UniData is logging any more messages about it, but have you checked smm.log, smm.errlog, udt.errlog (and sm.log if RFS is on) for any other messages? If it were memory related, I would have expected to see a malloc failure message of some description.
If UniData is not reporting any errors, have you checked the OS system logs for any indication of udt processes aborting? UNIX and Windows have different places to look.
Can I ask whether this is on UNIX or Windows, what command is being executed, whether the customer has any C routines linked into UniData, and what version of UniData is in use?
Kevin, if your phantom is constantly running I would add logic to auto stop and restart at least once a day. Something like:
START.DATE = DATE()
LOOP WHILE START.DATE = DATE() DO
...
REPEAT
*
PERFORM "PHANTOM MY.PROGRAM"
STOP
Hi Kevin
Just some food for thought. If I wrote two programs, NED and ABORTPROG, like:
NED
EXECUTE "ABORTPROG"
ABORTPROG
CRT "IN ABORT"
ABORT
and I run the program:
PHANTOM RUN BP NED
PHANTOM process 10289752 started.
COMO file is '_PH_/nxkesic28396_10289752'.
and I looked at the COMO file:
:!cat _PH_/nxkesic28396_10289752
IN ABORT
In BP/_NED at line 1 Phantom run basic error, exit 3.
Phantom run basic error, exit 3.
PHANTOM process 10289752 has completed.
Does the command you are calling have an ABORT statement?
Kevin, a couple of thoughts:
- Check the maximum number of processes per user in the O/S configuration
- UDT-17029: Starting at UniData 8.2.0, if the udtconfig parameters MAX_CAPT_LEVEL and/or MAX_RETN_LEVEL were set higher than a value of 2, any processes performing an EXECUTE command in a subroutine could core dump. This issue has been resolved in 8.2.1.9115 and 8.2.2.
If reproducible on demand, try tracing the failure from the parent process:
- AIX: truss -f -p PID 1>truss_PID.out 2>truss_PID.err
- HPUX: tusc (same syntax)
- Linux: strace (same syntax)
- Windows: Process Monitor (the Microsoft SysInternals Suite GUI tool)
The output from truss goes to stderr, and if an O/S error is encountered you will usually see a signature error there.
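As a rough illustration of the Linux case, the attach step might be wrapped like this. It is only a sketch: the function name `trace_phantom`, the `DRY_RUN` switch, and the output file name are made up for the example, and it assumes strace is installed and you have permission to attach to the process.

```shell
# Sketch: attach strace to an already-running process (Linux).
# trace_phantom, DRY_RUN and the output file name are illustrative assumptions.
trace_phantom() {
    pid="$1"
    out="strace_${pid}.out"
    if [ -n "${DRY_RUN:-}" ]; then
        # Print the command instead of running it
        echo "strace -f -tt -p $pid -o $out"
    else
        # -f follows child processes, -tt adds timestamps,
        # -p attaches to the PID, -o writes the trace to a file
        strace -f -tt -p "$pid" -o "$out"
    fi
}
```

Calling `trace_phantom 12345` would leave the trace in strace_12345.out; the last few system calls before the process exits are usually the interesting part.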
@Nik Kesic Hi Nik, I was going to suggest that an ABORT statement would have the same effect Kevin is seeing, which is why I asked what command was being executed. If the command turned out to be a BASIC program, that was going to be my next observation. We'll leave the questions (normally around COMMON) about why you'd EXECUTE a BASIC program for now.
Hi Kevin, I run UniData 8.2.1 on Windows. In the past I had an issue running a routine that performed EXECUTEs from a phantom.
I changed from EXECUTE to UDTEXECUTE. We run in an ECLTYPE P environment, but this allowed the EXECUTEs to run.
Jonathon, the EXECUTE does not have a CAPTURING clause. smm.log, smm.errlog, and udt.errlog all show nothing around the time of the abort. You said "if Unidata is not showing errors have you checked the OS logs?". This is a RHEL system, and I'm not sure what logs specifically would reflect this abort at a secondary TCL level. Customer has no C routines linked into Unidata and the version is in the title.
David, I would normally restart phantoms daily but this isn't my code. Just trying to figure out what's changed that is causing this to blow up. That being said, I think the customer may have an issue with a recently introduced bit of home-grown code that opens up temp files but never closes them. As the program that runs simply calls out to other subroutines, it's possible that these temp file buffers (which are different on every open) are eventually eating up memory. We're exploring that option presently.
Nik, Jonathon, no ABORTs here. It's a BASIC routine - written by someone else - but it has no ABORTs.
John J, I'll check the max processes, but as this has run for years and only recently began crashing I suspect it's probably set okay. I definitely will check those UniData configuration parameters. As to being reproducible, it varies, but generally given enough time the failure does occur. We would strace the PID of the phantom itself, right? The secondary TCL level has no separate PID, is that correct?
Norrie, I'll have to look up the difference between EXECUTE and UDTEXECUTE.
Thank you all for your feedback. We'll see if this CLOSE issue fixes anything and then do more digging if needed.
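One way to test the unclosed temp file theory from outside the code is to sample the phantom's open descriptors and resident memory over time. The sketch below assumes Linux with /proc available, and it assumes that each unclosed BASIC file shows up as an OS-level descriptor or as memory growth in the udt process, which may not hold. The function name `proc_snapshot` is made up for the example.

```shell
# Sketch: one-line snapshot of open descriptors and resident memory for a PID.
# Assumes Linux /proc; proc_snapshot is an illustrative name.
proc_snapshot() {
    pid="$1"
    # Each entry under /proc/PID/fd is one open file descriptor
    fds=$(ls "/proc/$pid/fd" 2>/dev/null | wc -l | tr -d ' ')
    # VmRSS in /proc/PID/status is the resident set size in kB
    rss=$(awk '/^VmRSS:/ {print $2}' "/proc/$pid/status" 2>/dev/null)
    echo "pid=$pid fds=$fds rss_kb=${rss:-unknown}"
}
```

Running this every few minutes against the phantom's PID and appending the output to a file would show whether either number climbs steadily before the abort.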
Hello Kevin,
I assume Linux, but similar commands exist on other UNIX platforms.
You could use 'strace -f -p udt-pid', where udt-pid is the udt process that calls the PHANTOM. It could get hairy if many PHANTOMs are called and you have to sift through the logs.
There is also udtpm, which may help; again, the udt-pid can be used to isolate a specific PHANTOM or calling process. Again, the logs could be huge. Don't use the all category, because critical messages may be missing.
There is also the Linux /var/log/messages file; it will be something else on other OS flavors.
UniData also has msglevelconfig: when that file is created in /usr/ud82/include and has an entry, the error is written to the udt.errlog file. But I think it would basically show what you already see.
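For the /var/log/messages check, a filter along these lines might narrow things down. It is a sketch only: the function name `scan_udt_aborts` is made up, and the patterns are assumptions about how the kernel typically words OOM-killer and segfault messages, which varies between kernel versions.

```shell
# Sketch: look for kernel messages suggesting a udt process was killed or crashed.
# scan_udt_aborts is an illustrative name; the patterns are assumptions.
scan_udt_aborts() {
    logfile="${1:-/var/log/messages}"
    # Keep lines that mention udt, then keep those that look like
    # OOM-killer or segfault reports
    grep -E 'udt' "$logfile" | grep -Ei 'out of memory|oom-killer|segfault|killed process'
}
```

Reading /var/log/messages normally needs root. On systemd-based systems the same kernel messages can usually be viewed with `journalctl -k` as well.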
Kevin,
Does the site have their own C routines?
Were there any recent OS patches to features they may be using?
We have seen a case where an issue with the customer's C code produced the 'Phantom run basic error, exit 3' error.
In that case a subsequent patch resolved the issue.
Thanks Mike, but there are no external C routines in the mix as far as I know.