MANUAL TROUBLESHOOTING:
From the server itself, it is difficult to associate a data flow with a PID, because individual nodes run either in their own process (the older nodes such as Agg, Filter, DB Query/Execute, Sort, etc.) or possibly in-container (the newer Java-based nodes).
You can use commands such as egrep. For example,
ps -ef|grep /node
lists each running node process together with its start time, which tells you how long each one has been running. However, it's difficult to trace that information back to a specific Data Flow.
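As a variation on the command above, the following sketch asks ps to print each process's elapsed running time directly. It assumes a ps that supports the POSIX -o option with the etime field (procps on Linux does); the function name list_node_processes is purely illustrative and not part of the product.

```shell
#!/bin/sh
# Illustrative helper (hypothetical name): list processes whose command
# line contains "/node", showing PID, elapsed wall-clock time and command.
# The [/] bracket stops grep from matching its own command line.
# "|| true" keeps the exit status at 0 when nothing matches.
list_node_processes() {
  ps -eo pid,etime,args | grep '[/]node' || true
}

list_node_processes
```

The etime column is printed as [[dd-]hh:]mm:ss. As with ps -ef, this output alone does not tell you which Data Flow a process belongs to.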
TROUBLESHOOTING WITH JQ:
The best way to investigate a run is to open it in the UI and see what is still running. If that's not clear, you can inspect the audit log using the JQ tool.
Please see this article for setting up JQ on Linux and this one for Windows.
Once it's up and running, follow these steps.
First, find all the nodes that started for a specific named Data Flow. Replace MyDataFlow with the name of your Data Flow. The output is written to a file called nodesStarted.log:
jq 'select(.auditCode=="nodeProcessing" and .arguments.graph=="MyDataFlow") | {graph: .arguments.graph, status: .arguments.runState, node: .arguments.node, locator: .arguments.nodeLocator, starttime: .timestamp }' < lae-audit.log > nodesStarted.log
Next, run a similar JQ query to find all the nodes that completed for the given Data Flow. Again, replace MyDataFlow with the name of your Data Flow. The output is written to a file called nodesFinished.log:
jq 'select(.auditCode=="nodeProcessed" and .arguments.graph=="MyDataFlow") | {graph: .arguments.graph, status: .arguments.runState, node: .arguments.node, locator: .arguments.nodeLocator, starttime: .arguments.startTime, duration: .arguments.elapsedTimeMS }' < lae-audit.log > nodesFinished.log
If there are more entries in nodesStarted.log than in nodesFinished.log, this suggests that there are still nodes running for the given Data Flow.
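The comparison of the two files can be sketched in shell as follows. This is a rough sketch under several assumptions: both files contain JQ's default pretty-printed output (one field per line), the locator values are strings, and a locator is enough to match a started entry to a finished one. The function names locators and unfinished_nodes are made up for this example.

```shell
#!/bin/sh
# Extract the "locator" value from every entry in a file of pretty-printed
# JQ output. Assumes lines of the form:   "locator": "some-id",
locators() {
  sed -n 's/^ *"locator": *"\(.*\)",\{0,1\}$/\1/p' "$1" | sort
}

# Print locators that were started more often than they finished.
# comm -23 keeps lines found only in the first (sorted) input, and it
# respects duplicates, so a node started twice and finished once is
# printed once.
unfinished_nodes() {
  started=$(mktemp)
  finished=$(mktemp)
  locators "$1" > "$started"
  locators "$2" > "$finished"
  comm -23 "$started" "$finished"
  rm -f "$started" "$finished"
}

# Only run against the real files when both exist in the current directory.
if [ -f nodesStarted.log ] && [ -f nodesFinished.log ]; then
  echo "started:  $(grep -c '"locator"' nodesStarted.log)"
  echo "finished: $(grep -c '"locator"' nodesFinished.log)"
  unfinished_nodes nodesStarted.log nodesFinished.log
fi
```

Any locator printed at the end has a start entry without a matching finish entry, which makes it a candidate for a node that is still running.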
You can use Analyze to process these logs: use the JSON Data node to read them in.
Knowing the start time of any nodes that have yet to complete might help you decide whether a node should still be running or should have completed by now.