The tools in this article can also be used to make one data flow run another.
PROBLEM BACKGROUND:
Our legacy product, LAE, could create an executable file which could then be used in other ways, for example, run from the Linux command line or used to trigger an already developed data flow. In Data360 Analyze, there are two ways to do this; see the tool and script attached. They were developed by members of our Professional Services team, Stony Smith and Ernest Jones.
THE METHODS:
1) A data flow library node, called ExecuteDataflow_Python, that will execute a data flow. When you import the LNA, that library node will be imported. It looks up the target data flow by name and runs it; the fields on the node's input pin are used to populate any run parameters of the same name in that data flow.
2) A Python script that you can use if you need your own scheduling system instead of the one built into Analyze. It can be run from the Linux or Windows command line to launch a data flow.
Both of these create the schedule needed to run the data flow. They were purpose-built for a specific mode of operation, but you can adapt them to suit your needs.
If your rules for when a data flow runs are more complex than the Analyze scheduler supports, you can schedule a controller data flow to run periodically; that flow implements your rules and uses the ExecuteDataflow_Python library node to run the target data flows, as sketched below.
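As an illustration of that pattern, the controller flow could gate the trigger on a rule the built-in scheduler cannot express directly, such as "last business day of the month". The sketch below is standalone Python; the rule and the trigger action are hypothetical examples only:

import calendar
from datetime import date

def is_last_business_day(today):
    # Find the last calendar day of the month...
    last_day = calendar.monthrange(today.year, today.month)[1]
    last = date(today.year, today.month, last_day)
    # ...then walk back to the most recent weekday (Mon=0 .. Sun=6)
    while last.weekday() >= 5:
        last = last.replace(day=last.day - 1)
    return today == last

if is_last_business_day(date.today()):
    # This is where the controller flow would fire the
    # ExecuteDataflow_Python node (or the REST call shown later)
    # to run the month-end data flow.
    print("Trigger the month-end data flow")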
THE LIBRARY NODE:
You'll need to add the node to your library paths and drag it into your data flow. It has one input pin, which must be connected; the example below shows how its input fields relate to run parameters.
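For illustration, suppose the target data flow defines run properties named region and runDate (hypothetical names). A record arriving on the node's input pin would then carry fields of the same names:

# Hypothetical input record; the field names match the run
# properties defined in the target data flow.
record = {"region": "EMEA", "runDate": "2024-01-31"}
# The node copies each field value into the run parameter of the
# same name, so the target flow can reference them as, for example,
# {{^region^}} and {{^runDate^}} (assuming the usual property syntax).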
THE PYTHON SCRIPT:
Your scheduler will need to be able to run this Python script and pass it all the parameters it needs when it runs the data flow. Other systems, such as databases, may also be able to do this if they can execute remote procedure calls and are not blocked by a firewall.
It is also worth mentioning that this script only uses API calls that are already documented in the Help section:
/docs/dist/api/index.html#resource-simple-scheduled-task-run-now
You can use this script in any way you need. It invokes a REST call, so it also serves as a sample of which REST calls to make; a sketch of such a call follows.
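As a minimal sketch, assuming the Python requests library and placeholder credentials: the URL path and JSON payload below are guesses derived from the resource name above, so check both against the API documentation for your release before using them.

import requests

BASE_URL = "https://<server>:<port>"  # your Analyze server
AUTH = ("user", "password")           # placeholder credentials

# Illustrative path and payload only; confirm both against
# /docs/dist/api/index.html#resource-simple-scheduled-task-run-now
payload = {"dataFlowName": "MyDataFlow",
           "runProperties": {"region": "EMEA"}}
resp = requests.post(BASE_URL + "/api/simple-scheduled-task/run-now",
                     json=payload, auth=AUTH, verify=False)
resp.raise_for_status()  # fail loudly on a non-2xx response
print(resp.status_code, resp.text)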
To execute the script with the Python interpreter that ships with the product, you can use this on the command line in Windows:
"C:\Program Files\Lavastorm\LAE614\platform\windows-x86-64\python\python.exe" ExecuteDataFlow.py <parameters>
You could also wrap that command in a Windows batch script file and run that instead.
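For example, the following could be saved as RunDataFlow.bat (the file name is a placeholder); %* forwards whatever arguments the batch file receives straight through to the script:

@echo off
"C:\Program Files\Lavastorm\LAE614\platform\windows-x86-64\python\python.exe" ExecuteDataFlow.py %*

Running RunDataFlow.bat <parameters> is then equivalent to calling the Python command above directly.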
And for Linux, the command is:
/home/ec2-user/Lavastorm/LAE/platform/linux-x86-64/python/bin/python <scriptname> <parameters>
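If your external scheduler is cron, a crontab entry along these lines would launch the script, here at 06:00 every weekday; the schedule and the script location are examples only:

0 6 * * 1-5 /home/ec2-user/Lavastorm/LAE/platform/linux-x86-64/python/bin/python /path/to/ExecuteDataFlow.py <parameters>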
API DOCUMENTATION:
Internally, the script and the library node use the REST API that is documented on your own server at https://<server>:<port>/docs/dist/api/ and online here: https://d3sa-preview.infogixsaas.com/docs/dist/api/
DISCLAIMER:
- These are samples only; you will need to configure them for your specific installation.
- They are still a work in progress.
- They were built for a specific customer, so they may not address all of your needs.
- They were built from the documentation above, so the documentation inside them may be lacking.