The built-in Data360 Analyze scheduler lets you automate the execution of a data flow at a selection of different intervals, without writing any code. However, in some situations you may want additional control over when a data flow is scheduled to run.
Quarterly Report
The built-in scheduler UI allows you to run a data flow on the first, second, third, fourth or last day of the month, but you may need to run a report on the last day of the quarter. In this case you can include some logic in your data flow that checks the current date and then decides whether to run the 'main' logic in your data flow.
A Generate Data node is used to host most of the validation logic. The ConfigureFields script imports the Python calendar module and defines the node's output metadata.
The CreateRecords script obtains the values for the current day, month and year. It then uses the monthrange() function of the Python calendar module to determine the last day of the current month. The modulo operator (%) is then used to determine whether the current month is the last month in the quarter, i.e. whether the month number is exactly divisible by 3. If the current date is the last day of the last month in the quarter, the node outputs the 'run_DF' field with a value of True.
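The date check described above can be sketched in plain Python as follows. This is a minimal illustration rather than the script from the attached data flow: the function name is_last_day_of_quarter is hypothetical, and the code that writes the result to the node's 'run_DF' output field is omitted.

```python
import calendar
import datetime


def is_last_day_of_quarter(today):
    """Return True when `today` is the final day of a calendar quarter.

    Hypothetical helper for illustration; not part of Data360 Analyze.
    """
    # monthrange() returns (weekday of the first day, number of days in the month)
    _, last_day = calendar.monthrange(today.year, today.month)
    # Months 3, 6, 9 and 12 close a quarter, so month % 3 is zero for them
    is_quarter_end_month = today.month % 3 == 0
    return is_quarter_end_month and today.day == last_day


# In the node, this value would populate the 'run_DF' field
run_DF = is_last_day_of_quarter(datetime.date.today())
```

Because monthrange() takes the year into account, the same approach would also handle leap years if the check were adapted to months that end in February.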
A Transform node is then used to check the value of the run_DF field. The node only outputs a record if the value is True.
A Meta Check node is then used to check the number of records at its input. If there is at least one record, the 'SuccessAction' is performed, which enables any nodes that have a Run Dependency connected to the node's 'Clock' output. If there are no records at its input, the nodes connected to its 'Clock' output are not permitted to run.
The data flow is scheduled to run daily at the required time of day. As the first node(s) in the main logic of the data flow are connected to the 'Clock' output they will only be enabled to run on the last day of each quarter.
See the attached example data flow: 'End_of_Quarter_Report_368--share - 7 Apr 2021.lna'
Running Data Flow on Specific Days of the Month
You may need to run a data flow on the same days of each month. Rather than creating a separate schedule for each day, you could use a 'driver' data flow that is scheduled to run daily and configure it to run the 'main' data flow only on the specific days of the month. The days on which to run the main data flow, together with the identity of the main data flow, could be defined in Run properties.
The values for these Run properties would be specified in the schedule for the driver data flow that is run daily at the appropriate time. The 'Days_To_Run_List' Run property contains a comma-separated list of the required days, e.g. 2,9,30 and the 'Main_DF' Run property contains the 'Resource Path' of the main data flow.
To get the Resource Path, navigate to the Analyze directory and select the required data flow. The Resource Path can then be copied from the properties panel to the clipboard by clicking on the Copy icon.
The example driver data flow is shown below.
The Generate Data node's ConfigureFields script specifies the output metadata, which comprises fields to contain the values of the two Run properties and a boolean field that indicates whether the main data flow is to be run.
The CreateRecords script retrieves the value of the 'Days_To_Run_List' Run property and splits the (comma-separated) string value into a list. If the Run property is not defined, the node aborts further processing and generates an error. Similarly, the Resource Path for the main data flow is retrieved (or an error generated if the property is not defined).
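The parsing and validation step can be sketched in plain Python as shown below. This is an illustration under assumptions, not the script from the example data flow: the function name parse_days_to_run is hypothetical, the Run property value is passed in as an ordinary string (how the node actually retrieves it is not shown), and stripping whitespace around each entry is an added convenience.

```python
def parse_days_to_run(raw_value):
    """Split a comma-separated 'Days_To_Run_List' value into a list of strings.

    Hypothetical helper: `raw_value` stands in for the Run property value,
    which would be None or empty if the property had not been defined.
    """
    if raw_value is None or not raw_value.strip():
        # Mirror the behaviour described above: stop with an error
        raise ValueError("Run property 'Days_To_Run_List' is not defined")
    # Assumption: tolerate spaces around the commas, e.g. "2, 9, 30"
    return [item.strip() for item in raw_value.split(",")]


run_days_list = parse_days_to_run("2,9,30")
```

The same pattern would apply to the 'Main_DF' Run property, except that its value is used as a single string rather than split into a list.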
The values of the elements in the 'run_days_list' are converted to integers using the map() function. The node then retrieves the current date and extracts the value of its day component. The 'run_days_list' is then checked to see whether it contains the current_day value. If it does, the 'run_DF' variable is set to True; otherwise it is set to False. The results are then output.
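A minimal sketch of the day check, again in plain Python, might look like this. The function name should_run_today is hypothetical, and the list of day strings is assumed to have been produced already by splitting the 'Days_To_Run_List' value.

```python
import datetime


def should_run_today(run_days_list, today):
    """Return True if the day of the month of `today` is in `run_days_list`.

    Hypothetical helper: `run_days_list` is a list of strings such as
    ["2", "9", "30"].
    """
    # map() converts each string to an int; list() makes the result reusable
    run_days = list(map(int, run_days_list))
    current_day = today.day
    return current_day in run_days


# In the node, this value would populate the 'run_DF' field
run_DF = should_run_today(["2", "9", "30"], datetime.date.today())
```

Note that with a list such as 2,9,30 the check can never succeed on the 30th in February, so the main data flow would run only twice in that month.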
A Transform node is then configured to check the value of the 'run_DF' field and only output a record if the value is True.
A Meta Check node is configured to check the number of input records. If there is at least one input record, the 'SuccessAction' is performed, which enables any nodes connected by a Run Dependency to the node's 'Clock' output. If there are no input records, the 'TerminusAction' is performed, which prevents any connected nodes from running.
An Execute Data Flow node is configured to run the main data flow specified in the 'Main_DF' Run property. In this example no additional run properties were used but you could configure the node to reference a Run Property Set if required.
The example driver data flow 'Data_Flow_Runner_368--share - 7 Apr 2021.lna' is attached below.