When working with time series data, you may need to create a set of records containing a date range. This article describes how you can use the Python timedelta class to generate date values that are offset from another date. The example uses the Generate Data node, but you could adapt the code to work in a Transform node to append the date range field to an existing data set.
The Python datetime module provides the timedelta class, which is used to create objects that represent the duration between two dates or times. You can use it to create objects of different durations, including days, minutes, hours and weeks. A basic example of using it to create a duration of one complete day would be as follows:
The code imports the timedelta class and specifies the output metadata that will hold the date values for today's date and tomorrow's date.
Run Property Substitution is then used to get the value of the built-in RunDate run property (which is a string with a format of YYYY-MM-DD). The string value is parsed using the strptime function to produce the corresponding date type value.
A timedelta object is then created representing a duration of one complete day.
The script in the CreateRecords property outputs the value of the run date and the calculated value for tomorrow's date.
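The date arithmetic in the steps above can be sketched in plain Python, outside the node. This is a minimal illustration only: the `run_date_str` variable is a hypothetical stand-in for the value the node obtains from the RunDate run property, and the node-specific parts (output metadata, Run Property Substitution, writing records) are not shown.

```python
from datetime import datetime, timedelta

# Hypothetical stand-in for the RunDate run property value,
# assumed to be a string in YYYY-MM-DD format as described above.
run_date_str = "2024-02-28"

# Parse the string into a date value.
today = datetime.strptime(run_date_str, "%Y-%m-%d").date()

# Create a duration of one complete day.
one_day = timedelta(days=1)

# Adding the duration to a date yields the following day.
tomorrow = today + one_day

print(today.isoformat(), tomorrow.isoformat())
```

Because timedelta works with calendar dates, month ends and leap days are handled without any special-case code.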
Generating a Date Range
The above code can be modified to output multiple records with an incrementing date value, as follows:
The ConfigureFields script has been changed to specify the number of records to be output by the node. The fixed one-day timedelta object is no longer required, so the corresponding line of code has been removed.
The CreateRecords script uses a for loop to iterate over the code that generates the output record. The node.write() function is used to write an output record for each iteration of the loop.
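The looping approach described above might look roughly like the following in plain Python. It is a sketch under stated assumptions: `run_date_str` and `num_records` are hypothetical stand-ins for values the node would supply, and the records are collected in a list at the point where the node script would call node.write().

```python
from datetime import datetime, timedelta

# Hypothetical stand-ins for values the node would provide.
run_date_str = "2024-12-30"  # RunDate value, assumed YYYY-MM-DD
num_records = 4              # number of records to generate

start = datetime.strptime(run_date_str, "%Y-%m-%d").date()

records = []
for i in range(num_records):
    # A new timedelta is built on each iteration, so no fixed
    # one-day object is needed. In the node, this is where the
    # output record would be written.
    records.append(start + timedelta(days=i))

for d in records:
    print(d.isoformat())
```

The first record carries the start date itself because the loop index begins at zero.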
Parameterizing the Node to Generate a Date Range
To go further you can extend the code to create a node that will generate a date range from a specified start date.
Custom node properties have been added to the node to enable the start date, the number of records and the name of the output field to be easily configured.
The node obtains the start date for the date range from the 'Start Date' property. If this is not specified, the node defaults to using the current RunDate value. The value for the number of records is obtained from the 'Number of Records' property if it is set and defaults to a value of 1 if not set. Similarly, the name of the output field is determined from the 'Date Field' property and defaults to "DateRange".
As before, the strptime function is used to parse the start date string. After the output metadata has been defined, the CreateRecords script generates the required number of records.
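The defaulting behavior described above can be illustrated with a small plain Python helper. The function name `generate_date_range` and its parameters are hypothetical and are not part of the node's API; the `run_date` argument stands in for the RunDate run property, and unset properties are assumed to arrive as None or an empty string.

```python
from datetime import date, datetime, timedelta

def generate_date_range(start_date=None, num_records=None,
                        date_field=None, run_date="2024-01-01"):
    """Hypothetical helper mirroring the node's property defaults."""
    # Fall back to the run date when no start date is given.
    start_str = start_date if start_date not in (None, "") else run_date
    # Default to a single record when the count is not set.
    count = int(num_records) if num_records not in (None, "") else 1
    # Default output field name.
    field = date_field if date_field not in (None, "") else "DateRange"

    start = datetime.strptime(start_str, "%Y-%m-%d").date()
    # One record per day, each held as a single-field dictionary.
    return [{field: start + timedelta(days=i)} for i in range(count)]

# With no properties set, one record is produced for the run date.
print(generate_date_range(run_date="2024-03-01"))

# With all three values supplied.
for row in generate_date_range("2024-02-28", 3, "Day"):
    print(row["Day"].isoformat())
```

Resolving all defaults before parsing the start date keeps the record-generating loop identical to the unparameterized version.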
An example data flow that implements the calendar date range functionality described in this article is attached below. To import the data flow you will need Data360 Analyze v.3.6.0 or above, because it uses the RunDate run property. A second version of the example data flow is also attached for use with v.3.4.x instances; it uses the CurrentDate run property instead of the RunDate run property. We recommend using the 3.6.x version where possible.