A common use case for Data360 Analyze is to process a file that is received daily. This article describes one way that this task can be achieved.
Scenario
Once a day an external system stores a file in a specified directory on the file system of the machine hosting Analyze. A data flow needs to be created to pick up the file, process the contents of the file and then rename the file, leaving the renamed file in the same directory.
Example data flow
A Directory List node is used to list files in the specified directory that match a pattern.
In this case the source files have filenames with a format of "Test_Data_YYYYMMDD.csv", where YYYY is the year, MM is the month and DD is the day of the month.
A Filter node is then used to select the file whose name contains the date matching the day on which the data flow is run:
The value to be matched in the filter uses Run Property Substitution to make the filter dynamic. In this case the RunDate_PathSafe run property is being used. The Filter node outputs the file for today (or zero records if there is no match).
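The date-matching logic of the Filter node can be sketched in plain Python. This is only an illustration, not the node's actual configuration: it assumes the run date is rendered as YYYYMMDD to line up with the filename format, and the function name and sample filenames are hypothetical.

```python
from datetime import date


def select_todays_file(filenames, run_date):
    """Return the filenames that contain the run date (assumed YYYYMMDD).

    An empty list corresponds to the Filter node outputting zero records.
    """
    stamp = run_date.strftime("%Y%m%d")
    return [name for name in filenames if stamp in name]


# Hypothetical directory listing, for illustration only.
listing = ["Test_Data_20240130.csv", "Test_Data_20240131.csv"]
matches = select_todays_file(listing, date(2024, 1, 31))
```

If no file carries the run date, the list is empty, which mirrors the "zero records" case that the downstream Check Metadata node guards against.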
A Check Metadata node is used to confirm that there is a file to be processed. If there is one record, the CSV/Delimited node is enabled to run by means of a run dependency.
The contents of the file are read using a CSV/Delimited node.
The node is configured to source its file name from the input 'FileName' field. The data can then be processed as required. A Run Dependency is used to delay the running of the Transform node that renames the daily data file. When all of the records in the file have been processed, the run dependency allows the Transform node to run.
The Transform node's ConfigureFields script imports the 'os' package, passes the input fields through to the output and defines the metadata for a new field that will hold the new name of the renamed file.
The ProcessRecords script gets the filename of the processed file. It then checks if the file exists and if so, it splits the filename to derive the directory path (which is stored in the 'head' variable) and the file name component. A new file name is then constructed which comprises "Processed_" plus the original filename. The file is then renamed and the node outputs the input fields and the name of the new file.
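The rename steps described above can be sketched as a standalone Python function. This is a minimal sketch that uses only the standard 'os' package. It is not the script from the attached data flow, it omits the node's input and output field handling, and the function name and 'prefix' parameter are hypothetical.

```python
import os


def rename_processed(file_path, prefix="Processed_"):
    """Rename file_path to <directory>/<prefix><original name>.

    Returns the new path, or None if the file does not exist.
    """
    if not os.path.exists(file_path):
        return None
    # 'head' is the directory path, 'tail' is the file name component.
    head, tail = os.path.split(file_path)
    new_path = os.path.join(head, prefix + tail)
    os.rename(file_path, new_path)
    return new_path
```

Returning None when the file is missing keeps the step safe if it is run against a file that has already been renamed.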
As the processed file has been renamed, it would not be matched by the Directory List node should the data flow be re-run later on the same day.
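The re-run safety can be illustrated with a small pattern check. The article does not state the pattern configured on the Directory List node, so the glob-style pattern below is an assumption, and Python's fnmatch module merely stands in for the node's own matching.

```python
from fnmatch import fnmatchcase

# Assumed pattern; the node's real pattern syntax may differ.
PATTERN = "Test_Data_*.csv"


def is_unprocessed(name):
    """True if the filename still matches the source-file pattern."""
    return fnmatchcase(name, PATTERN)


original = is_unprocessed("Test_Data_20240131.csv")
renamed = is_unprocessed("Processed_Test_Data_20240131.csv")
```

Because the pattern is matched against the whole filename, the added "Processed_" prefix is enough to exclude the renamed file.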
The data flow could be scheduled to regularly check for the presence of the daily file and process it once it becomes available.
The example data flow is attached to this article. Note that Data360 Analyze version 3.5.x1 or above is required because that was the first release in which the RunDate_PathSafe run property was available.