When working with files created by other systems, you may encounter a file that has been created but has no contents. In this case, you may want your data flow to skip the file and terminate gracefully.
With Data360 Analyze you can check the metadata for a file in a range of ways. The following example describes one code-free approach:
A Directory List node is used to list the files in the target directory that match the pattern:
As the data flow is being run on an Analyze Server instance, the DirectoryName property uses the property reference for the 'My Folder' location for files uploaded to the server (similarly, {{%ls.lae.shareRootDir%}}/Public would list files in the Public folder). If you are using an Analyze Desktop instance, you can use the directory path instead.
The Pattern property is set to match any CSV files. You could use a more specific pattern here if required, but in this example the filtering is performed by the next node. In this case the output of the Directory List node looks like this:
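For readers who think in code, the listing step can be approximated in plain Python. This is only a rough sketch of the idea, not how the Directory List node is implemented: the function name list_files and the 'Size' field name are assumptions made for illustration, while 'FileName' is the field referred to later in this article.

```python
from pathlib import Path


def list_files(directory, pattern="*.csv"):
    """Return one record per file in `directory` whose name matches `pattern`.

    Hypothetical helper: each record holds the file's path and its size in bytes.
    """
    records = []
    for path in sorted(Path(directory).glob(pattern)):
        # Skip sub-directories that happen to match the pattern.
        if path.is_file():
            records.append({"FileName": str(path), "Size": path.stat().st_size})
    return records
```

An empty file still appears in the listing, with a size of zero, which is what allows a later step to filter it out.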
The Filter node's criteria are configured to select the required file(s). The first criterion selects files where the name contains "Empty". The second criterion checks the size of the file. The Filter node is configured to only output records that match all of the criteria.
As the "Empty.csv" file is empty, no records are output by the Filter node, but metadata is still present on its output:
A Meta Check node is configured to validate that at least one record is presented to its input:
If the check of the minimum record count is successful, nodes connected to its output will be enabled to run. However, as there are no records at its input, the CSV/Delimited Input node is prevented from running.
If there had been one or more records at the Meta Check node's input, the records would be propagated through to its output. The CSV/Delimited Input node is configured to source the name of the file(s) to be imported from the 'FileName' field.
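The combined effect of the Filter and Meta Check nodes can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the nodes' actual logic: the size criterion is assumed to be "at least one byte", and the names select_files, should_import and the 'Size' field are hypothetical.

```python
def select_files(records, name_contains="Empty", min_size=1):
    """Keep only the records that satisfy ALL criteria (name match AND minimum size)."""
    return [
        r for r in records
        if name_contains in r["FileName"] and r["Size"] >= min_size
    ]


def should_import(records, min_records=1):
    """Mimic a minimum-record-count check: True only if enough records arrived."""
    return len(records) >= min_records


# Example listing (made-up values): one empty file and one unrelated file.
listing = [
    {"FileName": "Empty.csv", "Size": 0},
    {"FileName": "Sales.csv", "Size": 120},
]

selected = select_files(listing)
if should_import(selected):
    for record in selected:
        print("Would import", record["FileName"])
else:
    # No record passed both criteria, so the import step is skipped.
    print("Nothing to import")
```

Separating the selection from the record-count check mirrors the data flow: the filter decides which files qualify, and the gate decides whether anything downstream runs at all.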
As an alternative approach, if you need or want to use some Python scripting to test the status of a file, you can use a Generate Data node:
The node uses the Python 'os' module, which provides a range of functions for managing files. The ConfigureFields script specifies the name of the file to be read and the output metadata for the node.
The CreateRecords script uses the os.path.isfile() function to check whether the file exists. If it does, the node will output a record containing the specified file name, its size and (just to illustrate the use of a boolean value) a flag to indicate whether the file is empty.
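The core of that check can be sketched in standalone Python as shown below. This is not the node's ConfigureFields or CreateRecords script, which appear only in the attached data flow; the function name file_status and the 'Size' and 'IsEmpty' field names are assumptions chosen for illustration.

```python
import os


def file_status(file_name):
    """Return a record describing the file, or None if it does not exist.

    Hypothetical helper: the record holds the file name, its size in bytes,
    and a boolean flag that is True when the file has no contents.
    """
    if not os.path.isfile(file_name):
        # A missing file produces no record at all.
        return None
    size = os.path.getsize(file_name)
    return {"FileName": file_name, "Size": size, "IsEmpty": size == 0}
```

Returning no record for a missing file means that a downstream record-count check treats "file not found" and "file filtered out" in the same way.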
Similar to the previous example, a Filter node is used to check the file name and size and the Meta Check node validates the number of records before enabling the CSV/Delimited Input node to run.
If required, the above scripts could be adapted to run in a Transform node instead of the Generate Data node.
The examples are included in the data flow attached below (requires Data360 Analyze 3.6.0 or above).