How to conditionally allow one data set to continue through workflow
I have a scenario where my data passes through a [Transform A] node with two output pins, and the pin the data exits through depends on the value of a run-time property supplied as input. At any given time, only one pin is populated with records, while the other has zero records. Each pin is connected to a separate Transform node ([Transform B] and [Transform C]) that is responsible for outputting a unique set of columns. From that point onward the logic is identical, i.e. whichever of [Transform B] and [Transform C] has records needs to be connected to some [Node D]. What I'm trying to figure out is how to stitch the outputs of [Transform B] and [Transform C] together into a single output that I can connect to [Node D], such that the stitched output contains only the columns from whichever of the two Transform nodes actually has records.
-
Attached is a data flow that provides an example of how you can achieve the above. Unfortunately, this is not a trivial task.
The functionality could probably be implemented in a single custom Java node, but in this example multiple nodes are used.
The CreateData node and the Transform - A node are equivalent to block 'A' in your diagram. In this example, a run property called "OutputFirst" is used to simulate the conditional execution that routes records to one of the two outputs of the Transform - A node. The run property can be set to true or false to switch the records between the outputs as required.
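The routing behavior described above can be sketched in plain Python. This is an illustrative model, not the node's actual scripting API; the names `route_records` and `output_first` are hypothetical stand-ins for the run property and the node's two output pins.

```python
# Hypothetical model of the Transform - A node's conditional routing:
# all records go to one output pin; the other pin stays empty.

def route_records(records, output_first: bool):
    """Return (out1, out2); only one list receives the records."""
    out1, out2 = [], []
    target = out1 if output_first else out2
    target.extend(records)
    return out1, out2

records = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
out1, out2 = route_records(records, output_first=True)
# out1 holds both records; out2 is empty
```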
The two Modify Fields nodes represent the structural changes performed by blocks 'B' and 'C' (in this case, each node outputs a different subset of the fields). Note that both nodes output field metadata regardless of whether any data records are output.
Now we get to your logic in the green block...
A Cat node is then used to combine the two data sets using a union of the fields. This allows the downstream logic to be fed a single data set regardless of whether the data was routed to 'B' or 'C'. However, the result is that all output records share the same set of metadata (where some fields have all-Null values in the records).
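The union-of-fields concatenation can be illustrated with a small sketch. The function name `cat_union` is hypothetical; the key point, as in the Cat node, is that the two branches contribute their field *metadata* to the union even when one branch emits zero records, so absent fields surface as Nulls.

```python
# Illustrative union-of-fields concatenation: both branches contribute their
# field lists (metadata), and records are padded with None for fields their
# source branch did not supply.

def cat_union(fields_b, recs_b, fields_c, recs_c):
    # union of the two field sets, preserving first-seen order
    fields = list(dict.fromkeys(fields_b + fields_c))
    combined = [{f: r.get(f) for f in fields} for r in recs_b + recs_c]
    return fields, combined

fields, recs = cat_union(
    ["id", "price"], [{"id": 1, "price": 9.99}],   # 'B' produced records
    ["id", "desc"], [],                            # 'C' produced metadata only
)
# fields == ["id", "price", "desc"]; the record carries a None 'desc'
```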
A Transform node is then used to scan the field values in each record to determine which fields contain only Null values.
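The all-Null scan amounts to checking, per field, whether any record carries a non-Null value. A minimal sketch (the helper name `all_null_fields` is illustrative):

```python
# A field is "all Null" if no record has a non-None value for it.

def all_null_fields(fields, records):
    seen_value = {f: False for f in fields}
    for rec in records:
        for f in fields:
            if rec.get(f) is not None:
                seen_value[f] = True
    return [f for f in fields if not seen_value[f]]

recs = [{"id": 1, "price": 9.99, "desc": None},
        {"id": 2, "price": 1.50, "desc": None}]
nulls = all_null_fields(["id", "price", "desc"], recs)
# nulls == ["desc"]
```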
The information on which fields are all Null is then combined with the data using another Cat node. This information appears only in the first record of the combined data set (see the field called '_AllNullFields').
A Transform node then leverages the Python csv module to write the data to a tab-delimited temp file. A custom property ('TempFile') is defined on the node to allow you to set the file name. Note that if multiple data flows are to use this logic, each temp file name must be unique. The Transform node uses the _AllNullFields information to determine which fields from the input data need to be written to the temp file (i.e. omitting any fields that were all Null), and it outputs the filename of the temp file.
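The write step can be sketched with the same csv module the node uses. This is an assumption-laden outline, not the node's actual script: `write_kept_fields` and the file path are illustrative, and a real flow would take the path from the 'TempFile' property.

```python
import csv
import tempfile
import os

# Write only the fields NOT listed among the all-Null fields to a
# tab-delimited temp file, and return the file's path.

def write_kept_fields(fields, records, null_fields, temp_path):
    kept = [f for f in fields if f not in null_fields]
    with open(temp_path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t")
        writer.writerow(kept)                          # header row
        for rec in records:
            writer.writerow([rec.get(f) for f in kept])
    return temp_path

recs = [{"id": 1, "price": 9.99, "desc": None}]
path = write_kept_fields(["id", "price", "desc"], recs, ["desc"],
                         os.path.join(tempfile.gettempdir(), "flow_demo.tsv"))
```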
A CSV/Delimited node is then used to read the modified data set back in. Because field type information is not preserved when the data is written to the temp file, a Modify Fields node is used to detect the appropriate data type for each field. The Modify Fields node also removes the field that contains the temp filename.
A Transform node cleans up the temp file once the data has been read in.
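The read-back, type detection, and cleanup steps can be approximated in a single sketch. The names `infer` and `read_back` are hypothetical, and inferring each column's type from the first data row is a simplification of what the Modify Fields node does.

```python
import csv
import os

def infer(value):
    """Pick the narrowest type (int, then float, else str) that parses."""
    for cast in (int, float):
        try:
            cast(value)
            return cast
        except ValueError:
            pass
    return str

def read_back(temp_path):
    """Read the tab-delimited temp file, restore types, delete the file."""
    with open(temp_path, newline="") as fh:
        rows = list(csv.reader(fh, delimiter="\t"))
    header, data = rows[0], rows[1:]
    # use the first data row to choose a type per column (a simplification)
    types = [infer(v) for v in data[0]] if data else [str] * len(header)
    records = [{h: t(v) for h, t, v in zip(header, types, row)}
               for row in data]
    os.remove(temp_path)        # clean up once the data has been read in
    return records
```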
Note: To provide compatibility with the widest set of Analyze-supported software versions, the example was created using v.3.4.2 and can be imported into systems running v.3.4.x and v.3.5.x. A consequence of this is that the case of the data field names is not maintained (because of a limitation in the fields.todict() function used when selectively writing the data to the temp file); all field names are lower-cased as a result. This limitation has been resolved in a subsequent release. A separate version of the data flow will be provided that maintains the case and can be used with v.3.5.1 and above.
Attached files