Recently we have received a number of inquires about how to compare the values in two input fields in the same record.
Data360 Analyze enables you to perform this type of comparison in a number of different ways - e.g. by leveraging the functionality of pre-built nodes or by using Python Scripting in a Transform node.
As an example, assume the input data to be compared comprises two date type fields:
Option 1: Calculate Fields node
The Calculate Fields node enables you to quickly create a new field that contains the result of a Python language expression.
In the node's properties panel, the top row of the 'Calculate Fields' property defines a new boolean type field ('DatesMatch') which will contain the result of the Expression:
fields['dueDate'] == fields['alt_dueDate']
This expression uses the fields method of referencing the input fields and is shown here because it works regardless of whether the field name contains space characters. In this particular example the field names don't contain any spaces so you could have used the alternative 'dot' method to reference the fields, e.g.
fields.dueDate == fields.alt_dueDate
In addition, as the field names do not clash with any Python language keywords you could simplify the expression even further by just using the field names in the expression. This is shown in the bottom row in the 'Calculate Fields' property:
dueDate == alt_dueDate
The Calculate Fields node handles the case where an input field contains a Null value, and provides defaults for each data type (see the 'DefaultValueForNull...' properties in the node's Optional properties and the corresponding Help documentation for further details).
Option 2: Python Scripting in a Transform Node
You can utilize Python scripting to compare the values of two fields in the same record, for example:
The metadata for the new output field is defined in the ConfigureFields script. The data type is defined as a boolean
out1.DatesMatch = bool
The comparison of the values in the input records is performed by the ProcessRecords script.
out1 += in1
if fields.dueDate == Null or fields.alt_dueDate == Null:
out1.DatesMatch = Null
else:
if fields.dueDate == fields.alt_dueDate:
out1.DatesMatch = True
else:
out1.DatesMatch =False
After passing through the values of the input fields to the output record the logic performs the comparison. Note that unlike the Calculate Fields node, you have to also handle the situation where one or more input field values are Null (also see the Python Scripting Help topic for information on the 'fn' module's functions that can also help with handling Null values). In this case the output is set to Null if either input field is Null.
Option 3: Reusable Custom Node
Both of the above methods provide solutions for comparing two fields. However, the code needs to be adapted whenever the logic needs to be reused. Another approach is to extend the functionality previously discussed to build a reusable component that provides a range of comparison operations with code-free configuration. Attached below is a custom node, based on a Composite node, that implements these functions.
Similar to built-in nodes, the Compare Fields node allows you to specify the fields to be compared by either entering the name or selecting the input field using the property's context menu. Note, the two input fields must have the same data type.
You can select the required comparison from the options in the 'Operator' property:
In addition to generic operators such as Equal to, Less than, etc. the node supports comparisons a number of comparisons that are only applicable to certain data types - namely, the Before and After operators which are applicable to fields with a data type of date, datetime or time.
You can specify the name for the new field that will contain the result of the comparison (it defaults to 'New_Field'). The data type for the new field is be set by selecting an option from the 'New Field Type' property.
The value to be output if the comparison is true is specified in the 'Value if True' property. Similarly, the output value for when the comparison is not true is specified in the 'Value if False' property. The values specified in the properties must be appropriate for the new field's data type - for example:
- If the data type is Boolean you can set the values to true/false, yes/no or even 1/0
- If the data type is date the values must be specified as YYYY-MM-DD.
- Similarly, for datetime and time data types the values must be YYYY-MM-DD HH:MM:SS or HH:MM:SS, respectively.
By default, all records are output on the node's top output pin. The 'Split Data' property can be used to modify this behavior. Setting the Property to True causes records where the comparison is false to be written to the node's other second pin.
The attached node is a standard Composite node. If required you can convert it to a reusable library node so that it is available in the node library (see the 'converting nodes to a library node' section of the 'Creating Library nodes' Help topic for further information). Data360 Analyze 3.4.0 or above is required to import the data flow.
:
Comments
0 comments
Please sign in to leave a comment.