This article explains how to create an Analysis to detect duplicate records in DQ+.
- Import the data via a Data Store
- Use a Group By node and group by the 3 key fields
- Within the Group By node, add an extra column with the function COUNT(<FIELDNAME>), where <FIELDNAME> can be any of your fields (a key field or any other field)
- Next, feed the results of the Group By node into a Compute node.
- Within the Compute node, add two computation fields: one that flags a record as a duplicate when its occurrence count is greater than 1, and one that adds a result code.
- The results of the Compute node contain one row per key combination, along with its occurrence count, duplicate flag, and result code.
- Depending on the use case (for example, whether you only need to know which keys are duplicated, or need all of the duplicate records themselves), the grouped data may need to be joined back to the original data set. This is done with a Join node using an Inner Join on the 3 key fields.
- If the join is used, the end product identifies which records are duplicates and adds a record-level result code to each one.
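The node pipeline above can be sketched in plain Python to make the logic concrete. This is only an illustration of the technique, not DQ+ itself: the field names (key1, key2, key3, amount) and the result-code values ("DUPLICATE"/"UNIQUE") are hypothetical placeholders, since the article does not specify them.

```python
from collections import Counter

# Hypothetical sample data; in DQ+ this comes from the Data Store.
records = [
    {"key1": "A", "key2": "X", "key3": 1, "amount": 10},
    {"key1": "A", "key2": "X", "key3": 1, "amount": 12},
    {"key1": "B", "key2": "Y", "key3": 2, "amount": 7},
]

KEYS = ("key1", "key2", "key3")

# Group By node: group on the 3 key fields and COUNT occurrences.
counts = Counter(tuple(r[k] for k in KEYS) for r in records)

# Compute node: flag keys seen more than once and attach a result code.
grouped = [
    {"key": key,
     "count": n,
     "is_duplicate": n > 1,
     "result_code": "DUPLICATE" if n > 1 else "UNIQUE"}  # assumed codes
    for key, n in counts.items()
]

# Join node (Inner Join on the 3 key fields): carry the count and
# result code back onto every original record.
joined = []
for r in records:
    n = counts[tuple(r[k] for k in KEYS)]
    joined.append({**r,
                   "count": n,
                   "result_code": "DUPLICATE" if n > 1 else "UNIQUE"})

for row in joined:
    print(row)
```

With the sample data, the two "A/X/1" records come back flagged as duplicates and the "B/Y/2" record comes back unique, which mirrors the record-level output of the Join step.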