By using the LogicalTypeDefinitions property of the Data Profiler node, you can define custom data types to detect using regex. The example below adds detection for CPF Ids.
The LogicalTypeDefinitions property uses a JSON file to define each new regex detection and the information to return when the Data Profiler detects values that match it.
LogicalTypeDefinitions JSON file syntax
The JSON below is an example of a simple detection of CPF Ids. The file needs to be uploaded to the Analyze server and referenced in the LogicalTypeDefinitions property. Each field is described under Fields below.
[
  {
    "qualifier": "CPF_id_full",
    "regExpsToMatch": [ "\\d{3}\\.\\d{3}\\.\\d{3}-\\d{2}", "\\d{3} \\d{3} \\d{3} \\d{2}" ],
    "regExpReturned": "\\d{3}\\.\\d{3}\\.\\d{3}-\\d{2}",
    "threshold": 100,
    "baseType": "STRING"
  },
  {
    "qualifier": "CPF_id_digits_only",
    "regExpsToMatch": [ "\\d{11}" ],
    "regExpReturned": "\\d{11}",
    "threshold": 100,
    "baseType": "LONG"
  }
]
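Before uploading the file, it can help to confirm that it parses as JSON and that every pattern compiles. The following is a minimal sketch in Python, not part of the product: `check_definitions` and `REQUIRED_FIELDS` are hypothetical names, treating all five fields as mandatory is an assumption, and Python's regex engine may differ in places from the one the Data Profiler node uses.

```python
import json
import re

# Assumption: every definition should carry all five fields shown in the example.
REQUIRED_FIELDS = {"qualifier", "regExpsToMatch", "regExpReturned", "threshold", "baseType"}

def check_definitions(json_text):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    definitions = json.loads(json_text)  # raises json.JSONDecodeError on invalid JSON
    for index, definition in enumerate(definitions):
        missing = REQUIRED_FIELDS - definition.keys()
        if missing:
            problems.append(f"entry {index}: missing {sorted(missing)}")
        for pattern in definition.get("regExpsToMatch", []):
            try:
                re.compile(pattern)
            except re.error as exc:
                problems.append(f"entry {index}: bad regex {pattern!r}: {exc}")
    return problems

# Raw string so the text is exactly what would be stored in the JSON file.
sample = r'''
[
  {
    "qualifier": "CPF_id_digits_only",
    "regExpsToMatch": [ "\\d{11}" ],
    "regExpReturned": "\\d{11}",
    "threshold": 100,
    "baseType": "LONG"
  }
]
'''
print(check_definitions(sample))
```

An empty list means the sketch found nothing to flag; it does not guarantee that the node will accept the file.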
Fields
- qualifier - the name of the data type to return
- regExpsToMatch - an array of regular expressions to check the data against
- regExpReturned - the regex to display in the Validation column of the results. This can be any value, but is generally the expected regex format
- threshold - the percentage of records that must match regExpsToMatch
- baseType - the base type of the values to check, for example STRING or LONG
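To illustrate how regExpsToMatch and threshold might interact, the sketch below computes the percentage of values that match at least one pattern and compares it with the threshold. This is an assumption about the behavior, not the node's actual implementation: `match_percentage` and `meets_threshold` are hypothetical helpers, and whether the node requires a full match or treats the array as alternatives is not stated in this article.

```python
import re

def match_percentage(values, patterns):
    """Percentage of values that fully match at least one of the patterns."""
    if not values:
        return 0.0
    compiled = [re.compile(p) for p in patterns]
    hits = sum(1 for v in values if any(c.fullmatch(v) for c in compiled))
    return 100.0 * hits / len(values)

def meets_threshold(values, definition):
    """True when the match percentage reaches the definition's threshold."""
    return match_percentage(values, definition["regExpsToMatch"]) >= definition["threshold"]

# Same patterns as CPF_id_full, written as Python raw strings (single backslashes).
cpf_full = {
    "qualifier": "CPF_id_full",
    "regExpsToMatch": [r"\d{3}\.\d{3}\.\d{3}-\d{2}", r"\d{3} \d{3} \d{3} \d{2}"],
    "threshold": 100,
}

formatted = ["123.456.789-09", "123 456 789 09"]
mixed = formatted + ["12345678909"]

print(meets_threshold(formatted, cpf_full))  # True: both values match a pattern
print(meets_threshold(mixed, cpf_full))      # False: only 2 of 3 values match
```

Under this reading, a threshold of 100 means a single non-matching record prevents the type from being reported.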
In both regExpsToMatch and regExpReturned, a double backslash is required wherever the regex would normally use a single backslash, because the backslash is the escape character in JSON strings. For example, \\d is needed to match a digit, rather than the standard \d regex syntax.
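The effect of the doubling can be seen with any JSON parser. The short sketch below uses Python's standard json module purely as an illustration; it is independent of the Data Profiler node.

```python
import json
import re

# Text exactly as it would appear in the JSON file (raw string keeps both backslashes).
json_text = r'{"regExpReturned": "\\d{11}"}'

# After parsing, the doubled backslash becomes the single backslash the regex needs.
pattern = json.loads(json_text)["regExpReturned"]
print(pattern)  # \d{11}
print(bool(re.fullmatch(pattern, "12345678909")))  # True

# A single backslash before d is not a valid JSON escape and is rejected.
try:
    json.loads(r'{"regExpReturned": "\d{11}"}')
    single_backslash_ok = True
except json.JSONDecodeError:
    single_backslash_ok = False
print(single_backslash_ok)  # False

# Writing the file with json.dumps performs the doubling automatically.
print(json.dumps({"regExpReturned": r"\d{11}"}))  # {"regExpReturned": "\\d{11}"}
```

Generating the file with a JSON library rather than by hand avoids having to count backslashes.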
Note that the baseType must match the type that the Data Profiler node assigns to the data. This is why the example above splits the \\d{11} regex check into a second qualifier flagged as LONG, while the other checks are flagged as STRING.
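The reason for the split can be sketched as follows. The type inference here is a deliberately naive stand-in, not the node's real logic: `infer_base_type` and `applicable_qualifiers` are hypothetical, and the rule "all digits means LONG" is an assumption made only for this illustration.

```python
import re

def infer_base_type(values):
    """Naive stand-in for type detection: LONG if every value is a whole number."""
    if values and all(re.fullmatch(r"-?\d+", v) for v in values):
        return "LONG"
    return "STRING"

def applicable_qualifiers(values, definitions):
    """Qualifiers whose baseType equals the type inferred for the values."""
    base = infer_base_type(values)
    return [d["qualifier"] for d in definitions if d["baseType"] == base]

# Only the fields needed for this illustration.
definitions = [
    {"qualifier": "CPF_id_full", "baseType": "STRING"},
    {"qualifier": "CPF_id_digits_only", "baseType": "LONG"},
]

print(applicable_qualifiers(["12345678909"], definitions))     # ['CPF_id_digits_only']
print(applicable_qualifiers(["123.456.789-09"], definitions))  # ['CPF_id_full']
```

In this sketch a digits-only column is never compared against the STRING definition, which mirrors why the \\d{11} check needs its own LONG qualifier.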
A sample output of the Data Profiler node using the JSON above is below. A sample LNA, exported from 3.5.2, is also attached below.