I'm pleased to announce the Data3Sixty Analyze 3.4.1 release is now available.
This Generally Available Long Term Support (LTS) maintenance release provides the following improvements:
- Additional flexibility in the configuration of the Extract ERP Table Node
- Availability of the Data Profiler node as an Experimental node
The release also introduces a number of operational, performance and stability improvements.
Extract ERP Table Node
The node now provides the UniqueKeyFields property, which enables you to specify which fields in the DD03L table are used to form a unique key for the specified table.
New properties provide greater flexibility in how the node handles unexpected events when it requests data for field subsets in batches and must then rejoin the data for each row using key fields.
The ‘UnexpectedExtractKeysBehavior’ property specifies the action to take when a request returns data that cannot be matched to the keys extracted in the initial request.
The ‘MissingExtractKeysBehavior’ property specifies the action to take when a given request returns no records that match the keys extracted in the initial request.
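As a rough sketch of the situation these two properties govern, the following illustrates rejoining field-subset batches on a unique key and detecting both unmatched and missing keys. The data, function name, and field names (MANDT, BUKRS) are hypothetical; the actual node performs this internally.

```python
# Sketch: rejoin rows extracted in field-subset batches using key fields,
# and report keys the two *ExtractKeysBehavior properties would act on.

def rejoin_batches(initial_keys, batches):
    """Merge per-batch field subsets into full rows keyed by the unique key.

    Returns (rows, unexpected_keys, missing_keys) so a caller could apply a
    policy analogous to UnexpectedExtractKeysBehavior /
    MissingExtractKeysBehavior.
    """
    rows = {key: {} for key in initial_keys}
    unexpected = set()
    seen = set()
    for batch in batches:
        for key, fields in batch.items():
            if key not in rows:
                unexpected.add(key)  # returned data matching no initial key
                continue
            rows[key].update(fields)
            seen.add(key)
    missing = set(initial_keys) - seen  # initial keys with no extracted records
    return rows, unexpected, missing

keys = ["K1", "K2", "K3"]
batch_a = {"K1": {"MANDT": "100"}, "K2": {"MANDT": "100"}}
batch_b = {"K1": {"BUKRS": "1000"}, "K9": {"BUKRS": "2000"}}  # K9 unexpected
rows, unexpected, missing = rejoin_batches(keys, [batch_a, batch_b])
```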
Data Profiler Node
The new Data Profiler node enables you to profile an input data set to generate metadata and a range of statistics. The node outputs an overview of results on its ‘analysis’ pin and additional information for each input field on its ‘details’ pin.
In this release, the Data Profiler node's status is Experimental. By default, it is not displayed in the node library, but it can be made visible by setting the display option to show Experimental nodes.
Note: Experimental nodes are not covered by the Infogix Support policy and may be subject to change in a future release.
The results on the node's ‘analysis’ pin provide information on the detected data type (unicode, int, double, boolean, datetime, etc.) of each field. A ‘Type Qualifier’ is populated, where appropriate, to provide semantic detail, e.g. indicating that a string field contains first names, or the format used to parse strings that contain date/time or datetime values. The ‘Validation’ field provides a regular expression that can be used to validate the data in each of the input fields.
Additional statistics fields are output that indicate:
- Min and max values
- Min and max length
- The number of values that were included in the sample and the number of values that matched the validation
- The number of Null values and the number of blank strings.
Further statistics include:
- The number of values with leading zeros
- Whether strings had leading whitespace, trailing whitespace, or contained embedded new lines
- The number of distinct values
- The level of confidence for the detected type.
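The per-field statistics listed above can be sketched for a single string field as follows. The helper function, its result keys, and the sample data are illustrative only, not the node's implementation:

```python
import re

def profile_field(values, validation=r"^\d{1,3}$"):
    """Compute a few of the per-field statistics described above (sketch)."""
    non_null = [v for v in values if v is not None]
    return {
        "sample_count": len(values),
        "null_count": values.count(None),
        "blank_count": sum(1 for v in non_null if v == ""),
        "min_length": min((len(v) for v in non_null), default=0),
        "max_length": max((len(v) for v in non_null), default=0),
        "distinct_count": len(set(non_null)),
        "leading_zeros": sum(1 for v in non_null
                             if v.startswith("0") and len(v) > 1),
        "leading_ws": any(v != v.lstrip() for v in non_null),
        "trailing_ws": any(v != v.rstrip() for v in non_null),
        "embedded_newlines": any("\n" in v for v in non_null),
        "matched_validation": sum(1 for v in non_null
                                  if re.match(validation, v)),
    }

stats = profile_field(["007", "42", None, "", " 9", "100"])
```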
The ‘details’ output pin provides the profile information in JSON format so that the document can be imported into and used by other applications. If required, the JSON Data node can be used to convert the data into tabular format.
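As an illustration of the JSON-to-tabular conversion, the following shows how a per-field profile document could be flattened into rows and columns. The JSON shape and field names here are hypothetical, not the node's actual output schema:

```python
import json

# Hypothetical per-field profile document (the real schema may differ).
profile_json = """
[
  {"field": "first_name", "type": "unicode", "qualifier": "NAME.FIRST", "nulls": 2},
  {"field": "age", "type": "int", "qualifier": null, "nulls": 0}
]
"""

records = json.loads(profile_json)
header = ["field", "type", "qualifier", "nulls"]
# One row per profiled field, analogous to what the JSON Data node produces.
table = [[rec[col] for col in header] for rec in records]
```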
The node can be further configured using the optional properties:
- You can specify the number of records to be analyzed during the profiling process rather than all records (the default)
- Empty strings and strings consisting entirely of whitespace characters can be treated as distinct values or as Null (the default)
- Depending on the source of the data and the region where it is being processed, the interpretation of strings containing date values can be ambiguous, i.e. whether the value is month first (US format) or day first (European, etc.). You can specify the format or use the locale of the machine hosting the Analyze application (the default).
When the Default Logical Types are enabled the Type Qualifier information will include semantic types that are applicable to the locale.
For instance if the locale was en_GB then a field containing English first names would have a Type Qualifier of NAME.FIRST.
By default, the regular expressions output in the Validation field include length qualifiers that constrain the length of strings that will match. If required, the Length Qualifier property can be set to False, causing the node to output regular expressions that will match strings of any length.
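To illustrate the difference, here is a pair of hypothetical validation patterns for a field whose sampled values are two- to three-digit numbers; the actual patterns the node emits may differ:

```python
import re

# Hypothetical patterns: with and without a length qualifier.
with_length = r"^\d{2,3}$"   # Length Qualifier = True (the default)
any_length = r"^\d+$"        # Length Qualifier = False

assert re.match(with_length, "42")
assert not re.match(with_length, "1234")  # rejected: longer than sampled values
assert re.match(any_length, "1234")       # accepted: any length matches
```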
You can configure the minimum number of samples required before the type detection process starts. You can also configure the threshold for the percentage of samples that must match before a field is classified as a certain type:
- A threshold of 100 means all samples must match
- The default is 95.
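The threshold behavior described above can be sketched as follows. The function, the detector, and the type names are hypothetical stand-ins for the node's internal detection process:

```python
def classify(samples, detector, threshold=95.0):
    """Classify a field as the detected type only if at least `threshold`
    percent of samples match; otherwise fall back to a string type (sketch)."""
    if not samples:
        return None
    matches = sum(1 for v in samples if detector(v))
    pct = 100.0 * matches / len(samples)
    return "int" if pct >= threshold else "unicode"

# 95 of these 100 sample values are digit strings.
samples = ["1", "2", "3", "x"] * 5 + ["5"] * 80
```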
Properties are also provided to allow you to specify the number of discrete values and the number of discrete outlier values that are tracked during the detection process.
By default, the locale of the machine hosting the Analyze application is used during the type detection process.
As an alternative, you can specify the locale to be used by the node. The locale is of the form:
- <language code>[_<country code>[_<variant code>]]
- For example en_US, en_GB, de_AT or th_TH_TH
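The locale form above can be sketched with a small parser; the function name and result shape are illustrative only:

```python
def parse_locale(locale_str):
    """Split a locale of the form <language>[_<country>[_<variant>]] (sketch)."""
    parts = locale_str.split("_", 2)
    return {
        "language": parts[0],
        "country": parts[1] if len(parts) > 1 else None,
        "variant": parts[2] if len(parts) > 2 else None,
    }

loc = parse_locale("en_GB")
```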
Custom semantic types can be used by the node:
- Logical types can be specified to recognize common semantic types, e.g. email address, zip code, IP address, first/last names, gender, address, country name/code, month, latitude/longitude, etc.
- The logical type information is sourced from the specified JSON file
- You can specify a range of attributes for each logical type, including: hints for the field name; the regular expression to validate the data; the range of acceptable values; and the threshold for the percentage of matching values.
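To make the attributes above concrete, a logical-type definition in such a JSON file might look something like the following. Every attribute name here is hypothetical; consult the product documentation for the actual file schema:

```json
[
  {
    "semanticType": "NAME.FIRST",
    "headerHints": ["first_name", "forename", "given_name"],
    "validation": "^[A-Za-z][A-Za-z'-]{1,29}$",
    "acceptableValues": null,
    "matchThreshold": 95
  }
]
```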
The following issues have been resolved in this release:
LAE-21400, LAE-21443, LAE-21447, LAE-21473, LAE-21490
LAE-21497, LAE-21498, LAE-21499, LAE-21517, LAE-21534
LAE-21537, LAE-21538, LAE-21539, LAE-21540, LAE-21549
LAE-21559, LAE-21563, LAE-21598, LAE-21599, LAE-21602
See the release notes for details of the resolved issues.