The Create Data node can be configured to output pseudo-random data in several formats.
As a preview of the topics covered in this article, entering the following as the Create Data node's Data property and setting the 'Copies' property to 100,000 produces the output in the screenshot below:
ID,First Name,Last Name,Email,Emails in Inbox,Address
<<<GUID>>>,<<<NAME.FIRST>>>,<<<NAME.LAST>>>,<<<EMAIL|nulls=0.2>>>,<<<LONG|low=100|high=50000>>>,<<<STREET_ADDRESS_EN>>>
Instead of static values, the Create Data node accepts variables that generate a random value for each output record.
Header Syntax
The first record of the Data property contains a comma-separated list of header names. The number of columns in the header must match the number of field definitions in the second record. A sample header record is below:
ID,First Name,Last Name,Email,Emails in Inbox,Address
All dynamically generated data is output as the string type by default. To output a column as unicode instead, append ":unicode" to that column's header name:
ID:unicode,First Name,Last Name,Email,Emails in Inbox,Address:unicode
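When the header record is assembled by a script rather than typed by hand, the ":unicode" suffix can be appended programmatically. The following is a minimal Python sketch; the helper name build_header is hypothetical and is not part of the product.

```python
def build_header(columns, unicode_columns=()):
    """Return the header record, appending ':unicode' to the chosen columns."""
    marked = set(unicode_columns)
    return ",".join(
        f"{name}:unicode" if name in marked else name for name in columns
    )


columns = ["ID", "First Name", "Last Name", "Email", "Emails in Inbox", "Address"]
header = build_header(columns, unicode_columns=["ID", "Address"])
print(header)
```

The printed line can be pasted as the first record of the Data property.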
Dynamic Field Syntax
The second record of the Data property contains a comma-separated list of field definitions. Each field follows the syntax below:
<<<TYPE|keyword=value|keyword=value|...>>>
TYPE is mandatory and represents the data type to generate. Each keyword=value pair is an optional qualifier that controls how the data is generated. Each field is enclosed in <<< and >>>, and fields are separated by commas.
An example of a dynamic field variable is below:
<<<DOUBLE|low=100.0|high=500.0|format=%.2f|nulls=.1>>>
This provides a number between 100 and 500 with two decimal places and a 10% chance that the value will be null instead.
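To make the structure of a field definition concrete, the sketch below splits one into its TYPE and its keyword=value pairs. It is an illustration of the syntax described above, written in Python; parse_field is a hypothetical helper and does not reflect how the node itself parses the Data property.

```python
def parse_field(field):
    """Split '<<<TYPE|keyword=value|...>>>' into (type, {keyword: value})."""
    if not (field.startswith("<<<") and field.endswith(">>>")):
        raise ValueError(f"not a dynamic field: {field!r}")
    # Drop the <<< and >>> delimiters, then split on the pipe separator.
    type_name, *pairs = field[3:-3].split("|")
    if not type_name:
        raise ValueError("TYPE is mandatory")
    keywords = {}
    for pair in pairs:
        key, separator, value = pair.partition("=")
        if not separator:
            raise ValueError(f"expected keyword=value, got {pair!r}")
        keywords[key] = value
    return type_name, keywords


parsed = parse_field("<<<DOUBLE|low=100.0|high=500.0|format=%.2f|nulls=.1>>>")
print(parsed)
```

All keyword values are kept as strings here; converting them to numbers or dates depends on the TYPE.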
Valid types and keywords are in the subsections below.
Type
- AIRPORT_CODE.IATA
- COUNTRY.ISO-3166-2
- COUNTRY.ISO-3166-3
- COUNTRY.TEXT_EN
- CREDIT_CARD_TYPE
- CURRENCY_CODE.ISO-4217
- DATE
- DOUBLE
- EMAIL
- ENUM
- GENDER.TEXT_EN
- GUID
- ID
- IPADDRESS.IPV4
- IPADDRESS.IPV6
- LANGUAGE.ISO-639-2
- LANGUAGE.TEXT_EN
- LONG
- MONTH.ABBR_en-US
- MONTH.FULL_en-US
- NAME.FIRST
- NAME.FIRST_LAST
- NAME.LAST
- NAME.LAST_FIRST
- POSTAL_CODE.ZIP5_US
- STATE_PROVINCE.PROVINCE_CA
- STATE_PROVINCE.STATE_PROVINCE_NA
- STATE_PROVINCE.STATE_US
- STREET_ADDRESS_EN
- TELEPHONE
- URI.URL
Keywords
Keywords are optional qualifiers used to modify data creation. Each keyword is available on a subset of types.
format
The Java format pattern used to print a number or date.
Field availability: ID, DATE, LONG, DOUBLE
Sample for ID, LONG: format=%08d
Sample for DOUBLE: format=%.2f
Sample for DATE: format=dd/MM/yyyy
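The effect of these patterns can be previewed outside the node. The Python sketch below relies on the assumption that, for simple patterns such as %08d and %.2f, Python's printf-style formatting gives the same result as Java's. Date patterns differ between the two languages, so the Java pattern dd/MM/yyyy is written as %d/%m/%Y for Python's strftime.

```python
from datetime import datetime

# Numeric patterns: these simple printf-style patterns are shared with Java.
padded_id = "%08d" % 42      # zero-padded to a width of 8
amount = "%.2f" % 123.456    # rounded to two decimal places

# Date pattern: Java's dd/MM/yyyy corresponds to %d/%m/%Y in strftime.
date_text = datetime(2019, 3, 31).strftime("%d/%m/%Y")

print(padded_id)  # 00000042
print(amount)     # 123.46
print(date_text)  # 31/03/2019
```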
high
Maximum value to use during data generation.
Field availability: ID, DATE, LONG, DOUBLE
Sample for ID, LONG or DOUBLE: high=5000
Sample for DATE: high=2019-01-01 00:00:00
low
Minimum value to use during data generation.
Field availability: ID, DATE, LONG, DOUBLE
Sample for ID, LONG or DOUBLE: low=1000
Sample for DATE: low=2018-01-01 00:00:00
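As a rough illustration of how low and high bound the generated values, the Python sketch below draws a LONG and a DATE from the ranges used in this article. It is a simulation only: the function names are hypothetical, and treating both bounds as inclusive is an assumption, since the article does not state whether the node includes the high value.

```python
import random
from datetime import datetime, timedelta

TIMESTAMP = "%Y-%m-%d %H:%M:%S"  # layout of the DATE low/high samples


def random_long(low, high, rng):
    """Pick a whole number in [low, high] (inclusive bounds are assumed)."""
    return rng.randint(low, high)


def random_date(low, high, rng):
    """Pick a timestamp between two 'yyyy-MM-dd HH:mm:ss' strings."""
    start = datetime.strptime(low, TIMESTAMP)
    end = datetime.strptime(high, TIMESTAMP)
    span_seconds = int((end - start).total_seconds())
    return start + timedelta(seconds=rng.randint(0, span_seconds))


rng = random.Random(0)  # seeded so repeated runs give the same values
emails_in_inbox = random_long(100, 50000, rng)
last_payment = random_date("2018-01-01 00:00:00", "2019-03-31 00:00:00", rng)
print(emails_in_inbox, last_payment.strftime("%d/%m/%Y"))
```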
nulls
The nulls keyword sets the probability that a data value will be null, expressed as a decimal between 0.0 and 1.0. The chance is evaluated independently for each record, so a value of 0.2 across 100 output records doesn't guarantee that exactly 20 records will be null.
Field availability: All
Valid values: 0.0 through 1.0
Sample: nulls=0.2
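The per-record behavior can be seen in a small simulation. The Python sketch below assumes each record makes an independent random draw against the nulls value; count_nulls is a hypothetical helper, not the node's implementation. Running it for several seeds shows null counts that vary around 20 rather than matching it exactly.

```python
import random


def count_nulls(records, null_chance, seed):
    """Count how many of `records` independent draws come out null."""
    rng = random.Random(seed)
    return sum(1 for _ in range(records) if rng.random() < null_chance)


# Five simulated runs of 100 records each with nulls=0.2.
counts = [count_nulls(100, 0.2, seed) for seed in range(5)]
print(counts)
```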
values
A caret (^) separated list of fixed values. A value is picked at random from the list for each record.
Field availability: ENUM
Sample: values=Platinum^Gold^Silver
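The selection can be pictured as splitting the list on the caret and choosing one entry per record. The Python sketch below assumes every value is equally likely, which the article does not confirm; pick_enum is a hypothetical name.

```python
import random


def pick_enum(values_spec, rng):
    """Split a caret-separated list and return one entry at random."""
    choices = values_spec.split("^")
    return rng.choice(choices)


rng = random.Random(7)  # seeded for repeatable output
statuses = [pick_enum("Platinum^Gold^Silver", rng) for _ in range(5)]
print(statuses)
```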
Considerations
Each variable must be used in a 1:1 relation to a field. It isn't possible to concatenate multiple variables into a single field.
Following the header, there can only be a single row of variables. It isn't possible to have alternating outputs by specifying two rows of variables.
Each column is calculated independently. For example, if you have name and email columns, the randomly generated names and the names within the email addresses won't match. Likewise, ZIP code and state columns will yield mismatched combinations. The generated data is neither validated nor reconciled against itself during creation. The goal is to provide randomly generated, pseudo-accurate data for testing purposes.
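The first two considerations, together with the column-count rule from the Header Syntax section, can be checked before the text is pasted into the node. The Python sketch below is one possible pre-flight check; check_template is a hypothetical helper, and it assumes that neither header names nor keyword values contain commas.

```python
def check_template(data):
    """Return a list of problems found in a header + variables template."""
    problems = []
    records = [line for line in data.splitlines() if line.strip()]
    if len(records) != 2:
        problems.append(
            f"expected a header and one row of variables, found {len(records)} records"
        )
        return problems
    headers = records[0].split(",")
    fields = records[1].split(",")
    if len(headers) != len(fields):
        problems.append(
            f"{len(headers)} header columns but {len(fields)} field definitions"
        )
    for position, field in enumerate(fields, start=1):
        # More than one '<<<' means several variables share a single field.
        if field.count("<<<") > 1:
            problems.append(f"column {position} concatenates multiple variables")
    return problems


good = "ID,First Name\n<<<GUID>>>,<<<NAME.FIRST>>>"
bad = "Name\n<<<NAME.FIRST>>> <<<NAME.LAST>>>"
print(check_template(good))
print(check_template(bad))
```

An empty list means none of the checked rules were violated; it does not guarantee that the node will accept the template.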
Additional Example
An additional example of an input and output is below:
ID:unicode,First:unicode,Last:unicode,Email:unicode,Amount Owed:unicode,Last Payment Date:unicode,Client Since:unicode,Status:unicode
<<<ID|low=10000|format=%07d>>>,<<<NAME.FIRST>>>,<<<NAME.LAST>>>,<<<EMAIL>>>,<<<DOUBLE|low=100.0|high=500.0|format=%.2f|nulls=.1>>>,<<<DATE|low=2018-01-01 00:00:00|high=2019-03-31 00:00:00|format=dd/MM/yyyy>>>,<<<DATE|low=2012-01-01 00:00:00|high=2019-03-31 00:00:00|format=dd/MM/yyyy>>>,<<<ENUM|values=Platinum^Gold^Silver|nulls=0.2>>>
The examples in this article are available in the attached LNA, which was exported from v3.5.2.