The input data often comes into Analyze in a form that is not ready to be used. A field's data type defines the values it can hold, so it may be necessary to change it. Numbers stored as strings cannot be added, and dates and times stored as strings cannot be used in date or time calculations. Assigning the correct data types makes the appropriate operations available on the data.
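The problem is easy to reproduce outside Analyze. The short sketch below is plain Python, not an Analyze node, and the values are made up for illustration. It shows that "adding" two numeric strings concatenates them, and that date arithmetic only works once the strings have been parsed into date values:

```python
from datetime import date

# Values as they might arrive from a text file: every field is a string.
a, b = "2", "3"
print(a + b)   # concatenation, not addition: 23

total = int(a) + int(b)
print(total)   # numeric addition after conversion: 5

start, end = "2023-01-15", "2023-03-01"
# end - start would raise TypeError for str; parse into date values first.
days = (date.fromisoformat(end) - date.fromisoformat(start)).days
print(days)    # 45
```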
What Data Types Are Available in Analyze?
A data type defines the values for a field or variable, the operations that can be done on that field, and the way the values of that type are stored.
These are the main data types that you can use in Analyze:
- String - Text values made up of Latin-1 characters
- Unicode - Text values made up of Unicode characters
- Boolean - True or false values
- Int - Integers or whole numbers in the range -2,147,483,648 to 2,147,483,647
- Long - A whole number in the larger range -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 (If in doubt between Int and Long, use Long; some would argue that Int should never be used.)
- Double (float in Python) - Floating-point (decimal) values
- Date - A date value
- Time - A time value
- Date Time - A combined date and time value, also known as a timestamp
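To make the ranges above concrete, here is a minimal Python sketch with two hypothetical helpers, `text_type` and `integer_type`; they are illustrations, not part of Analyze. `text_type` checks whether a value can be encoded as Latin-1, which is the String/Unicode distinction, and `integer_type` checks a value against the Int and Long ranges listed above:

```python
# Ranges taken from the Int and Long entries in the list above.
INT_MIN, INT_MAX = -2**31, 2**31 - 1
LONG_MIN, LONG_MAX = -2**63, 2**63 - 1


def text_type(value: str) -> str:
    """Hypothetical helper: "string" if every character is in Latin-1, else "unicode"."""
    try:
        value.encode("latin-1")
        return "string"
    except UnicodeEncodeError:
        return "unicode"


def integer_type(value: int) -> str:
    """Hypothetical helper: the narrowest of "int" or "long" that can hold value."""
    if INT_MIN <= value <= INT_MAX:
        return "int"
    if LONG_MIN <= value <= LONG_MAX:
        return "long"
    raise OverflowError(f"{value} is outside the Long range")


print(text_type("café"), text_type("東京"))                        # string unicode
print(integer_type(2_147_483_647), integer_type(2_147_483_648))  # int long
```

"café" counts as a String because é is a Latin-1 character, while "東京" needs Unicode. 2,147,483,648 is one past the Int maximum, so it needs a Long. Following the advice above, you may still choose Long even when a value fits in an Int.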
Converting Data Types
We can use the Modify Fields node to do type conversions.
1. Add a Modify Fields node to the canvas and connect it to a node whose output contains the fields that need converting.
2. In the Properties Panel of the Modify Fields node, under the OutputFields property, you'll see a list of the input fields (assuming that the node it's connected to has been run). Here, you can change the type of any input field. In this example, I've changed "type" to Unicode and "rand" to Double.
The node highlights which inputs have been changed.
3. Check the box named Auto so that the types of all the remaining fields are converted automatically.
4. Run the node and view the input and output.
Original data types:
Converted data types:
5. Save the dataflow.
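The steps above can be approximated in plain Python to show what the conversion does to each value. This is a hypothetical sketch, not how Analyze implements the Modify Fields node: rows are dictionaries of strings, explicit conversions are given per field (Python's `str` standing in for Unicode and `float` for Double), and the Auto behavior is assumed to pick one type per field by trying int, then float, and otherwise leaving the field as text. The real Auto option may use different rules.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]
Cast = Callable[[str], object]


def infer_cast(values: List[str]) -> Cast:
    """Assumed Auto rule: int if every value parses as an integer,
    else float if every value parses as a number, else str."""
    for cast in (int, float):
        try:
            for value in values:
                cast(value)
            return cast
        except ValueError:
            continue  # at least one value failed; try the next, wider type
    return str


def modify_fields(rows: List[Row], explicit: Dict[str, Cast],
                  auto: bool = True) -> List[Dict[str, object]]:
    """Convert each field with its explicit cast if one is given; otherwise
    infer a cast when auto is True, or leave the field as a string."""
    fields = list(rows[0]) if rows else []
    casts: Dict[str, Cast] = {}
    for field in fields:
        if field in explicit:
            casts[field] = explicit[field]
        elif auto:
            casts[field] = infer_cast([row[field] for row in rows])
        else:
            casts[field] = str
    return [{field: casts[field](row[field]) for field in fields} for row in rows]


# "type" and "rand" come from the example above; "count" is a hypothetical field.
rows = [
    {"type": "A", "rand": "0.25", "count": "3"},
    {"type": "B", "rand": "0.9", "count": "7"},
]
result = modify_fields(rows, {"type": str, "rand": float})
print(result)  # [{'type': 'A', 'rand': 0.25, 'count': 3}, {'type': 'B', 'rand': 0.9, 'count': 7}]
```

The type is inferred once per field rather than once per value, so every row ends up with the same type for a given field, which matches the idea of a field having a single data type.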