This article aims to provide users who are new to Data360 Analyze but who are familiar with Excel with some guidance on how common tasks in Excel can be performed in Data360 Analyze.
Microsoft Excel is a great tool and many of us rarely go a day without using at least one spreadsheet (unusual day for me if it's just one!), but there are certainly challenges in some scenarios such as:
- managing larger data sets
- joining multiple disparate data sets (especially if they are also large data sets)
- building complex cleansing and analyses but then need to re-run against fresh data
- you want to keep track of the data logic and lineage as you wrangle your data together and explore multiple streams of analysis
- you need to "show your work" to other people (like your peers or auditors and regulators!)
Data360 Analyze provides a way for you to overcome these challenges as a compliment to your work in Excel by enabling you to do more with the knowledge and skills that you already have from using tools like Excel.
|Data Acquisition||In Data360 Analyze|
|I have data in a single Excel worksheet||
Use the Excel File node to read in your Excel file. You will find this under "Input Connectors":
Just specify the filename, which you can do via the File Picker in the Property panel.
|I have data on multiple Excel worksheets||
In the Excel File node, there is an optional property called WorkbookSpec that allows you to specify the worksheets that you want to read in.
You can keep the data as separate data sets or easily read them in together as a single data set.
Recommended Article: How to read multiple worksheets from Excel
|Import multiple Excel files from a folder location||
You can import multiple Excel files at the same time using the Directory List node.
For example, let's say you have an Excel workbook for sales data and there is one for each month - so 12 workbooks in total. Instead of needing to use 12 Excel File nodes to read each file in individually, you can use the Directory List node in conjunction with the Excel File node to easily read in multiple Files at the same time.
Recommended Article: How-can-I-read-in-multiple-files-from-a-directory?
|Import Excel worksheet but only certain rows and columns||
If you only want to read in specific rows or columns in your Excel data, you can use the WorkbookSpec to specify this information.
Recommended Article: Importing-specific-rows-columns-from-Excel
|Exclude blank rows in my data||
There are different ways to exclude rows in your data in Data360 Analyze but if rows are blank and offer no value you can avoid reading them into Data360 Analyze in the first place.
Recommended Article: Can-the-Excel-File-node-handle-blank-rows-in-the-middle-of-the-data?
Preparing Your Data
|Cleansing/ Preparation||In Data360 Analyze|
Recommended Article: Preparing Data – Renaming Fields
|Change Data types||
A quick and easy way to convert the data types in your data set to more appropriate ones (e.g. converting "Amount" column from string to a number) is to use the Modify Fields node (under Aggregation and Transformation). Just connect it to your data, check the Auto detect option and run!
The great part is, this does not change your source data, which means if you end up needing the columns you previously excluded, you can simply change or remove the statement that removed the field's metadata and re-run the node.
|Add a new column||
You can use Calculate Fields node to create a new column in your data.
For example, I want to add a column in my data for the country name because my data includes addresses but not the country.
Alternatively, I can put the following inside a Transform node:
|Filter my data/ Exclude Rows||
There are various ways to filter your data in Data360 Analyze, including code-free filtering within the Data Viewer or using Python Script to reference a row position or specify a condition to control the output.
When you're viewing your data in the Data Viewer , you can filter your data in a manner that is similar to what you may already be used to in Excel:
You are then given the option to specify the Filter:
If you want this Filter logic to be part of your data flow, you can easily add this to your canvas via the menu in the top right of the Data Viewer as shown in the image below:
This adds a Filter node to your data flow with the logic already configured. Note that you can also add a Filter node directly by selecting the Filter in the Node Panel, which you will find under Aggregation and Transformation:
You can also exclude certain rows in your data by referencing either the row position or by using a condition to control the output.
For example, to only include records in my data where the 'Account' field has the value "Active", I can put the following in a Transform node:
To sort your data, you can either:
a) Use the Sort node (Aggregation and Transformation)
b) Use the Data Viewer
In the data viewer, you can sort your data by selecting the Sort option in the menu option for the column that you wish to sort.
You can add this Sort to your data flow via the menu option in the top right of the Data Viewer:
Calculations and Aggregations
|Calculations and Aggregations||In Data360 Analyze|
IF statements in Excel are very powerful and you can use them in Data360 Analyze as well just with a slightly different statement structure.
For example, I have a column 'color' and wanted to create a new column 'category' that would take the value "A" if the color was "Red" and "B" otherwise.
In Excel, my IF statement would look something like
IF (A3="Red", "A","B")
as shown in the image below:
In Data360 Analyze, there are a number of ways this can be implemented. The easiest is to use a Calculate Fields node configured like this:
An alternative is to use a Transform node. The equivalient logic would be structured like this:
out1.category = "A" if in1.color == "Red" else "B"
as shown in the image below:
Note that correct indentation is essential when using Python scripting.
As in Excel, I can have nested IF statements in Data360 Analyze as well.
Following from the example above, in Excel, I might have the following nested IF statement:
In Data360 Analyze, this would look like this:
Python cheat sheet: Python Cheat Sheet
|Calculate Sum, Average, Minimum and Maximum Value||
You can use the Statistics node to perform a quick statistical analysis of numeric input data, including the sum, min, max, average, count, null count, standard deviation and variance functions.
For example, I have sales data from all my distributors:
I can drag-and-drop a Statistics node onto my canvas and connect to my data source. In the 'FieldList' property, I can use the context menu to select the field that I want to run my analysis against, which is the sales amount field 'sale amount ($)'. In 'GroupBy' property, I specify 'distributor' as the field to group on, ensuring that the analysis produces one line per unique distributor.
Note that when the context menu is used to insert the field name it uses the fields['my field name'] syntax.
The Optional properties can be configured to output the required statistics:
The Statistics node then produces the measures including the sum, average, minimum and maximum values of the sale amounts, etc:
Recommended Example: C:\Program Files\Data3SixtyAnalyze\docs\samples\nodes\Aggregating data.lna
To calculate a cumulative sum, you can use the Sum operator an Aggregate node.
For example, I have the total amount in sales by week and I want to view a year-to-date sales total.
In an Aggregate node, the Sum operator can be applied to the 'weekly amount ($)' field using the grid editor. I'm not grouping by any field here so I can simply leave the 'GroupBy' property blank:
As we want to output the cumulative sum for every record processed, the 'ImplicitWriteEvaluation' property is set to 'Per Record'.
This produces the new column 'CumulativeSum', which is the cumulative sum of the weekly amount:
However, the Aggregate node does not, by default, output any of the input fields. To change this requires a small modification to the underlying code used by the node. Switching to the 'Advanced' tab in the 'Operations' property displays the code generated by the grid editor:
Adding this line to the ConfigureFields script instructs the node to output the input fields:
outputs += inputs
(alternatively you could use out1 += in1 instead)
When the node is run this is the output:
Note when the auto-generated code is changed in the Advanced tab the Grid editor can no longer be used. After initial configuration of the node using the grid editor, you may want to make a copy of the node before experimenting with changes to the code.
Recommended Example: C:\Program Files\Data3SixtyAnalyze\docs\samples\nodes\Aggregating Data.lna
Transpose Your Data
|Transformation||In Data360 Analyze|
It's common in Excel to need to use the Transpose option under the Paste menu
In Data360 Analyze, there are three nodes dedicated to helping you transpose your data and you will find these under Metadata and Structure:
For example, I have total sales by month for each distributor on unique rows:
The Pivot - Data to Names node allows me to easily transpose the data such that there is one line per unique Distributor and a column for each month:
The Pivot Table node summarizes tabular data values across two specified fields (dimensions) to create a pivot table containing the summarized (aggregated) data, together with sub-totals for each dimension and the grand total.
The input data can be summarized using a range of aggregation functions, specifically: count, sum, min, max and mean.
Join Your Data
|Joining data||In Data360 Analyze|
You can append multiple data sets together in a single step using the Cat node, which you will find under Aggregation and Transformation.
Just connect your data sources to the Cat node.
Your data does not need to have the same column headers either. if some data sources have additional columns that you want to keep, simply specify the ConcatenationMode property to Union and any data source that didn't have that column will simply have a NULL value assigned.
Recommended Example: C:\Program Files\Data3SixtyAnalyze\docs\samples\nodes\Example Cat Node.lna
To perform a lookup similar to the VLOOKUP in Excel, you can use the Lookup node in Data360 Analyze, which you will find in the Correlation category.
With the Lookup node, if there are multiple fields to be returned for a single join criteria (e.g. Looking up a product code to get the product name, product category and product cost), I can do this in a single Lookup node and not require multiple lookups the way I would need multiple VLOOKUP functions.
There are additional, more comprehensive ways to join data together in Data360 Analyze that go beyond the joining capability in Excel and you will find these nodes in the Correlation node category. For example the Merge node and Fuzzy X-Ref node.
Recommended Article: How do I perform an Excel VLOOKUP in Data360 Analyze?
Output Your Data
|Output||In Data360 Analyze|
Data360 Analyze offers a broad range of Output Connectors such that prepared data can be provisioned in diverse formats and to a range of enterprise data systems including: Flat files, Excel, Databases, Visualization applications (Qlik Sense/Spotfire/Tableau), FTP, Salesforce, Apache Hadoop HDFS/ Spark, MongoDB, Web APIs, Email
|Output to Excel||
If you wanted to output to a pre-formatted Excel report, with charts, logos etc already created, you can use the option in the Output Excel node to output using an existing Excel file as a template.
Recommended Article: Publishing to Excel with a Template