This article gives users who are new to Data360 Analyze, but familiar with Excel, some guidance on how common Excel tasks can be performed in Data360 Analyze.
Microsoft Excel is a great tool and many of us rarely go a day without using at least one spreadsheet (unusual day for me if it's just one!), but there are certainly challenges in some scenarios such as:
- managing larger data sets
- joining multiple disparate data sets (especially if they are also large data sets)
- building complex cleansing and analysis steps that then need to be re-run against fresh data
- keeping track of the data logic and lineage as you wrangle your data together and explore multiple streams of analysis
- needing to "show your work" to other people (like your peers, auditors and regulators!)
Data360 Analyze provides a way for you to overcome these challenges as a complement to your work in Excel, enabling you to do more with the knowledge and skills that you already have from using tools like Excel.
Acquiring Data
Data Acquisition | In Data360 Analyze |
I have data in a single Excel worksheet |
Use the Excel File node to read in your Excel file. You will find this node under "Input Connectors". Just specify the filename, which you can do via the File Picker in the Property panel. |
I have data on multiple Excel worksheets |
In the Excel File node, there is an optional property called WorkbookSpec that allows you to specify the worksheets that you want to read in. You can keep the data as separate data sets or easily read them in together as a single data set. Recommended Article: How to read multiple worksheets from Excel |
Import multiple Excel files from a folder location |
You can import multiple Excel files at the same time using the Directory List node. For example, let's say you have one Excel workbook of sales data for each month, so 12 workbooks in total. Instead of using 12 Excel File nodes to read each file in individually, you can use the Directory List node in conjunction with the Excel File node to read in multiple files at the same time. Recommended Article: How-can-I-read-in-multiple-files-from-a-directory? |
Import Excel worksheet but only certain rows and columns |
If you only want to read in specific rows or columns in your Excel data, you can use the WorkbookSpec to specify this information. Recommended Article: Importing-specific-rows-columns-from-Excel |
Exclude blank rows in my data |
There are different ways to exclude rows from your data in Data360 Analyze, but if rows are blank and offer no value, you can avoid reading them into Data360 Analyze in the first place. Recommended Article: Can-the-Excel-File-node-handle-blank-rows-in-the-middle-of-the-data? |
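The folder-import and blank-row ideas in the table above can be sketched in plain Python. This is only an illustration of the logic, not Data360 Analyze code: it assumes CSV files as a stand-in for Excel workbooks (so that only the standard library is needed), and the names `read_folder`, `source_file` and the sample file names are hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def read_folder(folder, pattern="*.csv"):
    """Read every matching file in a folder into one list of row dicts."""
    rows = []
    for path in sorted(Path(folder).glob(pattern)):
        with path.open(newline="") as handle:
            for row in csv.DictReader(handle):
                # Skip rows in which every cell is empty or whitespace
                if not any(str(value or "").strip() for value in row.values()):
                    continue
                # Record which file each row came from (hypothetical field)
                row["source_file"] = path.name
                rows.append(row)
    return rows


def demo():
    """Create two small monthly files in a temp folder and read them back."""
    with tempfile.TemporaryDirectory() as folder:
        Path(folder, "sales_01.csv").write_text(
            "Distributor,Sales\nAcme,100\n,\nBolt,250\n"
        )
        Path(folder, "sales_02.csv").write_text("Distributor,Sales\nAcme,120\n")
        return read_folder(folder)


sample_rows = demo()
```

The blank `,` row in the first file is dropped at read time, and the two files end up in a single list, which mirrors the idea of listing a directory once and feeding each file to one reader.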
Preparing Your Data
Cleansing/ Preparation | In Data360 Analyze |
Rename fields |
In Data360 Analyze, you can easily rename any of your data fields using the Modify Fields node. Alternatively, you can rename fields with a short script in a Transform node, or you may wish to use the Modify Field Prefix or Change Metadata nodes. Recommended Article: Preparing Data – Renaming Fields |
Change Data types |
A quick and easy way to convert the data types in your data set to more appropriate ones (e.g. converting an "Amount" column from string to number) is to use the Modify Fields node (under Aggregation and Transformation). Just connect it to your data, check the Auto detect option and run! |
Exclude Columns |
To exclude any columns, you can use the Modify Fields node and uncheck the particular Input Fields. You can also exclude fields with a short script in a Transform node. The great part is that this does not change your source data: if you end up needing the columns you previously excluded, you can simply change or remove the statement that removed the field's metadata and re-run the node. |
Add a new column |
You can use the Calculate Fields node to create a new column in your data. For example, I want to add a column for the country name because my data includes addresses but not the country. Alternatively, I can set the new field with a short script in a Transform node. |
Filter my data/ Exclude Rows |
There are various ways to filter your data in Data360 Analyze, including code-free filtering within the Data Viewer and using a Python script that references a row position or specifies a condition to control the output.
Data Viewer: When you're viewing your data in the Data Viewer, you can filter it in much the same way as you may already be used to in Excel. The menu for each column header provides an option to apply a Filter to that column, and you are then given the option to specify the Filter. If you want this Filter logic to be part of your data flow, you can add it to your canvas via the menu in the top right of the Data Viewer. This adds a Filter node to your data flow with the logic already configured. Note that you can also add a Filter node directly by selecting Filter in the Node Panel, where you will find it under Aggregation and Transformation.
Script: You can also exclude certain rows by referencing the row position or by using a condition to control the output. For example, to include only records where the 'Account' field has the value "Active", I can test that field in a Transform node and output only the matching records. |
Sort Data |
To sort your data, you can either: a) use the Sort node (under Aggregation and Transformation), or b) use the Data Viewer. In the Data Viewer, you can sort your data by selecting the Sort option in the menu for the column that you wish to sort. You can add this Sort to your data flow via the menu in the top right of the Data Viewer. |
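The preparation steps in the table above (filter, rename, exclude, convert, add, sort) can be sketched in plain Python to show the logic involved. This is not the Transform node's scripting API; the field names `Acct`, `AccountId` and `Notes`, the sample values, and the constant country value are all hypothetical.

```python
from operator import itemgetter

# Hypothetical sample data; every value starts out as a string, as in a raw import
records = [
    {"Acct": "A-1", "Account": "Active", "Amount": "120.50", "Notes": "x"},
    {"Acct": "A-2", "Account": "Closed", "Amount": "80", "Notes": ""},
    {"Acct": "A-3", "Account": "Active", "Amount": "45.25", "Notes": "y"},
]


def prepare(rows):
    """Apply the preparation steps to a list of row dicts."""
    prepared = []
    for row in rows:
        # Filter rows: keep only records whose Account field is "Active"
        if row["Account"] != "Active":
            continue
        out = dict(row)                       # copy, so the source row is untouched
        out["AccountId"] = out.pop("Acct")    # rename a field
        del out["Notes"]                      # exclude a column
        out["Amount"] = float(out["Amount"])  # change data type: string to number
        out["Country"] = "USA"                # add a new column (assumed constant)
        prepared.append(out)
    # Sort ascending by the numeric Amount field
    return sorted(prepared, key=itemgetter("Amount"))


prepared = prepare(records)
```

Because each row is copied before it is changed, `records` still holds the original fields afterwards, which matches the point made above that excluding a column does not alter the source data.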
Calculations and Aggregations
Full details on calculations and aggregations can be found in the following article:
Data360 Analyze and Excel - Calculations and Aggregations
Transpose Your Data
Transformation | In Data360 Analyze |
Transpose data |
In Excel, it's common to need the Transpose option under the Paste menu. In Data360 Analyze, there are three nodes dedicated to helping you transpose your data, and you will find these under Metadata and Structure.
For example, I have total sales by month for each distributor on unique rows. The Pivot - Data to Names node allows me to easily transpose the data so that there is one line per unique Distributor and a column for each month. Recommended Example: C:\Program Files\Data3SixtyAnalyze\docs\samples\nodes\Example Pivot-Data to Names Node.lna |
Pivot |
The Pivot Table node summarizes tabular data values across two specified fields (dimensions) to create a pivot table containing the summarized (aggregated) data, together with sub-totals for each dimension and the grand total. The input data can be summarized using a range of aggregation functions, specifically: count, sum, min, max and mean. Recommended Example: |
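To make the pivot idea concrete, here is a minimal plain-Python sketch that turns one-row-per-distributor-per-month data into one entry per distributor with a column per month, plus sub-totals and a grand total. It only illustrates the sum aggregation and is not how the Pivot nodes are implemented; the function name `pivot_sum`, the `"Total"` label and the sample figures are assumptions.

```python
from collections import defaultdict

# Hypothetical input: one row per distributor per month
sales = [
    {"Distributor": "Acme", "Month": "Jan", "Sales": 100},
    {"Distributor": "Acme", "Month": "Feb", "Sales": 120},
    {"Distributor": "Bolt", "Month": "Jan", "Sales": 250},
    {"Distributor": "Bolt", "Month": "Feb", "Sales": 90},
]


def pivot_sum(rows, row_key, col_key, value_key):
    """Sum value_key into a {row: {column: total}} grid with totals."""
    grid = defaultdict(lambda: defaultdict(float))
    for row in rows:
        r, c, v = row[row_key], row[col_key], row[value_key]
        grid[r][c] += v                 # cell for this row/column pair
        grid[r]["Total"] += v           # sub-total for the row dimension
        grid["Total"][c] += v           # sub-total for the column dimension
        grid["Total"]["Total"] += v     # grand total
    # Convert nested defaultdicts to plain dicts for easier inspection
    return {r: dict(cols) for r, cols in grid.items()}


pivot = pivot_sum(sales, "Distributor", "Month", "Sales")
```

Note that this sketch reserves the label `"Total"`, so it would give misleading results if a distributor or month were literally named "Total".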
Join Your Data
Joining data | In Data360 Analyze |
Append data |
You can append multiple data sets together in a single step using the Cat node, which you will find under Aggregation and Transformation. Just connect your data sources to the Cat node. Your data does not need to have the same column headers either. If some data sources have additional columns that you want to keep, simply set the ConcatenationMode property to Union, and any data source that didn't have that column will have a NULL value assigned. Recommended Example: C:\Program Files\Data3SixtyAnalyze\docs\samples\nodes\Example Cat Node.lna |
VLOOKUP |
To perform a lookup similar to VLOOKUP in Excel, you can use the Lookup node in Data360 Analyze, which you will find in the Correlation category. With the Lookup node, if there are multiple fields to be returned for a single join criterion (e.g. looking up a product code to get the product name, product category and product cost), I can do this in a single Lookup node rather than needing multiple lookups, the way I would need multiple VLOOKUP functions. There are additional, more comprehensive ways to join data together in Data360 Analyze that go beyond the joining capability in Excel, for example the Merge node and the Fuzzy X-Ref node, and you will find these nodes in the Correlation node category. Recommended Article: How do I perform an Excel VLOOKUP in Data360 Analyze? |
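The append and lookup behaviour described in the table above can be sketched in plain Python. This is an illustration of the logic only, not the Cat or Lookup node implementation: `cat_union` and `lookup` are hypothetical helper names, `None` stands in for NULL, and the product data is invented for the example.

```python
def cat_union(*datasets):
    """Append datasets; columns missing from a source become None (NULL)."""
    columns = []
    for rows in datasets:
        for row in rows:
            for name in row:
                if name not in columns:
                    columns.append(name)  # union of all column names, in order seen
    return [
        {name: row.get(name) for name in columns}
        for rows in datasets
        for row in rows
    ]


def lookup(rows, reference, key):
    """Attach every non-key reference field to each row, matched on key."""
    index = {}
    for ref in reference:
        index.setdefault(ref[key], ref)   # first match wins, as with VLOOKUP
    ref_fields = [n for n in (reference[0] if reference else {}) if n != key]
    enriched = []
    for row in rows:
        match = index.get(row[key], {})
        merged = dict(row)
        for name in ref_fields:
            merged[name] = match.get(name)  # None when there is no match
        enriched.append(merged)
    return enriched


jan = [{"ProductCode": "P1", "Qty": 2}]
feb = [
    {"ProductCode": "P2", "Qty": 1, "Channel": "Web"},
    {"ProductCode": "P9", "Qty": 5, "Channel": "Store"},
]
products = [
    {"ProductCode": "P1", "ProductName": "Widget", "Category": "Tools", "Cost": 4.5},
    {"ProductCode": "P2", "ProductName": "Gadget", "Category": "Toys", "Cost": 9.0},
]

combined = cat_union(jan, feb)
enriched = lookup(combined, products, "ProductCode")
```

A single call to `lookup` returns the name, category and cost together, which is the contrast with writing one VLOOKUP formula per returned column.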
Output Your Data
Output | In Data360 Analyze |
Output results |
Data360 Analyze offers a broad range of Output Connectors such that prepared data can be provisioned in diverse formats and to a range of enterprise data systems including: Flat files, Excel, Databases, Visualization applications (Qlik Sense/Spotfire/Tableau), FTP, Salesforce, Apache Hadoop HDFS/ Spark, MongoDB, Web APIs, Email |
Output to Excel |
You can output data at any step in the data flow to Excel using the Output Excel node, or use the Append Excel node to append to an existing Excel file. If you want to output to a pre-formatted Excel report, with charts, logos etc. already created, you can use the option in the Output Excel node to output using an existing Excel file as a template. Recommended Article: Publishing to Excel with a Template |