This article covers how to read data from Excel when you have data on multiple worksheets.
Scenario 1) I have data on multiple Excel worksheets that I want to use as a single data set
Example: Sales data for Q1 and Q2 are on two worksheets but I want to be able to merge the data to perform analyses across the two sales quarters.
- Use the Excel File node and set the File property to the Excel workbook of interest.
- In the Property panel, use the WorkbookSpec property to specify the sheets that you want to include either using the sheet name or the sheet index number.
For example, to read in worksheets named "Sheet1" and "Sheet2" your WorkbookSpec property may look like the following:
Notice that both lines 2 and 3 include outputIndex="1". The outputIndex tag specifies the output pin in the Excel File node that the data will be output to, which in this case is the first one and the one Excel File node defaults with having.
What if the column names on your worksheets don't match exactly?
If the column names do not match exactly, you can change the ConcatenationType Property - for example, setting this property to Union will bring in ALL the fields on all the worksheets that you have listed and populate a NULL value for columns where there is no corresponding value.
Scenario 2) I have data on multiple Excel worksheets that I want to keep as separate data sets
Example: Sales data for Q1 and Q2 are on two worksheets but I want to be able to merge the data to perform analyses across the two sales quarters BUT I also have Customer data on a third worksheet that I want to use to enrich my sales data.
- Use the Excel File node and set the File property to the Excel workbook of interest.
- In the Property panel, use the WorkbookSpec property to specify the sheets that you want to include either using the sheet name or the sheet index number. Include an outputIndex number, ensuring that the datasets that you want to be read in together have the same outputIndex number and the data that should be read in separately have a different outputIndex number.
For example, to read in worksheets named "sales qtr 1" and "sales qtr2" together but reading in worksheet "customer list" separately, your WorkbookSpec property may look like the following:
Notice that lines 2 and 3 have outputIndex="1" but line 4 has outputIndex="2". This means that the sales qtr data will be read in together to the first output of the Excel File node and the customer data will be output to the 2nd output of the Excel File node. - Now we need to make sure that the Excel Node has a matching number of output pins as we have referenced in our WorkbookSpec. The Excel node defaults to having a single output pin.
To add more (in this case, we need one more so that we have two in total), navigate to the Define tab of the Properties panel. Under Outputs, type on a new line to add a new output - I've called it "out2" here but you can give it a more interesting name!
Now your Excel File node will show an additional output pin:
Further resources:
- Help documentation on the Excel File node: Node help > Input Connectors > Excel File
Comments
0 comments
Please sign in to leave a comment.