Analyze connect to AWS services (other than S3)
Analyze provides the service-specific nodes:
- S3 Delete
- S3 Get
- S3 List
- S3 Put
Are there plans to cover more services (e.g. we are seeking a way to read from the AWS Glue Catalog API), or perhaps to provide a general approach that isn't service-specific, e.g. leveraging the AWS CLI tools?
I had hoped to make headway with the AWS SDK for Python (boto3). However, although boto3 is a "pure Python" module, it is not part of the Jython standard library and is not shipped as part of Analyze; I understand this to be a prerequisite (see Adrian's informative response here).
We are currently using Analyze 3.6.4 and understand that some AWS tools are covered implicitly, e.g. Redshift offers JDBC/ODBC access, which can be used via the JDBC/DB nodes.
-
Regarding the generic support for third-party Python packages in the Jython-based nodes:
As of Data360 Analyze v3.6.4, the Analyze installer creates a folder ('<site>/lib/jython2') within the installation’s ‘site’ directory that can be used to store third-party Python packages for use with Jython-based nodes, e.g. the Transform node and the Generate Data node.
The new jython2 directory is automatically added to the Jython package search path so any packages installed in this directory will be available to the node.
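If you want to confirm that the directory really is on the search path, you can inspect sys.path from within a Jython-based node. The following is only a minimal sketch: the helper name find_path_entries is hypothetical, and print is used purely for illustration (how the result reaches the node log depends on the node).

```python
import sys

def find_path_entries(fragment, search_path=None):
    """Return the search-path entries that contain `fragment`.

    Backslashes are normalized to forward slashes and the comparison is
    case-insensitive, so the same fragment works on Windows and Linux.
    """
    if search_path is None:
        search_path = sys.path
    wanted = fragment.replace("\\", "/").lower()
    return [p for p in search_path
            if wanted in str(p).replace("\\", "/").lower()]

# An empty list means no entry containing 'lib/jython2' is on the path
matches = find_path_entries("lib/jython2")
print(matches)
```

An empty result would suggest the packages in the jython2 directory will not be found by the node.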
You can install the boto3 package (and its dependent packages) into the jython2 directory using the attached data flow.
I am not aware of any plans to expand access to other AWS services at this time.
Attached files
Install_boto3_Package_Using_pip_in_Transform_Node--share - 16 Oct 2020.lna
-
Thanks, super helpful. I was able to install boto3 but am seeing an error when trying to use it to achieve this issue's aim (connection to S3):
Error code: brain.nodes.script.jython.scriptExecError
java.util.concurrent.RejectedExecutionException: java.util.concurrent.RejectedExecutionException: event executor terminated
Error executing "ConfigureFields" at line "4"
Attached files
node-logs-2020-10-16-13-50-11.010-0.zip
-
I have not used the package myself but the documentation indicates the AWS credentials must be set up before using boto3. Could this be the problem?
Some potentially relevant links:
https://boto3.amazonaws.com/v1/documentation/api/1.9.42/guide/configuration.html
https://stackoverflow.com/questions/24392700/boto-credential-error-with-python-on-windows
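To rule credentials in or out, one option is to check which of the standard AWS environment variables are set and pass them to the session explicitly. This is a sketch rather than a tested recipe for Analyze: the helper names are hypothetical, and boto3 will normally read these variables (or a shared credentials file) by itself.

```python
import os

def session_kwargs_from_env(env=None):
    """Collect boto3.Session keyword arguments from AWS environment variables.

    Variables that are missing or empty are skipped.
    """
    if env is None:
        env = os.environ
    mapping = {
        "AWS_ACCESS_KEY_ID": "aws_access_key_id",
        "AWS_SECRET_ACCESS_KEY": "aws_secret_access_key",
        "AWS_SESSION_TOKEN": "aws_session_token",
        "AWS_DEFAULT_REGION": "region_name",
    }
    return dict((kw, env[var]) for var, kw in mapping.items() if env.get(var))

def make_session(env=None):
    """Create a boto3 session from the collected settings."""
    import boto3  # imported lazily so the helper above works without boto3
    return boto3.Session(**session_kwargs_from_env(env))
```

If session_kwargs_from_env() returns an empty dictionary and there is no shared credentials file, boto3 has no credentials to work with.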
-
On second thoughts it may be an issue with support for the multiprocessing package when using Jython. When the logging level on the node is increased to debug the log shows this warning
However, when the multiprocessing package is installed in Jython, the following error is generated when attempting to load the module:
It may be that you need to investigate using the Python node (which uses cPython) rather than the Transform node and Jython.
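Because Analyze has both Jython-based and cPython-based nodes, it can help to confirm which interpreter a given script is actually running under. A minimal sketch using only the standard library (the function name is hypothetical, and print is for illustration only):

```python
import platform
import sys

def describe_interpreter():
    """Return the implementation name, version and executable of the interpreter."""
    return {
        "implementation": platform.python_implementation(),  # e.g. "Jython" or "CPython"
        "version": platform.python_version(),
        "executable": sys.executable,
    }

info = describe_interpreter()
print(info)
```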
-
Here are the errors I get when trying to set credentials as advised by https://stackoverflow.com/questions/45981950/how-to-specify-credentials-when-connecting-to-boto3-s3#comment78923893_45982075
I will try the Python node as you suggest, and feed back with results.
Attached files
Screenshot_10.png
Screenshot_9.png
-
The Python node complains that the boto3 library is missing, because it was installed for Jython rather than for cPython; how do I go about installing it for the Python node to overcome this error?
Attached files
-
Still seeing the following error after trying a few things:
AttributeError: 'module' object has no attribute '_Condition'
Things I tried:
- running code in cPython as advised (see above; I'm stuck at installing boto3 to appropriate location).
- running code in "ProcessRecords" field instead of "ConfigureFields" field.
- wrapping code in a function as demonstrated by https://stackoverflow.com/a/1250135/8012789
- using different variable names as demonstrated by https://stackoverflow.com/a/3124164/8012789
-
Please advise whether I should raise a separate support ticket to handle the specific question of "installing boto3 to appropriate location" so that this & other Pure Python libs can be made available to the Python node. I have checked the boto3 approach outlined in docs; it works as expected from within all sorts of Python instances, but not from within Analyze given the approaches tried above (or any other Jython instances).
-
I'm not aware of any customers that have attempted to use the boto3 package with Analyze.
Re. the error in your screenshot_11:
The Analyze product includes separate installations of Jython and cPython.
For a Windows system the cPython installation is located at <AnalyzeInstallDir>\platform\windows-x86-64\python
(and for a Linux system the cPython installation is here <AnalyzeInstallDir>/platform/linux-x86-64/python)
For a Windows system you can open a cmd prompt in *administrative mode* and cd to the directory containing the Python executable. From here you can use pip to install the package, e.g. using:
python -m pip install boto3
However, this installs the package into the default site-packages directory at <AnalyzeInstallDir>\platform\windows-x86-64\python\Lib\site-packages. This directory is not maintained when the Analyze application software is installed, so it is better to install the packages to a different location. The recommended approach is to install the package to a directory under your 'site' directory, e.g. <site>/Python/site-packages. You can modify the installation command to use the --target option:
python -m pip install boto3 --target /path/to/my/local/repo
To use the installed package you need to ensure the directory is on the search paths used by python when it is looking for modules. You can add the following to the start of your script in the Python node:
import sys
import os
sys.path.append(os.path.abspath("/path/to/my/local/repo"))
Then import it as usual:
import boto3
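Putting these steps together for the Glue Catalog use case raised at the start of the thread, a script might look roughly like the sketch below. I have not run this inside Analyze. The path is the placeholder used above, the region name is an assumption, the function names are hypothetical, and print stands in for however your node writes its output.

```python
import os
import sys

# Placeholder path from the post above; replace it with your own --target directory
PACKAGE_DIR = os.path.abspath("/path/to/my/local/repo")
if PACKAGE_DIR not in sys.path:
    sys.path.append(PACKAGE_DIR)

def list_glue_databases(client):
    """Return all database names in the Glue Data Catalog, following NextToken pages."""
    names = []
    kwargs = {}
    while True:
        response = client.get_databases(**kwargs)
        names.extend(db["Name"] for db in response.get("DatabaseList", []))
        token = response.get("NextToken")
        if not token:
            return names
        kwargs["NextToken"] = token

def main():
    import boto3  # imported here so it resolves after sys.path has been extended
    # The region name is an assumption for illustration only
    client = boto3.client("glue", region_name="eu-west-1")
    for name in list_glue_databases(client):
        print(name)
```

Call main() from the node script once credentials are available to boto3. Passing the client into list_glue_databases keeps the paging logic easy to check without an AWS connection.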