Wondering if anyone has a solution they are willing to share for reading and writing Parquet-formatted files with Analyze.
There are several Python modules available that could potentially be leveraged:
parquet-python (pure Python, reader only): https://github.com/jcrobak/parquet-python
fastparquet (Python 3 only): https://github.com/dask/fastparquet
PyArrow (Python 3 only): https://arrow.apache.org/docs/python/
The priority use case is to publish a partitioned Parquet dataset (which may consist of multiple physical files) to cloud-based storage (S3 or Azure) for efficient ingestion into Hadoop.
Also, I’d like to know whether Infogix has any roadmap plans to include Input and Output connector nodes for Parquet-formatted datasets.
Thanks in advance,