Scenario:
Company "A" runs a data quality check (within an analysis) and writes the rows that passed to a data store. However, the data in that data store is not visible when another analysis points to it, or when using the "view sample data" feature.
When the data store is pushed to a data view, and that data view is then used to build a dashboard, the data is visible.
This leads to the question: Why is the data not visible within the data view itself?
Explanation:
If the data source is not of "DB" type, dropping it into an analysis shows data from only one partition of the data store. When writing the records out to that data store, you can set the max number of partitions to 1, which ensures that everything is pulled into the preview.
This is a common problem, and the solution above is best suited to low volumes, such as testing. Changing the number of output partitions may affect how much the analysis parallelizes its data processing. The effect is not limited to the analysis that produces the data: an analysis that reads the data would need an explicit repartitions node to split it again and make use of parallel processing (if dealing with large volumes).
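The behaviour described above can be pictured with a small, self-contained Python sketch. It is not the product's API. The function names (`write_partitions`, `preview`), the rule that assigns a row to a partition, and the assumption that the preview reads the first partition are all hypothetical and chosen only for illustration.

```python
from typing import List


def write_partitions(rows: List[int], max_partitions: int) -> List[List[int]]:
    """Split rows into exactly max_partitions partitions.

    Assumption for illustration: a row lands in partition (row % max_partitions).
    A real data store may assign rows differently.
    """
    if max_partitions < 1:
        raise ValueError("max_partitions must be at least 1")
    partitions: List[List[int]] = [[] for _ in range(max_partitions)]
    for row in rows:
        partitions[row % max_partitions].append(row)
    return partitions


def preview(partitions: List[List[int]]) -> List[int]:
    """Hypothetical preview that reads only one partition (here, the first)."""
    return list(partitions[0]) if partitions else []


# Rows that passed the data quality check (made-up values).
passed_rows = [1, 5, 9, 13]

# With four partitions, every row happens to land in partition 1,
# so a preview that reads only partition 0 shows nothing.
four_way = write_partitions(passed_rows, max_partitions=4)
preview_four = preview(four_way)

# With a single partition, the preview sees every row.
one_way = write_partitions(passed_rows, max_partitions=1)
preview_one = preview(one_way)

print("4 partitions, preview:", preview_four)
print("1 partition, preview:", preview_one)
```

In this sketch the data is fully present in both cases. Only the preview differs, because it looks at a single partition. The trade-off is also visible: with one partition there is nothing to process in parallel, which is why the single-partition setting suits low volumes.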