This article describes which components of the application require the most space. Because each Analyze environment is different, we cannot say exactly how much space you will need. Instead, this article aims to give you an understanding of how the Analyze server uses storage.
BACKGROUND:
You can install Analyze wherever you like provided the directories are accessible to the Analyze user.
The application files require approximately 3 GB of disk space. It is important to ensure that your system meets the minimum requirements for both storage capacity and storage type.
STORAGE REQUIREMENTS:
1. Space is required for /site-7731, which holds:
conf files...small space requirement
library files...slightly larger than conf, as this holds the Java executables and drivers, and the /ext and /spark directories
log files...these require the most space and will continue to grow as the application is used, unless they are purged
2. Space is required for /data-7731, which holds:
backups...consider backing up the .lxp files and storing the copies in a separate location for safekeeping.
We also recommend that you regularly perform a full Linux or Windows system backup.
/pgsql...low latency is required for database transactions; otherwise, application performance will suffer. For example, when a user clicks to open a data flow, the application retrieves information from the database, and this request should not be queued behind other bulk I/O operations (e.g. writing temp data).
/executions...requires a large amount of space. It is important that reads and writes to this folder (i.e. storing and accessing the intermediate data created and used by the nodes) do not consume all of the I/O bandwidth, as this may delay communication with the /pgsql and web application (/webapp) folders. If /executions is on the same disk as the rest of /data-7731 and it fills up, this could lead to corruption of the application.
Because the /executions folder stores the intermediate data used by the nodes, it needs particularly good I/O bandwidth. Tune the storage to optimize performance for the typical file sizes generated by the data you are processing.
We therefore recommend that you put /executions on a separate disk or drive. For instructions on how to mount a drive, see here <insert link to other KB on mounting>
IMPORTANT NOTE: do not delete the /executions/cache folder whilst moving /executions to a new location (this folder is used for in-node Java compilation tasks and Python module caching).
/webapp...this directory stores the web application caches, so it needs fast storage in order for API calls to complete in a timely manner.
If these folders do not have ample room and fast storage, the performance of the whole application could suffer. For example, you may see delays when adding nodes to the canvas or updating node properties.
If the application cannot read from or write to the executions, pgsql and webapp folders, this may lead to corruption.
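To see how space is currently distributed, and whether /executions shares a disk with /pgsql, a small script along the following lines may help. This is a minimal sketch and not part of Analyze: the function names are hypothetical, and the folder names under /data-7731 are assumed to match those listed above.

```python
import os
from pathlib import Path


def dir_size_bytes(root: Path) -> int:
    """Total size of regular files under root (symbolic links are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if not os.path.islink(path):
                    total += os.path.getsize(path)
            except OSError:
                # The file vanished or is unreadable; skip it.
                pass
    return total


def same_device(a: Path, b: Path) -> bool:
    """True when both paths are on the same filesystem/device."""
    return os.stat(a).st_dev == os.stat(b).st_dev


def report(data_dir: Path) -> None:
    # Assumed folder names, taken from the list in this article.
    for name in ("backups", "pgsql", "executions", "webapp"):
        folder = data_dir / name
        if folder.is_dir():
            size_gib = dir_size_bytes(folder) / 1024**3
            print(f"{folder}: {size_gib:.2f} GiB")
    executions = data_dir / "executions"
    pgsql = data_dir / "pgsql"
    if executions.is_dir() and pgsql.is_dir() and same_device(executions, pgsql):
        print("executions shares a device with pgsql; consider a separate disk")


if __name__ == "__main__":
    report(Path("/data-7731"))  # adjust to your own data directory
```

Run the script as the Analyze user so that it can read the same directories as the application.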
TYPE OF STORAGE:
We recommend local storage for the /data-7731 directory, with the exception of the executions folder. The storage used for the executions folder should have sufficient capacity for the temp data that the application generates on an ongoing basis.
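One way to keep an eye on capacity is to check the free space on the volume that holds the executions folder. The sketch below uses Python's standard shutil.disk_usage call. The function names and the 20% threshold are illustrative assumptions, not Analyze recommendations.

```python
import shutil


def free_fraction(path: str) -> float:
    """Fraction of the volume containing path that is still free (0.0 to 1.0)."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return usage.free / usage.total


def has_headroom(path: str, min_free_fraction: float = 0.2) -> bool:
    """True when at least min_free_fraction of the volume is free.

    The 0.2 default is an arbitrary example value; choose your own.
    """
    return free_fraction(path) >= min_free_fraction
```

For example, has_headroom("/data-7731/executions") could be called from a scheduled job that alerts an administrator when it returns False.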
Factors that contribute to disk space issues:
- large (in bytes) input files or databases (e.g. tens of GB to TBs)
- the number of nodes the data passes through
- the number of data flows running at one time
- the number of previous executions retained
The higher each of these is, the more space is required.
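To get a feel for how these factors multiply, the following sketch computes a deliberately crude estimate. It is not an Analyze sizing formula: it assumes, purely for illustration, that every node writes an intermediate copy roughly the size of the input, and that running and retained executions each keep a full set of those copies. Real usage depends on what the nodes actually do to the data.

```python
def estimate_temp_bytes(
    input_bytes: int,
    nodes_per_flow: int,
    concurrent_flows: int,
    retained_executions: int,
) -> int:
    """Rough temp-space estimate under the assumptions stated above."""
    # Assumption: each node writes about one input-sized intermediate copy.
    per_execution = input_bytes * nodes_per_flow
    # Assumption: running and retained executions all keep their data on disk.
    return per_execution * (concurrent_flows + retained_executions)


# Hypothetical example: 10 GB input, 20 nodes, 3 concurrent flows, 5 retained runs.
estimate = estimate_temp_bytes(10 * 10**9, 20, 3, 5)
print(f"{estimate / 10**12:.1f} TB")  # prints "1.6 TB"
```

Even with modest inputs, the product grows quickly, which is why reducing the number of retained executions is often the simplest way to free space.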