This article applies to Data360 Analyze Linux Server only.
PROBLEM BACKGROUND:
You may come across the following error from time to time in Data360 Analyze:
"Received error message from the Data3Sixty Analyze server during execution: Error in execution Error creating farm: Unable to allocate minimum number of drones. Allocated 0 drones, requested 4 drones, required 1 drones (-1.0% of requested)"
The error often occurs sporadically: it affects only the current execution, and the next one runs fine. This can be caused by resource constraints that delay the allocation of system resources, but you will need to check the logs to determine the root cause, as described below.
TROUBLESHOOT:
Navigate to your log file location and open the nodeContainer7732.err file. Search for an error that says "Too many open files". If present, it is most likely followed by some class-loading errors. This suggests that you have been hitting file descriptor limits, so check these limits by running the `ulimit -n` and `ulimit -u` commands.
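As a minimal sketch of the log search described above, the following looks for the message in every .err file under a log directory. The `LOG_DIR` path and the `find_fd_errors` helper name are hypothetical and used only for illustration; substitute your own log file location.

```shell
# Hypothetical log directory (an assumption); replace with your actual log file location.
LOG_DIR="${LOG_DIR:-/path/to/analyze/logs}"

# Print the names of .err files under the given directory that contain the message.
find_fd_errors() {
    grep -rl --include='*.err' 'Too many open files' "$1" 2>/dev/null
}

find_fd_errors "$LOG_DIR" || echo "No 'Too many open files' errors found in $LOG_DIR"
```

If a file name is printed, open that file to check for the class-loading errors that typically follow the message.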
The output of `ulimit -n` is the maximum number of open files. Our requirement is for this to be 4096.
The output of `ulimit -u` is the maximum number of processes allowed; we recommend 2048. A higher value is fine; 2048 is simply a good baseline.
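The check can be sketched as a small bash script that compares the current values with the numbers given above. The `check_limit` helper is a hypothetical name, and treating "unlimited" as passing is an assumption. Run it as the user that runs Analyze.

```shell
#!/usr/bin/env bash
# Compare a current limit with a required minimum.
# Prints "ok" or "too low"; a value of "unlimited" is assumed to pass.
check_limit() {
    local current="$1" required="$2"
    if [ "$current" = "unlimited" ] || [ "$current" -ge "$required" ]; then
        echo "ok"
    else
        echo "too low"
    fi
}

# 4096 and 2048 are the values stated in this article.
echo "open files (ulimit -n): $(ulimit -n) -> $(check_limit "$(ulimit -n)" 4096)"
echo "max processes (ulimit -u): $(ulimit -u) -> $(check_limit "$(ulimit -u)" 2048)"
```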
SOLUTION:
For more information on changing the number of open files, please follow this link: How to Increase Number of Open Files Limit in Linux
The launch script attempts to set both of the above limits, but this may fail because of the permissions of the Linux user, in which case they must be set manually to the recommended values. The values must be set in the Linux OS configuration. Ordinary users can try the command `ulimit -n 4096`, but there is often an OS limit (set by the root user) that must be changed first.
Please note that you should check the ulimit values as the same user that runs Analyze on the machine.
If the values are below the recommended ones, raise the limits and restart Analyze by following these steps:
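The OS limitation mentioned above corresponds to the hard limit, which an ordinary user cannot exceed. A rough sketch of checking it before trying to raise the soft limit follows; the `at_least` helper and the `TARGET` variable are hypothetical names introduced for illustration.

```shell
# Print "yes" if a limit value is at least the given minimum, otherwise "no".
at_least() {
    if [ "$1" = "unlimited" ] || [ "$1" -ge "$2" ]; then
        echo "yes"
    else
        echo "no"
    fi
}

TARGET=4096                 # open files value stated in this article
soft=$(ulimit -Sn)          # current soft limit for open files
hard=$(ulimit -Hn)          # ceiling that only the root user can raise
echo "open files: soft=$soft hard=$hard"

if [ "$(at_least "$soft" "$TARGET")" = "yes" ]; then
    echo "soft limit already meets $TARGET"
elif [ "$(at_least "$hard" "$TARGET")" = "yes" ]; then
    ulimit -Sn "$TARGET" && echo "soft limit raised to $(ulimit -Sn) for this shell"
else
    echo "hard limit is below $TARGET; the root user must raise it first"
fi
```

A change made this way applies only to the current shell session and the processes started from it. A persistent change has to be made in the OS configuration, which on many distributions means /etc/security/limits.conf, although that depends on your system.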
1. Shut down Analyze
2. Check that no Analyze processes are running. Something like `ps -ef | grep analyze` should work, assuming the word "analyze" is in the installation directory file path
3. If any processes associated with Analyze are still listed, kill them manually
4. Change the values
5. Sign out and back in
6. Verify the new limits
7. Launch Analyze, and re-test
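Steps 2 and 3 above can be sketched as follows. The `list_matching` helper is a hypothetical name; it filters out the `grep` command itself, which would otherwise always appear in the output of `ps -ef | grep analyze`. As in step 2, this assumes the word "analyze" appears in the installation path.

```shell
# List processes whose command line contains the pattern, excluding the grep itself.
list_matching() {
    ps -ef | grep -- "$1" | grep -v grep
}

if list_matching analyze; then
    # The second column of the ps -ef output is the PID; stop each one with: kill <PID>
    echo "Analyze processes are still running; stop them before changing the limits"
else
    echo "No Analyze processes found"
fi
```

Review the listed processes before killing anything, since the pattern may also match unrelated commands that happen to contain the same word.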
If the same error happens again, repeat the steps above, but before you launch Analyze, double the ulimits (for both -n and -u). Then launch Analyze and retest.
A fix was implemented in 3.4.0 so that if not all requested drones can be allocated for whatever reason, the execution proceeds as long as 1 drone can be allocated. Although this error occurs, the execution should be able to continue, but with a maximum of only 3 parallel nodes running.
NOTE: There is an additional consideration with Analyze compared with our legacy product LAE: not all nodes run as drones. Many of the newer nodes run as Java tasks, which may or may not be counted in the drone allocation counts. It is probably safer to assume that the new technology nodes do not count as drones. Work on optimization and improvement in this area is planned for future releases.
If you don't see file descriptor errors in the log files, please raise a ticket with the Support team.