The purpose of this document is to explain:
- how batch endpoints work in Govern versus non-batch endpoints
- why batch processing is faster than non-batch in some cases
- why the Govern architecture is not suited to small batches
- why each tenant is limited to two batch API calls processed in parallel
WHAT IS BATCH PROCESSING?
Batch processing is the technique by which a computer completes groups of jobs together, in non-stop sequential order, without manual intervention.
It also allows large jobs to be broken down into smaller parts, which improves efficiency and simplifies troubleshooting.
Batch processing uses the batch endpoint. Its main advantage is that it is a much simpler way to request multiple items at once. The 'batch' endpoint can handle more than 250 items because it is designed to store the submitted item list for batch processing.
Instead of multiple API calls, this can now all be done with a single call.
Example of a batch POST call:
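A minimal sketch in Python. The endpoint URL, the `{"items": [...]}` payload shape, and the `jobId` response field are all assumptions for illustration, not Govern's documented API; check your tenant's API reference for the real endpoint and schema.

```python
import json

# Hypothetical endpoint and payload shape -- not Govern's documented API.
BATCH_URL = "https://example-tenant.govern.example/api/v1/assets/batch"

def build_batch_payload(items):
    """Combine many items into a single batch request body."""
    return {"items": items}

# 500 items -- over the 250-item non-batch limit, so batch is the fit here.
payload = build_batch_payload(
    [{"name": f"asset-{i}", "type": "server"} for i in range(500)]
)
body = json.dumps(payload)

# A real submission would be a single POST (requires the `requests` package):
#   resp = requests.post(BATCH_URL, json=payload)
#   job_id = resp.json()["jobId"]  # hypothetical field; poll it for status

print(len(payload["items"]))  # 500 items submitted in one call
```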
WHAT IS NON-BATCH PROCESSING?
Non-batch processing is similar to batching in that you query the APIs, but with a smaller number of items, and it uses a non-batch endpoint. If requests become too large, it is recommended to use the batch endpoints, which are specifically designed for large batches.
Example of non-batch POST call:
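A corresponding sketch for the synchronous path, again with a hypothetical URL and item schema:

```python
import json

# Hypothetical non-batch (synchronous) endpoint -- illustrative only.
ASSET_URL = "https://example-tenant.govern.example/api/v1/assets"

item = {"name": "asset-1", "type": "server"}
body = json.dumps(item)

# A real call would block until the item is processed, so the response
# is the final result rather than a job to poll:
#   resp = requests.post(ASSET_URL, json=item)
#   resp.raise_for_status()

print(body)
```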
The 'non-batch' endpoint has a 250-item limit and does not store the item list.
HOW BATCHING WORKS IN GOVERN
Govern is a multi-tenant architecture. For batching, it limits each tenant to 2 batch calls at a time.
Batch calls are queued, and Govern can take up to 30 seconds to pick up a queued batch. Govern then performs logic to break the batch down into chunks. This is why a single batch of 2,000 items is preferable to 2,000 individual API calls.
Batch API Endpoints
In Govern, batch API endpoints are used to load large batches of items. They are asynchronous and intended for requests that take longer than roughly 90 seconds to complete. Typically, you submit a request and then check its status; you could, in theory, check back hours later. They are appropriate for requests of more than 250 items.
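The submit-then-check pattern above can be sketched as follows. The status values, polling interval, and job-status endpoint are assumptions, not Govern's documented contract:

```python
import time

def poll_until_done(get_status, interval_s=30, max_checks=120):
    """Check a batch job's status until it reaches a terminal state."""
    for _ in range(max_checks):
        status = get_status()
        if status in ("COMPLETED", "FAILED"):
            return status
        time.sleep(interval_s)  # batches can take minutes or hours
    return "TIMED_OUT"

# In practice, get_status would GET a hypothetical job-status endpoint:
#   lambda: requests.get(f"{BASE}/jobs/{job_id}").json()["status"]
# Here a canned status sequence stands in for those calls:
statuses = iter(["QUEUED", "RUNNING", "RUNNING", "COMPLETED"])
result = poll_until_done(lambda: next(statuses), interval_s=0)
print(result)
```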
Batch requests are placed in a queue, and up to 2 batch requests are processed per Govern tenant at a time, so they are picked up at a slower rate than non-batch requests.
Batch is limited to 2 simultaneous requests per tenant because of the load it generates on the system. Batch endpoints are not appropriate for small requests of, say, 1 or 2 items; for those, the synchronous (non-batch) endpoints are faster.
The batch process performs its operations as large batch/merge requests, which Govern breaks into chunks internally. If you plan on inserting 20,000 items, you should not break them down yourself; Govern will do it for you. Splitting large batches into many small batches defeats the batching logic built into Govern and is not efficient.
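A back-of-the-envelope comparison of the two approaches for a hypothetical 20,000-item load:

```python
# Preferred: one call; Govern chunks the batch server-side.
n_items = 20_000
calls_single = 1

# Anti-pattern: chunking client-side into 250-item batches yields 80
# batch calls, of which only 2 run at a time per tenant -- the other
# 78 sit pending in the queue.
chunk_size = 250
calls_chunked = -(-n_items // chunk_size)  # ceiling division

print(calls_single, calls_chunked)
```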
Non-Batch API Endpoints
In Govern, non-batch (synchronous) API endpoints are used to load fewer than 250 items at a time. These requests are submitted, and you wait for them to finish. They must complete in under roughly 90 seconds or they will time out.
CONSIDERATIONS FOR USING BATCH APIS
- A batch of one or two items is inefficient because it carries a lot of overhead. Batching in this manner floods the API queue with many individual calls to the batch endpoint, especially if the calls target the same asset type. The calls end up in a pending state because the application is designed to handle a maximum of 2 calls at a time. Using the batch endpoints in this manner is therefore also slower.
- As an alternative to a small batch, it is recommended to use a non-batch call or to combine the items into one larger batch.
DECIDING BETWEEN BATCH AND NON-BATCH API CALLS
Batch APIs are recommended for large volumes of updates via a single API call, while non-batch API calls are recommended for small volumes of individual calls.
Performance improves when pushing large volumes of data through batch APIs, whereas it degrades when pushing large numbers of non-batch API calls.
The reverse is also true: non-batch APIs perform well for a small volume of calls, but performance degrades when many non-batch APIs are called concurrently.
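The rule of thumb above can be expressed as a small helper. The 250-item threshold comes from this document; the function itself is purely illustrative:

```python
def choose_endpoint(n_items, threshold=250):
    """Pick the endpoint style this document recommends for n_items."""
    return "batch" if n_items > threshold else "non-batch"

print(choose_endpoint(2000))  # batch
print(choose_endpoint(2))     # non-batch
```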