API rate limiting and query complexity
Query complexity
In the context of the GraphQL API, query complexity refers to a measure of how complicated a specific GraphQL request is. By measuring query complexity we acknowledge that not all requests will require the same amount of effort to process. Ideally, query complexity is correlated with the effort the server requires to respond to the request, so restrictions on query complexity can be used to manage server performance.
In Totara, we assign complexity points to specific features of a GraphQL request, and use those points to calculate the overall complexity of the request.
The point allocation for a request's complexity is as follows:
| Request feature | Complexity points |
|---|---|
| Mutation | 10 |
| Query | 5 |
| Single field from a Type | 1 per record returned |
So for example, the following request:
    {
      totara_webapi_status {
        status
        timestamp
      }
    }
would have a complexity cost of 7 points (5 for the query, 1 for each field returned), because the status query only returns a single record.
Whereas this request:
    {
      core_user_users(input: { pagination: { limit: 10 } }) {
        id
        firstname
        lastname
      }
    }

would have a complexity of 5 + (10 * 3 * 1) = 35 points, assuming there are at least 10 users in the system: 5 for the query, plus 1 point for each of the 3 requested fields, multiplied by the 10 users returned.
If there were fewer users in the system (and therefore fewer results returned), the complexity would drop to 5 + (N * 3).
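The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the point table, not Totara's implementation: the function name and parameters are hypothetical, and it assumes a flat selection where every requested field costs 1 point per record returned.

```python
# Point values taken from the table above.
OPERATION_POINTS = {"query": 5, "mutation": 10}
POINTS_PER_FIELD = 1


def request_complexity(operation: str, fields_requested: int, records_returned: int) -> int:
    """Base cost of the operation plus 1 point per requested field per record returned."""
    base = OPERATION_POINTS[operation]
    return base + records_returned * fields_requested * POINTS_PER_FIELD


print(request_complexity("query", 2, 1))   # status query: 5 + (1 * 2) = 7
print(request_complexity("query", 3, 10))  # ten users, three fields: 5 + (10 * 3) = 35
print(request_complexity("query", 3, 4))   # only four users exist: 5 + (4 * 3) = 17
```

Because the cost depends on the number of records actually returned, the same request can have a different complexity on different sites.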
Obtaining a request's complexity
Turning on performance debugging (via the admin setting perfdebug) adds information about GraphQL request performance to the response, within the extensions property. See Enabling debugging in GraphQL APIs for details. This includes the total calculated complexity of the query:
    {
      "data": {
        "totara_webapi_status": {
          "status": "ok",
          "timestamp": "30/09/2022, 15:21"
        }
      },
      "extensions": {
        "complexity_data": {
          "query_complexity": 7
        },
        "performance_data": { ... }
      }
    }
This allows you to understand the complexity of any specific query you execute.
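As a rough sketch, a client script could read the reported value from the response body like this. The helper name is hypothetical, the sample body is the example response shown above (with the elided performance_data left empty), and the sketch assumes the extensions data may be absent when performance debugging is turned off.

```python
import json

# Sample response body based on the example above; performance_data is left empty here.
response_body = """
{
  "data": {"totara_webapi_status": {"status": "ok", "timestamp": "30/09/2022, 15:21"}},
  "extensions": {"complexity_data": {"query_complexity": 7}, "performance_data": {}}
}
"""


def query_complexity(body: str):
    """Return the reported complexity, or None if the response carries no complexity data."""
    payload = json.loads(body)
    return payload.get("extensions", {}).get("complexity_data", {}).get("query_complexity")


print(query_complexity(response_body))  # 7
```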
Query depth
Query depth refers to the number of levels of nesting within a specific query. Totara offers a way to restrict the maximum query depth that can be executed.
Query depth is measured as follows:
The operation name and the name of the query or mutation being executed are not part of the depth calculation. So, this query:

    query get_status {
      totara_webapi_status {
        status # depth 0
        timestamp(format: DATETIMELONG)
      }
    }
would have a depth of 0.
The first level of depth is when a type contains a nested subtype. So, this query:
    query get_users($query: core_user_users_query) {
      core_user_users(query: $query) {
        items { # depth 0
          id # depth 1
          username
          firstname
          lastname
          email
        }
        next_cursor
        total
      }
    }
would have a depth of 1.
Further nesting increments by 1 for each level. So, this mutation:
    mutation {
      totara_job_update_job_assignment(
        target_job: { idnumber: "exampleja" },
        input: { fullname: "Example job assignment updated" }
      ) {
        job_assignment { # depth 0
          id # depth 1
          user {
            fullname # depth 2
            custom_fields {
              shortname # depth 3
            }
          }
          position {
            fullname
          }
        }
      }
    }
has a depth of 3.
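The counting rule used in the three examples can be expressed as a short recursive function. This is a minimal sketch rather than Totara's own code: it assumes the fields selected on the query or mutation are modelled as a nested Python dict, where a scalar field maps to None and a field with a nested subtype maps to another dict. The function name and this representation are hypothetical.

```python
def selection_depth(selection: dict) -> int:
    """Depth is 0 for scalar fields only, plus 1 for each further level of nested subtype."""
    nested = [child for child in selection.values() if isinstance(child, dict)]
    if not nested:
        return 0
    return 1 + max(selection_depth(child) for child in nested)


# Fields selected on totara_webapi_status.
status_fields = {"status": None, "timestamp": None}

# Fields selected on core_user_users.
users_fields = {
    "items": {"id": None, "username": None, "firstname": None, "lastname": None, "email": None},
    "next_cursor": None,
    "total": None,
}

# Fields selected on totara_job_update_job_assignment.
job_fields = {
    "job_assignment": {
        "id": None,
        "user": {"fullname": None, "custom_fields": {"shortname": None}},
        "position": {"fullname": None},
    }
}

print(selection_depth(status_fields))  # 0
print(selection_depth(users_fields))   # 1
print(selection_depth(job_fields))     # 3
```

The deepest branch determines the result, so the shallower position branch in the mutation does not affect its depth.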
Rate limiting
Totara offers rate limiting settings, which are applied to requests to the external API.
Rate limiting is applied at two levels on each request: at the site level and at the client level. Both restrictions independently track their own requests, and a request will fail if either reaches its limit.
Rate limiting is measured in complexity points per minute. See above for a description of how query complexity points are calculated.
Rate limiting is applied as a request is processed. If the limit has already been reached before a request starts, the request will not begin. If the limit is reached partway through processing, the request will be blocked and will not complete.
Rate limiting is applied using a sliding window algorithm, to allow some sudden bursts of requests while still preventing a sustained high level of requests or excessive bursts at the start of a new window.
Given the following values:
- Preq = Complexity points recorded for requests in the previous window
- Creq = Complexity points recorded for requests so far in the current window
- L = Length of window
- T = Time since start of current window
The current value for the complexity total is as follows:
Complexity total = ( Preq * ( L - T ) / L ) + Creq
As long as this total is less than the defined rate limit, the request will be allowed to start. As the request is processed, calculated complexity points will be added to Creq, and the formula will be recalculated.
When a window has reached the maximum window length (L), it becomes the previous window: its total stops incrementing, and a new window is started with its complexity points set to zero.
In Totara, the length of window (L) is 1 minute.
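To make the formula concrete, here is a small worked sketch in Python. It is an illustration of the calculation described above, not Totara's implementation; the function names and the example rate limits of 500 and 1000 points per minute are assumptions chosen for the example.

```python
WINDOW_SECONDS = 60  # L: the window length is 1 minute


def complexity_total(previous_points, current_points, elapsed_seconds,
                     window_seconds=WINDOW_SECONDS):
    """Weight the previous window by the share of it still inside the sliding window."""
    weight = (window_seconds - elapsed_seconds) / window_seconds
    return previous_points * weight + current_points


def may_start(previous_points, current_points, elapsed_seconds, rate_limit):
    """A request may start only while the weighted total is below the rate limit."""
    return complexity_total(previous_points, current_points, elapsed_seconds) < rate_limit


# 600 points in the previous window, 100 so far in the current one, 15 seconds in:
# (600 * (60 - 15) / 60) + 100 = 450 + 100 = 550
print(complexity_total(600, 100, 15))  # 550.0
print(may_start(600, 100, 15, 500))    # False: 550 is not below a limit of 500
print(may_start(600, 100, 15, 1000))   # True: 550 is below a limit of 1000
```

As T grows, the previous window contributes less, so a client that was blocked early in a window may be allowed again later in the same window without any counter being reset.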
Setting rate limits
While we set reasonable defaults to prevent your site being overwhelmed by uncontrolled API requests, the most appropriate rate-limit settings for a site depend heavily on its specific circumstances.
Server capacity and configuration, API and other site load, and the 'shape' of that load will all impact server performance. In some cases, you may want to allow heavy API load (for example an overnight cron task), whereas in other situations you may want to enforce strict rate limits on API requests. Therefore it is difficult to give generic advice on appropriate settings.
One approach is to test with an expected load and adjust the rate-limiting settings, looking out for '429 Too Many Requests' errors in your server logs. If requests are being limited while your server still has spare capacity, you can increase the limit; if your server is being overwhelmed before the limit is reached, consider lowering it. Either change can be made for specific clients or globally.