OneAPI

Understanding Rate Limiting

Rate limits throttle the number of API calls that can be made within a specific time frame. The following sections provide a summary of the rate limits used by Zscaler OneAPI.

ZIA API

In the Zscaler Internet Access (ZIA) API, every endpoint and action has two types of rate limit:

A lower bound limit that protects against high bursts of requests over a short period of time.
An upper bound limit that protects against a high volume of requests over a long period of time.

Every endpoint is assigned a weight, and every weight has a default rate limit. The following table provides the typical assignment for each weight. Some endpoints are exceptions, and specific operations can have a weight that differs from these typical values. To learn more about the rate limits of individual ZIA API endpoints, see the API Rate Limit Summary.

Weight	Typical Assignment	Requests/second	Requests/minute	Requests/hour
Heavy	DELETE	-	1	4
Medium	POST, PUT	1	-	400
Light	GET	2	-	1,000

As a best practice, after each call to an endpoint, your script should include a wait (or sleep) period. For example, if you are programming in Python, you can use the time.sleep() function. When the defined rate limit is exceeded, an HTTP 429 response code is returned.
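
The wait period can be derived from the typical limits in the table above. The following is a minimal sketch, not an official client: the names MIN_INTERVAL_SECONDS, METHOD_WEIGHT, and pause_for are hypothetical, and it assumes every endpoint uses the typical weight for its HTTP method, which the exceptions noted earlier may not satisfy. It paces only the short-window limit and does not track the hourly limits.

```python
import time

# Minimum spacing between calls, derived from the typical per-weight limits
# in the table. Assumption: only the short-window limit is paced here;
# the hourly limits are not tracked.
MIN_INTERVAL_SECONDS = {
    "light": 0.5,    # GET: 2 requests/second
    "medium": 1.0,   # POST, PUT: 1 request/second
    "heavy": 60.0,   # DELETE: 1 request/minute
}

# Typical assignment of HTTP methods to weights (individual endpoints may differ).
METHOD_WEIGHT = {
    "GET": "light",
    "POST": "medium",
    "PUT": "medium",
    "DELETE": "heavy",
}


def pause_for(method, sleep=time.sleep):
    """Sleep for the typical minimum interval for this HTTP method and return it."""
    interval = MIN_INTERVAL_SECONDS[METHOD_WEIGHT[method.upper()]]
    sleep(interval)
    return interval
```

Calling pause_for("GET") after each GET request keeps a script at or below two requests per second. The sleep parameter exists only so the function can be exercised without actually waiting.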

ZIA sends Rate Limit headers in API responses to report the current status of rate limits, so that users can plan their API integration and avoid hitting the rate limit. The Rate Limit headers are:

x-ratelimit-limit: The rate limit ceiling that is applicable for the current API request.
x-ratelimit-remaining: The number of API requests remaining for the current rate limit window.
x-ratelimit-reset: The time (in seconds) remaining in the current window after which the rate limit resets.
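
A script can read these headers after each response to decide whether to pause. The sketch below assumes the header values arrive as plain numeric strings and that the headers are available as a dictionary; the function name seconds_until_next_call is hypothetical.

```python
def seconds_until_next_call(headers):
    """Return how long to wait before the next request, based on rate limit headers."""
    # HTTP header names are case-insensitive, so normalize them to lowercase.
    normalized = {key.lower(): value for key, value in headers.items()}
    # If a header is missing, assume there is no need to wait.
    remaining = int(normalized.get("x-ratelimit-remaining", 1))
    reset = float(normalized.get("x-ratelimit-reset", 0))
    # Only wait when the current window has no requests left.
    return reset if remaining <= 0 else 0.0
```

For example, a response carrying x-ratelimit-remaining: 0 and x-ratelimit-reset: 7 yields a 7-second wait, while any response with requests remaining yields 0.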

ZPA API

For each organization, all Zscaler Private Access (ZPA) API endpoints called from a specific IP address are subject to the following rate limits:

20 times in a 10-second interval for a GET call
10 times in a 10-second interval for any POST/PUT/DELETE call

Rate limit windows start as soon as the first call is executed. Calls can be made more than once per second, as long as they stay within the limits for each operation type.
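
One way to stay within these limits is to track recent calls on the client side. The sketch below uses a sliding window, which is a conservative approximation and not necessarily how ZPA counts calls on the server. The class name SlidingWindowLimiter and its methods are hypothetical.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side limiter allowing at most max_calls per window_seconds."""

    def __init__(self, max_calls, window_seconds, clock=time.monotonic):
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self.clock = clock
        self.calls = deque()  # timestamps of calls still inside the window

    def wait_time(self):
        """Return seconds to wait before the next call is allowed (0 if allowed now)."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self.calls and now - self.calls[0] >= self.window_seconds:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            return 0.0
        # The oldest call must leave the window before another call fits.
        return self.window_seconds - (now - self.calls[0])

    def record(self):
        """Record that a call was just made."""
        self.calls.append(self.clock())


# Limits from the list above: 20 GET calls, or 10 POST/PUT/DELETE calls, per 10 seconds.
get_limiter = SlidingWindowLimiter(max_calls=20, window_seconds=10)
write_limiter = SlidingWindowLimiter(max_calls=10, window_seconds=10)
```

Before each request, a script would sleep for limiter.wait_time() seconds and then call limiter.record(). Because a sliding window never allows more than max_calls in any 10-second span, it stays within the documented limit regardless of where the server's window begins.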

When an API request exceeds the defined rate limit, an HTTP 429 response code is returned along with a response header that includes the retry-after field. This field indicates how long to wait before making the next API call, which supports a retry mechanism. The following example shows a response header that includes the retry-after field. The value 13s indicates a wait period of 13 seconds:

{
 "content-type": "application/json",
 "date": "Wed, 6 Mar 2024 11:38 GMT",
 "retry-after": "13s"
}
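
The retry-after value in this example carries a unit suffix, so it needs to be parsed before it can be passed to a sleep call. The following sketch assumes the value is always a whole number of seconds with an optional trailing "s", as in the example above; other formats are rejected. The function name parse_retry_after is hypothetical.

```python
import re


def parse_retry_after(value):
    """Convert a retry-after value such as "13s" or "13" into whole seconds."""
    # Accept digits followed by an optional "s" suffix, ignoring surrounding spaces.
    match = re.fullmatch(r"\s*(\d+)\s*s?\s*", value)
    if match is None:
        raise ValueError(f"Unrecognized retry-after value: {value!r}")
    return int(match.group(1))
```

A script that receives an HTTP 429 response can then wait with time.sleep(parse_retry_after(headers["retry-after"])) before retrying the call.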

Zscaler Client Connector API

For each organization, all Zscaler Client Connector API endpoints called from a specific IP address are subject to a rate limit of 100 calls per hour, except for the /downloadDevices endpoint, which has a rate limit of 3 calls per day.

Rate limit windows start as soon as the first call is executed. Calls can be made more than once per second, as long as they stay within the limits for each operation type.

Zscaler Client Connector sends Rate Limit headers in API responses to report the current status of rate limits, so that users can plan their API integration and avoid hitting the rate limit. The Rate Limit headers are:

x-ratelimit-limit: The rate limit ceiling that is applicable for the current API request.
x-ratelimit-remaining: The number of API requests remaining for the current rate limit window.
x-ratelimit-reset: The time (in seconds) remaining in the current window after which the rate limit resets.

Clients that hit a rate limit must back off exponentially before retrying.
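
Exponential backoff means that the wait doubles (or grows by some other factor) after each rate-limited attempt. The sketch below is one possible shape, with assumed defaults that do not come from Zscaler: a 1-second base, a factor of 2, a 60-second cap, and 5 retries. The names backoff_delays, call_with_backoff, send, and is_rate_limited are hypothetical.

```python
import time


def backoff_delays(base=1.0, factor=2.0, max_delay=60.0, attempts=5):
    """Return the delay (in seconds) to wait before each retry attempt."""
    return [min(base * factor ** attempt, max_delay) for attempt in range(attempts)]


def call_with_backoff(send, is_rate_limited, sleep=time.sleep, **backoff_options):
    """Call send() and retry with growing delays while the result is rate limited."""
    result = send()
    for delay in backoff_delays(**backoff_options):
        if not is_rate_limited(result):
            break
        sleep(delay)
        result = send()
    # May still be a rate-limited result if every retry was used up.
    return result
```

With the defaults, the delays are 1, 2, 4, 8, and 16 seconds. Scripts that run in parallel often add a small random offset to each delay so that they do not all retry at the same moment.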

Zscaler Cloud & Branch Connector API

In the Cloud & Branch Connector API, every endpoint and operation has two rate limit types:

A lower bound limit that protects against a high volume of requests over a short period of time.
An upper bound limit that protects against a high volume of requests over a long period of time.

Every endpoint has a weight, and every weight has a default rate limit. The following table provides the typical assignment for each weight. However, specific operations can have a different weight from these typical values.

Weight	Typical Assignment	Requests/second	Requests/minute	Requests/hour
Heavy	DELETE	-	1	4
Medium	POST, PUT	1	-	400
Light	GET	2	-	1,000

As a best practice, after each call to an endpoint, your script should include a wait (or sleep) period. For example, in Python, use the time.sleep() function. When the rate limit is exceeded, an HTTP 429 response is returned with a Retry-After field in the response body. For example:

{
   "message": "Rate Limit (1/SECOND) exceeded",
   "Retry-After": "0 seconds"
}
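
Because the wait time arrives in the response body here and not in a header, a script has to parse the JSON to find it. The following sketch assumes the Retry-After value always begins with a whole number of seconds, as in the example above; the function name retry_after_from_body is hypothetical.

```python
import json
import re


def retry_after_from_body(body_text):
    """Extract the wait time in seconds from a 429 response body, or None if absent."""
    body = json.loads(body_text)
    value = body.get("Retry-After")
    if value is None:
        return None
    # Assumes the value begins with a whole number of seconds, e.g. "0 seconds".
    match = re.match(r"\s*(\d+)", str(value))
    return int(match.group(1)) if match else None
```

For the example body above, the function returns 0, which means the call can be retried immediately. A script may still prefer to keep its regular wait period between calls.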