Service availability - Temporal Cloud

The operating envelope of Temporal Cloud includes throughput, latency, and limits. Service regions are listed on this page. If you need more details, contact us.

Throughput expectations

What kind of throughput can I get with Temporal Cloud?

Each Namespace has a rate limit, which is measured in Actions per second (APS). A Namespace's default limit is set at 400 APS and automatically adjusts based on recent usage (over the prior 7 days). Your throughput limit will never fall below this default value.

When your Action rate exceeds your quota, Temporal Cloud throttles Actions until the rate matches your quota. Throttling means limiting the rate at which Actions are performed to prevent the Namespace from exceeding its APS limit.

Critical calls to external events, such starting or Signaling a Workflow, are always prioritized and never throttled. There are four priority levels for Temporal Cloud API calls:

External events (Critical calls)
Workflow progress updates
Visibility API calls
Cloud operations such as Namespace creation

When you exceed your APS limits, you might receive warnings about throttling. However, requests are never dropped, and high-priority calls are never delayed. Workers might take longer to complete Workflows.

If your usage grows slowly, your throughput limit grows with your usage. At times, you may hit a maximum throughput threshold and need to switch to a higher consumption tier. Learn more about our tiers by visiting our information page or reach out to our team to help size your number of Actions. Temporal Cloud can provide more than 150,000 Actions per second at its highest tier.

MEASURING THROUGHPUT WITH APS AND RPS

APS and RPS are both measures of throughput, but apply to different aspects of Temporal.

APS, or Actions Per Second, is specific to Temporal Cloud. It measures the rate at which Actions, like starting or signaling a Workflow, can be performed in a specific Namespace. Temporal Cloud uses APS to manage and throttle Actions, preventing a Namespace from exceeding its limit. APS measures how many high-level operations (Actions) a user can perform in Temporal Cloud each second.

RPS, or Requests Per Second, is used in the Temporal Service, both in self-hosted Temporal and Temporal Cloud. It measures and controls the rate of gRPC requests to the Service. This is a lower-level measure that manages rates at the service level, such as the Frontend, History, or Matching Services.

In summary, APS is a higher-level measure to limit and mitigate Action spikes in Temporal Cloud. RPS is a lower-level measure to control and balance request rates at the service level.

Latency Service Level Objective (SLO)

What kind of latency can I expect from Temporal Cloud?

Temporal Cloud has a p99 latency SLO of 200ms per region.

In March 2024, latency over a week-long period for starting and signaling Workflow Executions was as follows:

Operation	p90	p99
`StartWorkflowExecution`	24ms	54ms
`SignalWorkflowExecution`	14ms	40ms
`SignalWithStartWorkflowExecution`	24ms	61ms

As Temporal continues working on improving latencies, these numbers will progressively decrease.

The same SLO for normal Worker requests (commands and polling) apply to Nexus in both the caller and handler Namespaces.

Latency observed from the Temporal Client is influenced by other system components like the Codec Server, egress proxy, and the network itself. Also, concurrent operations on the same Workflow Execution may result in higher latency.

Throughput expectations​

Latency Service Level Objective (SLO)​

Throughput expectations

Latency Service Level Objective (SLO)