
High Availability

Temporal Cloud provides a 99.9% contractual Service Level Agreement (SLA) guarantee against service errors for all Namespaces. High Availability asynchronously replicates Workflows across multiple isolation domains for a 99.99% SLA, which allows one tenth of the service errors permitted by the default SLA.
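
To make the difference between the two SLA levels concrete, the following sketch converts each percentage into an allowed error fraction. It also expresses that fraction as an equivalent number of minutes in a 30-day month, purely for illustration: the SLA is defined against service errors, not downtime, and the 30-day window and the function names are assumptions of this example, not part of the SLA.

```python
from decimal import Decimal


def allowed_error_fraction(sla_percent: str) -> Decimal:
    """Fraction of requests that may fail while still meeting the SLA."""
    return (Decimal(100) - Decimal(sla_percent)) / Decimal(100)


def equivalent_minutes(sla_percent: str, days: int = 30) -> Decimal:
    """Illustrative only: the error fraction applied to a window of `days` days."""
    minutes_in_window = Decimal(days * 24 * 60)
    return allowed_error_fraction(sla_percent) * minutes_in_window


standard = allowed_error_fraction("99.9")            # default SLA
high_availability = allowed_error_fraction("99.99")  # High Availability SLA
ratio = standard / high_availability                 # how much tighter HA is

print(f"standard: {standard}, high availability: {high_availability}")
print(f"equivalent minutes per 30 days: "
      f"{equivalent_minutes('99.9')} vs {equivalent_minutes('99.99')}")
```

`Decimal` is used instead of `float` so that values such as 0.1% are represented exactly.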

Dive deeper: Namespaces and built-in stability

Each standard Temporal Namespace is replicated across three availability zones to ensure high availability. An availability zone is a part of the system where tasks or operations are handled and executed. Spreading work across zones helps manage workloads and ensures tasks are completed, which improves resource use and reduces delays.

Replication makes sure that any changes to Workflow state or History are saved in all three zones before the Temporal Service acknowledges a change back to the Client. As a result, your standard Temporal Namespace stays operational even if one of its three zones becomes unavailable. This provides the basis of our 99.9% service level.
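
The acknowledge-after-persist behavior described above can be pictured with a toy model. This is a minimal sketch, not Temporal's implementation: the `Zone` and `StandardNamespace` classes, the zone names, and the event strings are all hypothetical.

```python
class Zone:
    """Hypothetical availability zone holding its own copy of the History."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.events: list[str] = []

    def persist(self, event: str) -> None:
        self.events.append(event)


class StandardNamespace:
    """Toy model: a write is acknowledged only after every zone has saved it."""

    def __init__(self, zone_names: list[str]) -> None:
        self.zones = [Zone(name) for name in zone_names]

    def write(self, event: str) -> bool:
        for zone in self.zones:
            zone.persist(event)
        # Only now does the caller receive an acknowledgement.
        return True

    def read(self, unavailable: frozenset[str] = frozenset()) -> list[str]:
        """Read the History from the first zone that is still reachable."""
        for zone in self.zones:
            if zone.name not in unavailable:
                return list(zone.events)
        raise RuntimeError("no availability zone is reachable")


ns = StandardNamespace(["zone-a", "zone-b", "zone-c"])
acked = ns.write("WorkflowExecutionStarted")
ns.write("ActivityTaskScheduled")

# Losing any single zone still leaves a complete copy of acknowledged changes.
history_without_b = ns.read(unavailable=frozenset({"zone-b"}))
```

Because every acknowledged change already exists in all three zones, the read succeeds with the full History no matter which single zone is removed.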


High Availability features

When you enable High Availability features, Temporal deploys your primary and its replica across separate isolation domains. You control the location of both the primary and the replica.

| Deployment | Description |
| --- | --- |
| Same-region Replication | Isolation domains are co-located within the same region. |
| Multi-region Replication | Isolation domains are located in separate regions. |
| Multi-cloud Replication | Isolation domains are located in separate cloud providers. |

Same-region Replication

Temporal replicates Namespaces across isolation domains within one region. This option is a good fit when your application is built for one region and you prefer to fail over within that region. It provides a reliable failover mechanism while keeping your deployment simple.

Multi-region Replication

Temporal replicates Namespaces across regions, making sure Workflows and data are available even if a region fails. Asynchronous replication means changes aren’t immediately reflected in other regions but will sync over time, ensuring data integrity. This setup allows failovers between replicas without needing immediate consistency across regions. Replication across different regions enhances resilience and reliability.

Multi-cloud Replication

Temporal asynchronously replicates all Workflows (live and historical) and data to a Namespace in an entirely different cloud provider. If a provider outage, regional outage, service disruption, or network issue occurs, traffic automatically shifts to the replica. Replicated data is securely encrypted and transmitted across the public internet between cloud providers. Internet connectivity allows workers in one cloud to fail over to a replica in a different cloud.

caution

When you adopt Temporal's High Availability features, don't forget to consider the reliability of your own workers, infrastructure, and dependencies. Issues like network outages, hardware failures, or misconfigurations in your own systems can affect your application performance.

For the highest level of reliability, distribute your dependencies across regions, and use our Multi-region or Multi-cloud replication features. Using physically separated regions improves the fault tolerance of your application.

See more detail about how replication works.

Failover

In case of an incident or an outage, Temporal will automatically fail over your Namespace from the primary to the replica. This lets Workflow Executions continue with minimal interruptions or data loss. You can also [manually initiate failovers](/cloud/high-availability/failovers) based on your situational monitoring or for testing.

Returning control from the replica to the primary is called a failback. The replica is active for a brief duration during an incident. After the incident, Temporal fails back to the primary.
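
The failover and failback cycle can be thought of as a two-state machine. The sketch below is a conceptual model only: the `HighAvailabilityNamespace` class and the region names are hypothetical and do not correspond to a Temporal API.

```python
from dataclasses import dataclass, field


@dataclass
class HighAvailabilityNamespace:
    """Toy model tracking which side of a replicated Namespace is active."""

    primary: str
    replica: str
    active: str = field(init=False)
    log: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Under normal conditions the primary serves traffic.
        self.active = self.primary

    def failover(self, reason: str) -> None:
        """Shift control from the primary to the replica."""
        if self.active == self.replica:
            raise ValueError("already failed over to the replica")
        self.active = self.replica
        self.log.append(f"failover: {reason}")

    def failback(self) -> None:
        """Return control from the replica to the primary."""
        if self.active == self.primary:
            raise ValueError("the primary is already active")
        self.active = self.primary
        self.log.append("failback")


# Hypothetical region names, for illustration only.
ns = HighAvailabilityNamespace(primary="region-1", replica="region-2")
states = [ns.active]

ns.failover(reason="incident in region-1")  # automatic or manually initiated
states.append(ns.active)

ns.failback()                               # after the incident is resolved
states.append(ns.active)
```

The guard clauses reflect that a failback only makes sense while the replica is active, and a failover only while the primary is.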

See more detail about how failovers work.

SLA for High Availability features

What guarantees does Temporal offer for replication features?

Namespace replication offers 99.99% availability, enforced by Temporal Cloud's service error rates SLA. Our system is designed to limit data loss after recovery when the incident triggering the failover is resolved.

Our recovery point objective (RPO) is near-zero. There may be a short period of time during an incident or forced failover when some data is unavailable in the replica region. Some Workflow History data won't arrive until network issues are fixed, enabling the History to finish replicating and the divergent History branches to reconcile.

Temporal Cloud proactively responds to incidents by triggering failovers. Our recovery time objective (RTO) is 20 minutes or less per incident.

info

In a disaster scenario where the data in the primary Namespace cannot be recovered, the window of lost data may be as long as the replication lag at the time of the disaster.