High Availability
Temporal Cloud provides a 99.9% contractual Service Level Agreement (SLA) guarantee against service errors for all Namespaces. High Availability asynchronously replicates Workflows across multiple isolation domains and raises the SLA to 99.99%, which permits one-tenth the error rate of the default.
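To make the difference between the two SLA levels concrete, the following minimal sketch converts each percentage into an error budget. The SLA itself is defined against service error rates; expressing the budget as minutes in a 30-day month is only an illustrative assumption, and the function names are hypothetical.

```python
def allowed_error_fraction(sla_percent: float) -> float:
    """Fraction of service calls that may fail while still meeting the SLA."""
    return (100.0 - sla_percent) / 100.0


def monthly_budget_minutes(sla_percent: float, days: int = 30) -> float:
    """Illustrative only: the error fraction expressed as minutes in a month.

    Assumes a 30-day month by default; the actual SLA is measured on
    service error rates, not on wall-clock downtime.
    """
    return allowed_error_fraction(sla_percent) * days * 24 * 60


default_budget = monthly_budget_minutes(99.9)
ha_budget = monthly_budget_minutes(99.99)

print(f"99.9%  SLA budget: {default_budget:.2f} minutes per 30 days")
print(f"99.99% SLA budget: {ha_budget:.2f} minutes per 30 days")
print(f"Ratio: {default_budget / ha_budget:.1f}x")
```

The ratio between the two budgets is where the tenfold improvement comes from.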
High Availability features
When you enable High Availability features, Temporal deploys your primary and its replica across separate isolation domains. You control the location of both the primary and the replica.
| Deployment | Description |
|---|---|
| Same-region Replication | Isolation domains are co-located within the same region. |
| Multi-region Replication | Isolation domains are located in separate regions. |
| Multi-cloud Replication | Isolation domains are located in separate cloud providers. |
Same-region Replication
Temporal replicates Namespaces across isolation domains within one region. This option is a good fit when your application is built for one region and you prefer to fail over within that region. It provides a reliable failover mechanism while keeping your deployment simple.
Multi-region Replication
Temporal replicates Namespaces across regions so that Workflows and data remain available even if a region fails. Because replication is asynchronous, changes aren't immediately reflected in other regions; the replica catches up over time, which preserves data integrity. This setup allows failovers between replicas without requiring immediate consistency across regions. Replication across different regions enhances resilience and reliability.
Multi-cloud Replication
Temporal asynchronously replicates all Workflows (live and historical) and data to a Namespace in an entirely different cloud provider. If a provider outage, regional outage, service disruption, or network issue occurs, traffic automatically shifts to the replica. Replicated data is securely encrypted and transmitted across the public internet between cloud providers. Internet connectivity allows workers in one cloud to fail over to a replica in a different cloud.
When you adopt Temporal's High Availability features, don't forget to consider the reliability of your own workers, infrastructure, and dependencies. Issues like network outages, hardware failures, or misconfigurations in your own systems can affect your application performance.
For the highest level of reliability, distribute your dependencies across regions, and use our Multi-region or Multi-cloud replication features. Using physically separated regions improves the fault tolerance of your application.
See more detail about how replication works.
Failover
In case of an incident or an outage, Temporal will automatically fail over your Namespace from the primary to the replica. This lets Workflow Executions continue with minimal interruptions or data loss. You can also [manually initiate failovers](/cloud/high-availability/failovers) based on your situational monitoring or for testing.
Returning control from the replica to the primary is called a failback. The replica is active for a brief duration during an incident. After the incident, Temporal fails back to the primary.
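The failover and failback terminology can be pictured with a small toy model. This is only a conceptual sketch, not Temporal's implementation or API: the `Namespace` class, its methods, and the region names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Namespace:
    """Toy model: tracks which isolation domain is currently active."""
    primary: str
    replica: str
    active: str = ""

    def __post_init__(self) -> None:
        # A Namespace starts out served by its primary.
        if not self.active:
            self.active = self.primary

    def failover(self) -> None:
        """Shift control from the primary to the replica."""
        self.active = self.replica

    def failback(self) -> None:
        """Return control from the replica to the primary."""
        self.active = self.primary


# Hypothetical region names, used purely for illustration.
ns = Namespace(primary="region-a", replica="region-b")
ns.failover()   # incident: the replica becomes active
during_incident = ns.active
ns.failback()   # incident resolved: the primary is active again
after_incident = ns.active
print(during_incident, after_incident)
```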
See more detail about how failovers work.
SLA for High Availability features
What guarantees does Temporal offer for replication features?
Namespace replication offers 99.99% availability, enforced by Temporal Cloud's service error rates SLA. Our system is designed to limit data loss after recovery when the incident triggering the failover is resolved.
Our recovery point objective (RPO) is near-zero. There may be a short period of time during an incident or forced failover when some data is unavailable in the replica region. Some Workflow History data won't arrive until network issues are fixed, enabling the History to finish replicating and the divergent History branches to reconcile.
Temporal Cloud proactively responds to incidents by triggering failovers. Our recovery time objective (RTO) is 20 minutes or less per incident.
During a disaster scenario in which the data in the primary Namespace cannot be recovered, the window of lost data may be as long as the replication lag at the time of the disaster.
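The relationship between replication lag and potential data loss can be sketched with a toy model. It assumes a constant lag and timestamps in seconds, which is a simplification; the `Event` type and `unreplicated_events` function are hypothetical and do not reflect how Temporal Cloud measures or reports lag.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Event:
    event_id: int
    written_at: float  # seconds, on the primary's clock


def unreplicated_events(
    events: List[Event], disaster_at: float, replication_lag: float
) -> List[Event]:
    """Return events written to the primary that had not reached the replica.

    Toy assumption: the replica has applied everything written up to
    (disaster_at - replication_lag) and nothing after that point.
    """
    cutoff = disaster_at - replication_lag
    return [e for e in events if cutoff < e.written_at <= disaster_at]


# One event per second for ten seconds; the primary is lost at t=9.5
# while the replica is 2 seconds behind.
history = [Event(event_id=i, written_at=float(i)) for i in range(10)]
lost = unreplicated_events(history, disaster_at=9.5, replication_lag=2.0)
print([e.event_id for e in lost])
```

In this model, a smaller lag at the moment of the disaster directly shrinks the set of events at risk, which is why a near-zero RPO depends on replication keeping up.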