Skip to main content

What is a Temporal Cluster?

This guide provides a comprehensive overview of Temporal Clusters.

A Temporal Cluster is the group of services, known as the Temporal Server, combined with Persistence and Visibility stores, that together act as a component of the Temporal Platform.

A Temporal Cluster (Server + persistence)

What is the Temporal Server?

The Temporal Server consists of four independently scalable services:

  • Frontend gateway: for rate limiting, routing, authorizing.
  • History subsystem: maintains data (mutable state, queues, and timers).
  • Matching subsystem: hosts Task Queues for dispatching.
  • Worker Service: for internal background Workflows.

For example, a real-life production deployment can have 5 Frontend, 15 History, 17 Matching, and 3 Worker Services per cluster.

The Temporal Server services can run independently or be grouped together into shared processes on one or more physical or virtual machines. For live (production) environments, we recommend that each service runs independently, because each one has different scaling requirements and troubleshooting becomes easier. The History, Matching, and Worker Services can scale horizontally within a Cluster. The Frontend Service scales differently than the others because it has no sharding or partitioning; it is just stateless.

Each service is aware of the others, including scaled instances, through a membership protocol via Ringpop.

Versions and support

All Temporal Server releases abide by the Semantic Versioning Specification.

We support upgrade paths from every version beginning with Temporal v1.7.0. For details on upgrading your Temporal Cluster, see Upgrade Server.

We provide maintenance support for previously published minor and major versions by continuing to release critical bug fixes related to security, the prevention of data loss, and reliability, whenever they are found.

We aim to publish incremental upgrade guides for each minor and major version, which include specifics about dependency upgrades that we have tested for (such as Cassandra 3.0 -> 3.11).

We offer maintenance support of the last three minor versions after a release and do not plan to "backport" patches beyond that.

We offer maintenance support of major versions for at least 12 months after a GA release, and we provide at least 6 months' notice before EOL/deprecating support.

Dependencies

Temporal offers official support for, and is tested against, dependencies with the exact versions described in the go.mod file of the corresponding release tag. (For example, v1.5.1 dependencies are documented in the go.mod for v1.5.1.)

What is a Frontend Service?

The Frontend Service is a stateless gateway service that exposes a strongly typed Proto API. The Frontend Service is responsible for rate limiting, authorizing, validating, and routing all inbound calls.

Frontend Service

Frontend Service

Types of inbound calls include the following:

Every inbound request related to a Workflow Execution must have a Workflow Id, which is hashed for routing purposes. The Frontend Service has access to the hash rings that maintain service membership information, including how many nodes (instances of each service) are in the Cluster.

Inbound call rate limiting is applied per host and per namespace.

The Frontend Service talks to the Matching Service, History Service, Worker Service, the database, and Elasticsearch (if in use).

  • It uses the grpcPort 7233 to host the service handler.
  • It uses port 6933 for membership-related communication.

Ports are configurable in the Cluster configuration.

What is a History Service?

The History Service is responsible for persisting Workflow Execution state to the Workflow History. When the Workflow Execution is able to progress, the History Service adds a Task with the Workflow's updated history to the Task Queue. From there, a Worker can poll for work, receive this updated history, and resume execution.

Block diagram of how the History Service relates to the other services of the Temporal Server and to a Temporal Cluster

Block diagram of how the History Service relates to the other services of the Temporal Server and to a Temporal Cluster

The total number of History Service processes can be between 1 and the total number of History Shards. An individual History Service can support many History Shards. Temporal recommends starting at a ratio of 1 History Service process for every 500 History Shards.

Although the total number of History Shards remains static for the life of the Cluster, the number of History Service processess can change.

The History Service talks to the Matching Service and the database.

  • It uses grpcPort 7234 to host the service handler.
  • It uses port 6934 for membership-related communication.

Ports are configurable in the Cluster configuration.

What is a History Shard?

A History Shard is an important unit within a Temporal Cluster by which concurrent Workflow Execution throughput can be scaled.

Each History Shard maps to a single persistence partition. A History Shard assumes that only one concurrent operation can be within a partition at a time. In essence, the number of History Shards represents the number of concurrent database operations that can occur for a Cluster. This means that the number of History Shards in a Temporal Cluster plays a significant role in the performance of your Temporal Application.

Before integrating a database, the total number of History Shards for the Temporal Cluster must be chosen and set in the Cluster's configuration (see persistence). After the Shard count is configured and the database integrated, the total number of History Shards for the Cluster cannot be changed.

In theory, a Temporal Cluster can operate with an unlimited number of History Shards, but each History Shard adds compute overhead to the Cluster. Temporal Clusters have operated successfully using anywhere from 1 to 128K History Shards, with each Shard responsible for tens of thousands of Workflow Executions. One Shard is useful only in small scale setups designed for testing, while 128k Shards is useful only in very large scale production environments. The correct number of History Shards for any given Cluster depends entirely on the Temporal Application that it is supporting and the type of database.

A History Shard is represented as a hashed integer. Each Workflow Execution is automatically assigned to a History Shard. The assignment algorithm hashes Workflow Execution metadata such as Workflow Id and Namespace and uses that value to match a History Shard.

Each History Shard maintains the Workflow Execution Event History, Workflow Execution mutable state, and the following internal Task Queues:

  • Internal Transfer Task Queue: Transfers internal tasks to the Matching Service. Whenever a new Workflow Task needs to be scheduled, the History Service's Transfer Task Queue Processor transactionally dispatches it to the Matching Service.
  • Internal Timer Task Queue: Durably persists Timers.
  • Internal Replicator Task Queue: Asynchronously replicates Workflow Executions from active Clusters to other passive Clusters. (Relies on the experimental Multi-Cluster feature.)
  • Internal Visibility Task Queue: Pushes data to the Advanced Visibility index.

What is a Matching Service?

The Matching Service is responsible for hosting user-facing Task Queues for Task dispatching.

Matching Service

Matching Service

It is responsible for matching Workers to Tasks and routing new Tasks to the appropriate queue. This service can scale internally by having multiple instances.

It talks to the Frontend Service, History Service, and the database.

  • It uses grpcPort 7235 to host the service handler.
  • It uses port 6935 for membership related communication.

Ports are configurable in the Cluster configuration.

What is a Worker Service?

The Worker Service runs background processing for the replication queue, system Workflows, and (in versions older than 1.5.0) the Kafka visibility processor.

Worker Service

Worker Service

It talks to the Frontend Service.

  • It uses port 6939 for membership-related communication.

Ports are configurable in the Cluster configuration.

What is a Retention Period?

Retention Period is the duration for which the Temporal Cluster stores data associated with closed Workflow Executions on a Namespace in the Persistence store.

A Retention Period applies to all closed Workflow Executions within a Namespace and is set when the Namespace is registered.

The Temporal Cluster triggers a Timer task at the end of the Retention Period that cleans up the data associated with the closed Workflow Execution on that Namespace.

The minimum Retention Period is 1 day. On Temporal Cluster version 1.18 and later, the maximum Retention Period value for Namespaces can be set to anything over the minimum requirement of 1 day. Ensure that your Persistence store has enough capacity for the storage. On Temporal Cluster versions 1.17 and earlier, the maximum Retention Period you can set is 30 days. Setting the Retention Period to 0 results in the error A valid retention period is not set on request.

If you don't set the Retention Period value when using the tctl namespace register command, it defaults to 3 days. If you don't set the Retention Period value when using the Register Namespace Request API, it returns an error.

When changing the Retention Period, the new duration applies to Workflow Executions that close after the change is saved.

What is Persistence?

The Temporal Persistence store is a database used by Temporal Services to persist events generated and processed in your Temporal Cluster and SDK.

A Temporal Cluster's only required dependency for basic operation is the Persistence database. Multiple types of databases are supported.

Persistence

Persistence

The database stores the following types of data:

  • Tasks: Tasks to be dispatched.
  • State of Workflow Executions:
    • Execution table: A capture of the mutable state of Workflow Executions.
    • History table: An append-only log of Workflow Execution History Events.
  • Namespace metadata: Metadata of each Namespace in the Cluster.
  • Visibility data: Enables operations like "show all running Workflow Executions". For production environments, we recommend using Elasticsearch as your Visibility store.

An Elasticsearch database must be configured in a self-hosted Cluster to enable advanced Visibility on Temporal Server versions 1.19.1 and earlier.

With Temporal Server version 1.20 and later, advanced Visibility features are available on SQL databases like MySQL (version 8.0.17 and later), PostgreSQL (version 12 and later), SQLite (v3.31.0 and later), and Elasticsearch.

Dependency versions

Temporal tests compatibility by spanning the minimum and maximum stable major versions for each supported database. The following versions are used in our test pipelines and actively tested before we release any version of Temporal:

  • Cassandra v3.11 and v4.0
  • PostgreSQL v10.18 and v13.4
  • MySQL v5.7 and v8.0 (specifically 8.0.19+ due to a bug)

You can verify supported databases in the Temporal Server release notes.

  • Because Temporal Server primarily relies on core database functionality, we do not expect compatibility to break often.
  • We do not run tests with vendors like Vitess and CockroachDB.
  • Temporal also supports SQLite v3.x persistence, but this is meant only for development and testing, not production usage.

What is Visibility?

Support, stability, and dependency info
  • For Temporal Server v1.19 and earlier, all supported databases for Visibility provide standard Visibility features, and an Elasticsearch database is required for advanced Visibility features.
  • For Temporal Server v1.20 and later, advanced Visibility features are enabled on all supported SQL databases, in addition to Elasticsearch.
  • In Temporal Server v1.21 and later, standard Visibility is no longer in development, and we recommend migrating to a database that supports Advanced Visibility features. The Visibility configuration for Temporal Clusters has been updated and Dual Visibility is enabled. For details, see Visibility store setup.

The term Visibility, within the Temporal Platform, refers to the subsystems and APIs that enable an operator to view, filter, and search for Workflow Executions that currently exist within a Cluster.

The Visibility store in your Temporal Cluster stores persisted Workflow Execution Event History data and is set up as a part of your Persistence store to enable listing and filtering details about Workflow Executions that exist on your Temporal Cluster.

With Temporal Server v1.21, you can set up Dual Visibility to migrate your Visibility store from one database to another.

What is Archival?

Archival is a feature that automatically backs up Event Histories and Visibility records from Temporal Cluster persistence to a custom blob store.

Workflow Execution Event Histories are backed up after the Retention Period is reached. Visibility records are backed up immediately after a Workflow Execution reaches a Closed status.

Archival enables Workflow Execution data to persist as long as needed, while not overwhelming the Cluster's persistence store.

This feature is helpful for compliance and debugging.

Temporal's Archival feature is considered experimental and not subject to normal versioning and support policy.

Archival is not supported when running Temporal through Docker and is disabled by default when installing the system manually and when deploying through helm charts (but can be enabled in the config).

What is Cluster configuration?

Cluster configuration is the setup and configuration details of your self-hosted Temporal Cluster, defined using YAML. You must define your Cluster configuration when setting up your self-hosted Temporal Cluster.

For details on using Temporal Cloud, see Temporal Cloud documentation.

Cluster configuration is composed of two types of configuration: Static configuration and Dynamic configuration.

Static configuration

Static configuration contains details of how the Cluster should be set up. The static configuration is read just once and used to configure service nodes at startup. Depending on how you want to deploy your self-hosted Temporal Cluster, your static configuration must contain details for setting up:

  • Temporal Services—Frontend, History, Matching, Worker
  • Membership ports for the Temporal Services
  • Persistence (including History Shard count), Visibility, Archival store setups.
  • TLS, authentication, authorization
  • Server log level
  • Metrics
  • Cluster metadata
  • Dynamic config Client

Static configuration values cannot be changed at runtime. Some values, such as the Metrics configuration or Server log level can be changed in the static configuration but require restarting the Cluster for the changes to take effect.

For details on static configuration keys, see Cluster configuration reference.

For static configuration examples, see https://github.com/temporalio/temporal/tree/master/config.

Dynamic configuration

Dynamic configuration contains configuration keys that you can update in your Cluster setup without having to restart the server processes.

All dynamic configuration keys provided by Temporal have default values that are used by the Cluster. You can override the default values by setting different values for the keys in a YAML file and setting the dynamic configuration client to poll this file for updates. Setting dynamic configuration for your Cluster is optional.

Setting overrides for some configuration keys updates the Cluster configuration immediately. However, for configuration fields that are checked at startup (such as thread pool size), you must restart the server for the changes to take effect.

Use dynamic configuration keys to fine-tune your self-deployed Cluster setup.

For details on dynamic configuration keys, see Dynamic configuration reference.

For dynamic configuration examples, see https://github.com/temporalio/temporal/tree/master/config/dynamicconfig.

What is Cluster security configuration?

Secure your Temporal Cluster (self-hosted and Temporal Cloud) by encrypting your network communication and setting authentication and authorization protocols for API calls.

For details on setting up your Temporal Cluster security, see Temporal Platform security features.

mTLS encryption

Temporal supports Mutual Transport Layer Security (mTLS) to encrypt network traffic between services within a Temporal Cluster, or between application processes and a Cluster.

On self-hosted Temporal Clusters, configure mTLS in the tls section of the Cluster configuration. mTLS configuration is a static configuration property.

You can then use either the WithConfig or WithConfigLoader server option to start your Temporal Cluster with this configuration.

The mTLS configuration includes two sections that serve to separate communication within a Temporal Cluster and client calls made from your application to the Cluster.

  • internode: configuration for encrypting communication between nodes within the Cluster.
  • frontend: configuration for encrypting the public endpoints of the Frontend Service.

Setting mTLS for internode and frontend separately lets you use different certificates and settings to encrypt each section of traffic.

Using certificates for Client connections

Use CA certificates to authenticate client connections to your Temporal Cluster.

On Temporal Cloud, you can set your CA certificates in your Temporal Cloud settings and use the end-entity certificates in your client calls.

On self-hosted Temporal Clusters, you can restrict access to Temporal Cluster endpoints by using the clientCAFiles or clientCAData property and the requireClientAuth property in your Cluster configuration. These properties can be specified in both the internode and frontend sections of the mTLS configuration. For details, see the tls configuration reference.

Server name specification

On self-hosted Temporal Clusters, you can specify serverName in the client section of your mTLS configuration to prevent spoofing and MITM attacks.

Entering a value for serverName enables established connections to authenticate the endpoint. This ensures that the server certificate presented to any connected client has the specified server name in its CN property.

This measure can be used for internode and frontend endpoints.

For more information on mTLS configuration, see tls configuration reference.

Authentication and authorization

Temporal provides authentication interfaces that can be set to restrict access to your data. These protocols address three areas: servers, client connections, and users.

Temporal offers two plugin interfaces for authentication and authorization of API calls.

The logic of both plugins can be customized to fit a variety of use cases. When plugins are provided, the Frontend Service invokes their implementation before running the requested operation.

What is Cluster observability?

You can monitor and observe performance with metrics emitted by your self-hosted Temporal Cluster or by Temporal Cloud.

Temporal emits metrics by default in a format that is supported by Prometheus. Any metrics software that supports the same format can be used. Currently, we test with the following Prometheus and Grafana versions:

  • Prometheus >= v2.0
  • Grafana >= v2.5

Temporal Cloud emits metrics through a Prometheus HTTP API endpoint, which can be directly used as a Prometheus data source in Grafana or to query and export Cloud metrics to any observability platform.

For details on Cloud metrics and setup, see the following:

On self-hosted Temporal Clusters, expose Prometheus endpoints in your Cluster configuration and configure Prometheus to scrape metrics from the endpoints. You can then set up your observability platform (such as Grafana) to use Prometheus as a data source.

For details on self-hosted Cluster metrics and setup, see the following:

What is Multi-Cluster Replication?

Multi-Cluster Replication is a feature which asynchronously replicates Workflow Executions from active Clusters to other passive Clusters, for backup and state reconstruction. When necessary, for higher availability, Cluster operators can failover to any of the backup Clusters.

Temporal's Multi-Cluster Replication feature is considered experimental and not subject to normal versioning and support policy.

Temporal automatically forwards Start, Signal, and Query requests to the active Cluster. This feature must be enabled through a Dynamic Config flag per Global Namespace.

When the feature is enabled, Tasks are sent to the Parent Task Queue partition that matches that Namespace, if it exists.

All Visibility APIs can be used against active and standby Clusters. This enables Temporal UI to work seamlessly for Global Namespaces. Applications making API calls directly to the Temporal Visibility API continue to work even if a Global Namespace is in standby mode. However, they might see a lag due to replication delay when querying the Workflow Execution state from a standby Cluster.

Namespace Versions

A version is a concept in Multi-Cluster Replication that describes the chronological order of events per Namespace.

With Multi-Cluster Replication, all Namespace change events and Workflow Execution History events are replicated asynchronously for high throughput. This means that data across clusters is not strongly consistent. To guarantee that Namespace data and Workflow Execution data will achieve eventual consistency (especially when there is a data conflict during a failover), a version is introduced and attached to Namespaces. All Workflow Execution History entries generated in a Namespace will also come with the version attached to that Namespace.

All participating Clusters are pre-configured with a unique initial version and a shared version increment:

  • initial version < shared version increment

When performing failover for a Namespace from one Cluster to another Cluster, the version attached to the Namespace will be changed by the following rule:

  • for all versions which follow version % (shared version increment) == (active cluster's initial version), find the smallest version which has version >= old version in namespace

When there is a data conflict, a comparison will be made and Workflow Execution History entries with the highest version will be considered the source of truth.

When a cluster is trying to mutate a Workflow Execution History, the version will be checked. A cluster can mutate a Workflow Execution History only if the following is true:

  • The version in the Namespace belongs to this cluster, i.e. (version in namespace) % (shared version increment) == (this cluster's initial version)
  • The version of this Workflow Execution History's last entry (event) is equal or less than the version in the Namespace, i.e. (last event's version) <= (version in namespace)

Namespace version change example

Assuming the following scenario:

  • Cluster A comes with initial version: 1
  • Cluster B comes with initial version: 2
  • Shared version increment: 10

T = 0: Namespace α is registered, with active Cluster set to Cluster A

namespace α's version is 1
all workflows events generated within this namespace, will come with version 1

T = 1: namespace β is registered, with active Cluster set to Cluster B

namespace β's version is 2
all workflows events generated within this namespace, will come with version 2

T = 2: Namespace α is updated to with active Cluster set to Cluster B

namespace α's version is 2
all workflows events generated within this namespace, will come with version 2

T = 3: Namespace β is updated to with active Cluster set to Cluster A

namespace β's version is 11
all workflows events generated within this namespace, will come with version 11

Version history

Version history is a concept which provides a high level summary of version information in regards to Workflow Execution History.

Whenever there is a new Workflow Execution History entry generated, the version from Namespace will be attached. The Workflow Executions's mutable state will keep track of all history entries (events) and the corresponding version.

Version history example (without data conflict)

  • Cluster A comes with initial version: 1
  • Cluster B comes with initial version: 2
  • Shared version increment: 10

T = 0: adding event with event ID == 1 & version == 1

View in both Cluster A & B

| -------- | ------------- | --------------- | ------- |
| Events | Version History |
| -------- | --------------- | --------------- | ------- |
| Event ID | Event Version | Event ID | Version |
| -------- | ------------- | --------------- | ------- |
| 1 | 1 | 1 | 1 |
| -------- | ------------- | --------------- | ------- |

T = 1: adding event with event ID == 2 & version == 1

View in both Cluster A & B

| -------- | ------------- | --------------- | ------- |
| Events | Version History |
| -------- | --------------- | --------------- | ------- |
| Event ID | Event Version | Event ID | Version |
| -------- | ------------- | --------------- | ------- |
| 1 | 1 | 2 | 1 |
| 2 | 1 | | |
| -------- | ------------- | --------------- | ------- |

T = 2: adding event with event ID == 3 & version == 1

View in both Cluster A & B

| -------- | ------------- | --------------- | ------- |
| Events | Version History |
| -------- | --------------- | --------------- | ------- |
| Event ID | Event Version | Event ID | Version |
| -------- | ------------- | --------------- | ------- |
| 1 | 1 | 3 | 1 |
| 2 | 1 | | |
| 3 | 1 | | |
| -------- | ------------- | --------------- | ------- |

T = 3: Namespace failover triggered, Namespace version is now 2 adding event with event ID == 4 & version == 2

View in both Cluster A & B

| -------- | ------------- | --------------- | ------- |
| Events | Version History |
| -------- | --------------- | --------------- | ------- |
| Event ID | Event Version | Event ID | Version |
| -------- | ------------- | --------------- | ------- |
| 1 | 1 | 3 | 1 |
| 2 | 1 | 4 | 2 |
| 3 | 1 | | |
| 4 | 2 | | |
| -------- | ------------- | --------------- | ------- |

T = 4: adding event with event ID == 5 & version == 2

View in both Cluster A & B

| -------- | ------------- | --------------- | ------- |
| Events | Version History |
| -------- | --------------- | --------------- | ------- |
| Event ID | Event Version | Event ID | Version |
| -------- | ------------- | --------------- | ------- |
| 1 | 1 | 3 | 1 |
| 2 | 1 | 5 | 2 |
| 3 | 1 | | |
| 4 | 2 | | |
| 5 | 2 | | |
| -------- | ------------- | --------------- | ------- |

Since Temporal is AP, during failover (change of active Temporal Cluster Namespace), there can exist cases where more than one Cluster can modify a Workflow Execution, causing divergence of Workflow Execution History. Below shows how the version history will look like under such conditions.

Version history example (with data conflict)

Below, shows version history of the same Workflow Execution in 2 different Clusters.

  • Cluster A comes with initial version: 1
  • Cluster B comes with initial version: 2
  • Cluster C comes with initial version: 3
  • Shared version increment: 10

T = 0:

View in both Cluster B & C

| -------- | ------------- | --------------- | ------- |
| Events | Version History |
| -------- | --------------- | --------------- | ------- |
| Event ID | Event Version | Event ID | Version |
| -------- | ------------- | --------------- | ------- |
| 1 | 1 | 2 | 1 |
| 2 | 1 | 3 | 2 |
| 3 | 2 | | |
| -------- | ------------- | --------------- | ------- |

T = 1: adding event with event ID == 4 & version == 2 in Cluster B

| -------- | ------------- | --------------- | ------- |
| Events | Version History |
| -------- | --------------- | --------------- | ------- |
| Event ID | Event Version | Event ID | Version |
| -------- | ------------- | --------------- | ------- |
| 1 | 1 | 2 | 1 |
| 2 | 1 | 4 | 2 |
| 3 | 2 | | |
| 4 | 2 | | |
| -------- | ------------- | --------------- | ------- |

T = 1: namespace failover to Cluster C, adding event with event ID == 4 & version == 3 in Cluster C

| -------- | ------------- | --------------- | ------- |
| Events | Version History |
| -------- | --------------- | --------------- | ------- |
| Event ID | Event Version | Event ID | Version |
| -------- | ------------- | --------------- | ------- |
| 1 | 1 | 2 | 1 |
| 2 | 1 | 3 | 2 |
| 3 | 2 | 4 | 3 |
| 4 | 3 | | |
| -------- | ------------- | --------------- | ------- |

T = 2: replication task from Cluster C arrives in Cluster B

Note: below are a tree structures

                | -------- | ------------- |
| Events |
| ------------- | ------------- |
| Event ID | Event Version |
| -------- | ------------- |
| 1 | 1 |
| 2 | 1 |
| 3 | 2 |
| -------- | ------------- |
| |
| ------------- | ------------ |
| |
| -------- | ------------- | | -------- | ------------- |
| Event ID | Event Version | | Event ID | Event Version |
| -------- | ------------- | | -------- | ------------- |
| 4 | 2 | | 4 | 3 |
| -------- | ------------- | | -------- | ------------- |

| --------------- | ------- |
| Version History |
| --------------- | ------------------- |
| Event ID | Version |
| --------------- | ------- |
| 2 | 1 |
| 3 | 2 |
| --------------- | ------- |
| |
| ------- | ------------------- |
| |
| --------------- | ------- | | --------------- | ------- |
| Event ID | Version | | Event ID | Version |
| --------------- | ------- | | --------------- | ------- |
| 4 | 2 | | 4 | 3 |
| --------------- | ------- | | --------------- | ------- |

T = 2: replication task from Cluster B arrives in Cluster C, same as above

Conflict resolution

When a Workflow Execution History diverges, proper conflict resolution is applied.

In Multi-cluster Replication, Workflow Execution History Events are modeled as a tree, as shown in the second example in Version History.

Workflow Execution Histories that diverge will have more than one history branch. Among all history branches, the history branch with the highest version is considered the current branch and the Workflow Execution's mutable state is a summary of the current branch. Whenever there is a switch between Workflow Execution History branches, a complete rebuild of the Workflow Execution's mutable state will occur.

Temporal Multi-Cluster Replication relies on asynchronous replication of Events across Clusters, so in the case of a failover it is possible to have an Activity Task dispatched again to the newly active Cluster due to a replication task lag. This also means that whenever a Workflow Execution is updated after a failover by the new Cluster, any previous replication tasks for that Execution cannot be applied. This results in loss of some progress made by the Workflow Execution in the previous active Cluster. During such conflict resolution, Temporal re-injects any external Events like Signals in the new Event History before discarding replication tasks. Even though some progress could roll back during failovers, Temporal provides the guarantee that Workflow Executions won’t get stuck and will continue to make forward progress.

Activity Execution completions are not forwarded across Clusters. Any outstanding Activities will eventually time out based on the configuration. Your application should have retry logic in place so that the Activity gets retried and dispatched again to a Worker after the failover to the new Cluster. Handling this is similar to handling an Activity Task timeout caused by a Worker restarting.

Zombie Workflows

There is an existing contract that for any Namespace and Workflow Id combination, there can be at most one run (Namespace + Workflow Id + Run Id) open / executing.

Multi-cluster Replication aims to keep the Workflow Execution History as up-to-date as possible among all participating Clusters.

Due to the nature of Multi-cluster Replication (for example, Workflow Execution History events are replicated asynchronously) different Runs (same Namespace and Workflow Id) can arrive at the target Cluster at different times, sometimes out of order, as shown below:

| ------------- |          | ------------- |          | ------------- |
| Cluster A | | Network Layer | | Cluster B |
| --------- || ------------- | | ------------- |
| | |
| Run 1 Replication Events | |
| -----------------------> | |
| | |
| Run 2 Replication Events | |
| -----------------------> | |
| | |
| | |
| | |
| | Run 2 Replication Events |
| | -----------------------> |
| | |
| | Run 1 Replication Events |
| | -----------------------> |
| | |
| --- || ------------- | | ------------- |
| Cluster A | | Network Layer | | Cluster B |
| --------- || ------------- | | ------------- |

Because Run 2 appears in Cluster B first, Run 1 cannot be replicated as "runnable" due to the rule at most one Run open (see above), thus the "zombie" Workflow Execution state is introduced. A "zombie" state is one in which a Workflow Execution which cannot be actively mutated by a Cluster (assuming the corresponding Namespace is active in this Cluster). A zombie Workflow Execution can only be changed by a replication Task.

Run 1 will be replicated similar to Run 2, except when Run 1's execution will become a "zombie" before Run 1 reaches completion.

Workflow Task processing

In the context of Multi-cluster Replication, a Workflow Execution's mutable state is an entity which tracks all pending tasks. Prior to the introduction of Multi-cluster Replication, Workflow Execution History entries (events) are from a single branch, and the Temporal Server will only append new entries (events) to the Workflow Execution History.

After the introduction of Multi-cluster Replication, it is possible that a Workflow Execution can have multiple Workflow Execution History branches. Tasks generated according to one history branch may become invalidated by switching history branches during conflict resolution.

Example:

T = 0: task A is generated according to Event Id: 4, version: 2

| -------- | ------------- |
| Events |
| -------- | ------------- |
| Event ID | Event Version |
| -------- | ------------- |
| 1 | 1 |
| 2 | 1 |
| 3 | 2 |
| -------- | ------------- |
| |
| |
| -------- | ------------- |
| Event ID | Event Version |
| -------- | ------------- |
| 4 | 2 | <-- task A belongs to this event |
| -------- | ------------- |

T = 1: conflict resolution happens, Workflow Execution's mutable state is rebuilt and history Event Id: 4, version: 3 is written down to persistence

| -------- | ------------- |
| Events |
| ------------- | -------------------------------------------- |
| Event ID | Event Version |
| -------- | ------------- |
| 1 | 1 |
| 2 | 1 |
| 3 | 2 |
| -------- | ------------- |
| |
| ------------- | -------------------------------------------- |
| |
| -------- | ------------- | | -------- | ------------- |
| Event ID | Event Version | | Event ID | Event Version |
| -------- | ------------- | | -------- | ------------- |
| 4 | 2 | <-- task A belongs to this event | 4 | 3 | <-- current branch / mutable state |
| -------- | ------------- | | -------- | ------------- |

T = 2: task A is loaded.

At this time, due to the rebuild of a Workflow Execution's mutable state (conflict resolution), Task A is no longer relevant (Task A's corresponding Event belongs to non-current branch). Task processing logic will verify both the Event Id and version of the Task against a corresponding Workflow Execution's mutable state, then discard task A.