Workflows

When you start a Workflow, you can pass along parameters that tell the Temporal Server how to handle the Workflow. This includes the ability to set timeouts for Workflow execution, a Retry Policy, the Task Queue name, a data converter, search attributes, and Child Workflow options.

Timeout settings#

It's sometimes necessary to limit the amount of time that a specific Workflow can run. However, unlike Activity timeouts, Workflow timeouts exist primarily to protect the system from "runaway" Workflows that consume too many resources; they are not intended to be used as part of the business logic. There are a few important things to consider with Workflow timeout settings:

  1. When a Workflow times out, it is terminated without any notifications available to another application.
  2. You should always account for possible outages: if your Workers go down for an hour, your Workflows should not all time out. Start with infinite timeouts.
  3. The SDKs come equipped with timers and sleep APIs that can be used directly inside of Workflows to handle business logic related timeouts.

Execution timeout#

  • Description: This is the maximum amount of time that a Workflow should be allowed to run including retries and any usage of the "Continue-as-new" feature. The default value is set to 10 years. This is different from Run timeout.
  • Use-case: This is most commonly used for stopping the execution of a cron scheduled Workflow after a certain amount of time has passed.

Run timeout#

  • Description: This is the maximum amount of time allowed for a single Workflow run. The default is set to the same value as the Execution timeout.
  • Use-case: This is most commonly used to limit the execution time of a single cron scheduled Workflow invocation. If this timeout is reached and there is an associated Retry Policy, the Workflow will be retried before any scheduling occurs. If there is no Retry Policy then the Workflow will be scheduled per the cron schedule.

Task timeout#

  • Description: This is the maximum amount of time that the Server will wait for the Worker to start processing a Workflow Task after the Task has been pulled from the Task Queue. The default value is 10 seconds.
  • Use-case: This is primarily available to recognize whether a Worker has gone down so that the Workflow can be recovered and continue executing on a different Worker. The main reason for increasing the default value would be to accommodate a Workflow that has a very long event history that could take longer than 10 seconds for the Worker to load.

Retry Policy#

There may be scenarios where you need to retry a Workflow's execution from the very beginning. In this case, you can supply a Retry Policy when you start the Workflow. However, the intention is that Workflows are written such that they would never fail on intermittent issues. Activities are made available to handle that kind of logic, and thus retrying Workflows is rare. The exceptions tend to be cron scheduled Workflows or some other stateless always-running Workflows that benefit from retries.

note

Retry Policies are not required when starting a Workflow. If one is not provided, a default one is generated for the Workflow. However, if one is provided, the only required option is the initial interval.

Initial interval#

  • Description: Amount of time that must elapse before the first retry occurs. There is no default value and one must be supplied if a Retry Policy is provided.
  • Use-case: This is used as the base interval time for the backoff coefficient to multiply against.

Backoff coefficient#

  • Description: Retry intervals can grow exponentially. The backoff coefficient specifies how fast the retry interval grows. The default value is 2.0. A backoff coefficient of 1.0 means that the retry interval always equals the initial interval.
  • Use-case: Use this to grow the interval between retries. By having a backoff coefficient, the first few retries happen relatively quickly to overcome intermittent failures, but subsequent retries will happen farther and farther apart to account for longer lasting outages. Use the maximum interval option to prevent the coefficient from growing the retry interval too much.

Maximum interval#

  • Description: Specifies the maximum interval between retries. The default is 100× the initial interval.
  • Use-case: This is useful for coefficients greater than 1.0 as it prevents the interval from growing exponentially infinitely.

Maximum attempts#

  • Description: Specifies the maximum number of attempts that can be made to execute a Workflow in the presence of failures. If this limit is exceeded, the Workflow fails without retrying again. The default is unlimited. Setting it to 0 also means unlimited.
  • Use-case: This can be used to ensure that retries do not continue indefinitely. However, in the majority of cases, we recommend relying on the execution timeout to limit the duration of the retries instead of this.
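Taken together, these options define a capped exponential backoff schedule. The sketch below models the arithmetic locally; the function name and defaults are illustrative, and the real interval computation happens inside the Temporal Server, not in your code.

```python
import itertools

def retry_schedule(initial_interval, backoff_coefficient=2.0,
                   maximum_interval=None, maximum_attempts=0):
    """Yield the wait (in seconds) before each retry attempt.

    A local sketch of how the Retry Policy options interact.
    maximum_attempts counts the first execution too, so the number of
    retries is maximum_attempts - 1; 0 means unlimited, so callers
    should bound iteration themselves (e.g. with itertools.islice).
    """
    if maximum_interval is None:
        maximum_interval = 100 * initial_interval  # default: 100x the initial interval
    attempt = 1
    while maximum_attempts == 0 or attempt < maximum_attempts:
        interval = initial_interval * backoff_coefficient ** (attempt - 1)
        yield min(interval, maximum_interval)  # cap at the maximum interval
        attempt += 1

# First five retry intervals with a 1s initial interval and the default coefficient:
print(list(itertools.islice(retry_schedule(1.0), 5)))  # [1.0, 2.0, 4.0, 8.0, 16.0]
```

Note how a low maximum interval flattens the tail of the schedule: `retry_schedule(1.0, maximum_interval=5.0, maximum_attempts=6)` yields `1.0, 2.0, 4.0, 5.0, 5.0`.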

Non-retryable error reasons#

  • Description: Specifies errors that shouldn't be retried.
  • Use-case: There may be errors that you know of that should not trigger a retry. In this case you can specify them such that if they occur, the Workflow will not be retried.

The Task Queue#

The only required Workflow options parameter is the name of a Task Queue. Read the Task Queues concept page for a better overview.

Essentially, a Task Queue is a mechanism where any given Worker knows which piece of code to execute next. A Workflow can only use one Task Queue, just as a Worker can only subscribe to a single Task Queue. From a developer's perspective, it is named and managed as a simple string value.

Workflow Id#

You may assign a custom Workflow Id to a Workflow. This Id is meant for business-level identification such as a customer Id or an order Id. The Temporal Server enforces the uniqueness of the Id within a Namespace, based on the Workflow Id re-use policy.

Any attempt to start a Workflow with the same Id as a Workflow whose re-use policy does not allow it will fail with a "Workflow execution already started" error. Note that it is not possible to have two open Workflows with the same Workflow Id, regardless of the re-use policy; the re-use policy applies only to closed Workflows.

note

A Workflow is uniquely identified by its Namespace, Workflow Id, and Run Id.

Allow duplicate failed only policy#

  • Description: Specifying this means that the Workflow is allowed to start only if a previously executed Workflow with the same Id has failed.
  • Use case: Use this policy when there is a need to re-execute a failed Workflow and guarantee that the successfully completed Workflow will not be re-executed.

Allow duplicate policy#

  • Description: Specifying this means that the Workflow is allowed to start independently of a previous Workflow with the same Id, regardless of its completion status. This is the default policy used if one is not specified.
  • Use case: Use this when it is OK to execute a Workflow with the same Workflow Id again.

Reject duplicate policy#

  • Description: Specifying this means that no other Workflow is allowed to start using the same Workflow Id at all.
  • Use case: Use this when there can only be one Workflow execution per Workflow Id within a Namespace retention period.
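The decision logic behind the three policies can be summarized in a few lines. This is a hypothetical local model for illustration only (the enum and function names are made up; the real check happens inside the Temporal Server):

```python
from enum import Enum

class ReusePolicy(Enum):
    ALLOW_DUPLICATE = "allow duplicate"                    # the default
    ALLOW_DUPLICATE_FAILED_ONLY = "allow duplicate failed only"
    REJECT_DUPLICATE = "reject duplicate"

def can_start(policy, previous_status):
    """Decide whether a Workflow with a reused Id may start.

    previous_status: None (no prior Workflow with this Id), "open",
    "completed", or "failed". An open Workflow always blocks reuse,
    regardless of policy; the policy applies only to closed Workflows.
    """
    if previous_status is None:
        return True
    if previous_status == "open":
        return False  # two open Workflows can never share an Id
    if policy is ReusePolicy.ALLOW_DUPLICATE:
        return True
    if policy is ReusePolicy.ALLOW_DUPLICATE_FAILED_ONLY:
        return previous_status == "failed"
    return False  # REJECT_DUPLICATE: one execution per Id, period

print(can_start(ReusePolicy.ALLOW_DUPLICATE_FAILED_ONLY, "completed"))  # False
print(can_start(ReusePolicy.ALLOW_DUPLICATE_FAILED_ONLY, "failed"))     # True
```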

Cron schedule#

When you specify a cron schedule while starting the Workflow, the Temporal Server will treat the Workflow as a cron job. It is that simple to ensure your Workflow runs on a specific schedule.

The Server only schedules the next run after the current run has completed, failed, or timed out. If a Retry Policy is supplied and the Workflow fails or times out, the Workflow will be retried based on the Retry Policy. While the Workflow is retrying, the Server will not schedule the next run. If the next scheduled run is due to occur while the Workflow is still running (or retrying), then the Server will skip that scheduled run. A cron Workflow will not stop until it is terminated or cancelled.

note

Scheduling is based on UTC time.
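The skip rule above can be modeled with a toy simulation. This is not how the Server is implemented; it is a sketch (with made-up names and times as plain minute offsets rather than UTC timestamps) of which scheduled runs actually start when runs outlast the schedule interval:

```python
def executed_runs(scheduled_times, run_duration):
    """Return the subset of scheduled run times that actually start.

    Toy model of the cron skip rule: runs never overlap, and a run that
    comes due while the previous one is still executing is skipped
    entirely -- missed runs are not queued up for later.
    """
    started = []
    busy_until = float("-inf")
    for t in scheduled_times:
        if t >= busy_until:              # previous run finished: start this one
            started.append(t)
            busy_until = t + run_duration
        # else: skipped -- the Server does not queue missed cron runs
    return started

# Hourly schedule (minute offsets), but each run takes 90 minutes:
# every other scheduled run is skipped.
print(executed_runs([0, 60, 120, 180], run_duration=90))  # [0, 120]
```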

Search attributes#

When you start a Workflow, you can configure it with search attributes that can be used in complex Workflow visibility search queries. Read the search attributes guide to learn how to enable search attributes in Workflows.

Memos#

You can also attach a non-indexed bit of information to a Workflow, known as a memo, that is visible in Workflow search results.

Child Workflows#

A Child Workflow Execution is a Workflow Execution that is spawned from within another Workflow.

A Workflow Execution can be both a Parent and a Child Workflow Execution because any Workflow can spawn another Workflow.

Parent & Child Workflow Execution entity relationship

A Parent Workflow Execution must await the spawning of the Child Workflow Execution, and can optionally await the Child Workflow Execution's result. If the Parent does not await the result of the Child, which includes any use of Continue-As-New by the Parent, consider the Child's Parent Close Policy.

When a Parent Workflow Execution reaches a Closed status, the Server propagates Cancellation Requests or Terminations to Child Workflow Executions depending on the Child's Parent Close Policy.

Parent Close Policy entity relationship

If a Child Workflow Execution uses Continue-As-New, from the Parent Workflow Execution's perspective the entire chain of Runs is treated as a single execution.

Parent & Child Workflow Execution entity relationship with Continue As New
  • * = Last Workflow Execution in the chain

When to use Child Workflows#

Consider Workflow Execution Event History size limits.

An individual Workflow Execution has an Event History size limit, which imposes a couple of considerations for using Child Workflows.

On one hand, because Child Workflow Executions have their own Event Histories, they are often used to partition large workloads into smaller chunks. For example, a single Workflow Execution does not have enough space in its Event History to spawn 100,000 Activity Executions. But a Parent Workflow Execution can spawn 1000 Child Workflow Executions that each spawn 1000 Activity Executions to achieve a total of 1,000,000 Activity Executions.

On the other hand, because a Parent Workflow Execution Event History contains Events that correspond to the status of the Child Workflow Execution, a single Parent should not spawn more than 1000 Child Workflow Executions.
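The partitioning arithmetic from the example above is worth making explicit. A sketch (the helper name and the 1000-child guideline threshold are taken from the text, not from any SDK API):

```python
import math

def partition(total_items, max_children=1000):
    """Split a workload across Child Workflows without exceeding the
    per-Parent fan-out guideline at either level.

    Returns (number_of_children, activities_per_child). Each Child then
    runs its batch of Activities against its own Event History.
    """
    batch_size = math.ceil(total_items / max_children)
    children = math.ceil(total_items / batch_size)
    return children, batch_size

# 1,000,000 Activity Executions: 1000 Children x 1000 Activities each,
# as in the example above.
print(partition(1_000_000))  # (1000, 1000)
```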

In general, however, Child Workflow Executions result in more overall Events recorded in Event Histories than Activities. Because each entry in an Event History is a "cost" in terms of compute resources, this could become a factor in very large workloads. Therefore, we recommend starting with a single Workflow implementation that uses Activities until there is a clear need for Child Workflows.

Consider each Child Workflow Execution as a separate service.

Because a Child Workflow Execution can be processed by a completely separate set of Workers than the Parent Workflow Execution, it can act as an entirely separate service. However, this also means that a Parent Workflow Execution and a Child Workflow Execution do not share any local state. As with all Workflow Executions, they can communicate only via asynchronous Signals.

Consider that a single Child Workflow Execution can represent a single resource.

As with all Workflow Executions, a Child Workflow Execution can create a 1:1 mapping with a resource. For example, a Workflow that manages host upgrades could spawn a Child Workflow Execution per host.

ParentClosePolicy#

When creating a Child Workflow, you can define a ParentClosePolicy that terminates, cancels, or abandons the Workflow Execution if the child's parent stops execution:

  • ABANDON: When the parent stops, don't do anything with the Child Workflow Execution.
  • TERMINATE: When the parent stops, immediately terminate the Child Workflow Execution.
  • REQUEST_CANCEL: When the parent stops, request cancellation on the Child Workflow Execution.

You can set policies per child, which means you can opt out of propagating Terminations / Cancellation Requests on a per-child basis. This is useful for starting Child Workflows asynchronously (see the relevant issue or the corresponding SDK docs).
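The three policies map directly to the action the Server takes on each Child. A minimal illustrative mapping (the enum and function are hypothetical, not a Temporal SDK API; the real propagation happens inside the Temporal Server when the Parent reaches a Closed status):

```python
from enum import Enum, auto

class ParentClosePolicy(Enum):
    ABANDON = auto()
    TERMINATE = auto()
    REQUEST_CANCEL = auto()

def on_parent_closed(policy):
    """Return what happens to a Child when its Parent closes."""
    return {
        ParentClosePolicy.ABANDON: "leave the Child running",
        ParentClosePolicy.TERMINATE: "terminate the Child immediately",
        ParentClosePolicy.REQUEST_CANCEL: "send a Cancellation Request to the Child",
    }[policy]

print(on_parent_closed(ParentClosePolicy.ABANDON))  # leave the Child running
```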

FAQ#

Is there a limit to how long Workflows can run?

Workflows intended to run indefinitely should be written with some care. Temporal stores the complete Event History for the entire lifecycle of a Workflow Execution. There is a maximum limit of 50,000 Events enforced by the Server, and you should try to avoid getting close to this limit; the Temporal Server puts out a warning at every 10,000 Events.

The idiomatic way to handle indefinitely running Workflows is to use the "Continue-as-new" feature, which is available in all SDKs. For example, a reasonable cutoff point might be once a day for high volume Workflows.

The "Continue-as-new" feature completes the current Workflow execution and automatically starts a new execution with the same Workflow Id, but different run Id, passing it the appropriate parameters for it to continue. This keeps the event history within limits, but continues the logic execution.

note

If you are using Signals with the Go SDK, you should make sure to do an asynchronous drain on the Signal channel or the Signals will be lost.

How do I handle a Worker process failure/restart in my Workflow?

You do not. The Workflow code is completely oblivious to any Worker failures or downtime. As soon as the Worker or Temporal Server has recovered, the current state of the Workflow is fully restored and the execution is continued. The only reason a Workflow might fail is due to the Workflow business code throwing an exception, not underlying infrastructure outages.

Can a Worker handle more Workflow instances than its cache size or number of supported threads?

Yes, it can. However, the tradeoff is added latency.

Workers are stateless, so any Workflow in a blocked state can be safely removed from a Worker. Later on, it can be resurrected on the same or different Worker when the need arises (in the form of an external event). Therefore, a single Worker can handle millions of open Workflow executions, assuming it can handle the update rate and that a slightly higher latency is not a concern.

How can I load test Workflow Executions?

The Temporal stress testing blog post covers many different scenarios under which we test Workflow Executions.
