Timeouts and Retry Policy

Timeout settings and a Retry Policy provide fine-grained control over specific steps of Workflow Executions and Activity Executions.

Workflow Execution with a single Activity Execution: Timeout periods


Workflow Execution Timeout

A Workflow Execution Timeout is the maximum time that a Workflow Execution can be executing (have an Open status) including retries and any usage of Continue As New.

Workflow Execution Timeout period

The default value is ∞ (infinite). If this timeout is reached, the Workflow Execution changes to a Timed Out status. This timeout is different from the Workflow Run Timeout. This timeout is most commonly used for stopping the execution of a Temporal Cron Job after a certain amount of time has passed.

Workflow Run Timeout

A Workflow Run Timeout is the maximum amount of time that a single Workflow Run is restricted to.

Workflow Run Timeout period

The default is set to the same value as the Workflow Execution Timeout. This timeout is most commonly used to limit the execution time of a single Temporal Cron Job Execution.

If the Workflow Run Timeout is reached, the Temporal Server automatically Terminates the Workflow Execution.
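
The defaulting rule described above can be sketched in plain Python. This is an illustrative model only, not Temporal SDK code: the function name `effective_run_timeout` is hypothetical, and `None` is assumed to stand in for the infinite default.

```python
from datetime import timedelta
from typing import Optional


def effective_run_timeout(
    execution_timeout: Optional[timedelta] = None,
    run_timeout: Optional[timedelta] = None,
) -> Optional[timedelta]:
    """Return the Workflow Run Timeout that applies to a single Workflow Run.

    None models the infinite default of the Workflow Execution Timeout.
    """
    if run_timeout is None:
        # Unset: default to the same value as the Workflow Execution Timeout.
        return execution_timeout
    return run_timeout
```

For example, with only an execution timeout of one hour set, the helper returns one hour for each run; with neither set, it returns `None` (infinite).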

Workflow Task Timeout

A Workflow Task Timeout is the maximum amount of time allowed for a Worker to execute a Workflow Task after the Worker has pulled that Workflow Task from the Task Queue.

Workflow Task Timeout period

The default value is 10 seconds. This timeout is primarily available to recognize whether a Worker has gone down so that the Workflow Execution can be recovered on a different Worker. The main reason for increasing the default value would be to accommodate a Workflow Execution that has a very long Workflow Execution History that could take longer than 10 seconds for the Worker to load.

Schedule-To-Start Timeout

A Schedule-To-Start Timeout is the maximum amount of time that is allowed from when an Activity Task is scheduled (placed in a Task Queue) to when a Worker starts executing that Activity Task.

Schedule-To-Start Timeout period

A Retry Policy attached to an Activity Execution retries an Activity Task Execution. Thus the Schedule-To-Start Timeout is applied to each Activity Task Execution within an Activity Execution.

Start-To-Close Timeout period with retries

This timeout has two primary use cases:

  1. Detect whether an individual Worker has crashed.
  2. Detect whether the fleet of Workers polling the Task Queue is not able to keep up with the rate of Activity Tasks.

The default Schedule-To-Start Timeout is ∞ (infinity).

If this timeout is used, we recommend setting this timeout to the maximum time a Workflow Execution is willing to wait for an Activity Execution in the presence of all possible Worker outages, and have a concrete plan in place to reroute Activity Tasks to a different Task Queue. This timeout does not trigger any retries regardless of the Retry Policy, as a retry would place the Activity Task back into the same Task Queue. As a reminder, we do not recommend using this timeout unless you know what you are doing.

In most cases, we recommend monitoring the temporal_activity_schedule_to_start_latency metric to know when Workers are not picking up Activity Tasks, instead of setting this timeout.

Start-To-Close Timeout

A Start-To-Close Timeout is the maximum time allowed for a single Activity Task Execution.

The default Start-To-Close Timeout is the same as the default Schedule-To-Close Timeout.

An Activity Execution must have either this timeout (Start-To-Close) or the Schedule-To-Close Timeout set. We recommend always setting this timeout; however, make sure that it is always set to be longer than the maximum possible time for the Activity Execution to take place. For long running Activity Executions, we recommend also using Activity Heartbeats and Heartbeat Timeouts.
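
The requirement that at least one of the two timeouts is set can be expressed as a small check. This is a minimal sketch in plain Python, not the Temporal SDK; the function name `validate_activity_timeouts` is hypothetical, and `None` is assumed to mean "not set".

```python
from datetime import timedelta
from typing import Optional


def validate_activity_timeouts(
    start_to_close: Optional[timedelta] = None,
    schedule_to_close: Optional[timedelta] = None,
) -> None:
    """Raise ValueError unless Start-To-Close or Schedule-To-Close is set."""
    if start_to_close is None and schedule_to_close is None:
        raise ValueError(
            "An Activity Execution needs a Start-To-Close or a "
            "Schedule-To-Close Timeout"
        )


# Passes: Start-To-Close alone satisfies the requirement.
validate_activity_timeouts(start_to_close=timedelta(seconds=30))
```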

The main use case for the Start-To-Close timeout is to detect when a Worker crashes after it has started executing an Activity Task.

Start-To-Close Timeout period

A Retry Policy attached to an Activity Execution retries an Activity Task Execution. Thus the Start-To-Close Timeout is applied to each Activity Task Execution within an Activity Execution.

If the first Activity Task Execution returns an error, then the full Activity Execution might look like this:

Start-To-Close Timeout period with retries

If this timeout is reached, the following actions occur:

  • An ActivityTaskTimedOut Event is written to the Workflow Execution's mutable state.
  • If a Retry Policy dictates a retry, the Temporal Cluster schedules another Activity Task.
    • The attempt count increments by 1 in the Workflow Execution's mutable state.
    • The Start-To-Close Timeout timer is reset.
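
The sequence of actions above can be modeled roughly as follows. This is a simplified, hypothetical model in plain Python: `ActivityState` and `on_start_to_close_timeout` are invented names, and the real bookkeeping happens inside the Temporal Cluster, not in user code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ActivityState:
    """Toy stand-in for the state tracked for one Activity Execution."""
    attempt: int = 1
    events: List[str] = field(default_factory=list)
    timer_resets: int = 0


def on_start_to_close_timeout(state: ActivityState, policy_allows_retry: bool) -> bool:
    """Apply the documented actions; return True if another task is scheduled."""
    # 1. Record the timeout event.
    state.events.append("ActivityTaskTimedOut")
    if not policy_allows_retry:
        return False
    # 2. A retry is dictated: increment the attempt count and reset the timer.
    state.attempt += 1
    state.timer_resets += 1
    return True
```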

Schedule-To-Close Timeout

A Schedule-To-Close Timeout is the maximum amount of time allowed for the overall Activity Execution, from when the first Activity Task is scheduled to when the last Activity Task, in the chain of Activity Tasks that make up the Activity Execution, reaches a Closed status.

Schedule-To-Close Timeout period

Example Schedule-To-Close Timeout period for an Activity Execution that has a chain of Activity Task Executions:

Schedule-To-Close Timeout period with a retry

The default Schedule-To-Close Timeout is ∞ (infinity).

An Activity Execution must have either this timeout (Schedule-To-Close) or the Start-To-Close Timeout set. By default, an Activity Execution Retry Policy dictates that retries occur for up to 10 years. This timeout can be used to limit the overall time an Activity Execution may take, without altering the default Retry Policy.
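
To get a feel for how a Schedule-To-Close Timeout bounds retries, the following rough simulation counts how many attempts start inside the budget. It is a deliberately simplified model under stated assumptions, not the server's actual scheduling logic: every attempt is assumed to fail after a fixed `attempt_duration`, all values are in seconds, the default Retry Policy values are used, and the function name is hypothetical.

```python
from typing import Optional


def attempts_within_budget(
    schedule_to_close: float,
    attempt_duration: float,
    initial_interval: float = 1.0,
    backoff_coefficient: float = 2.0,
    maximum_interval: Optional[float] = None,
) -> int:
    """Count attempts that start before the Schedule-To-Close budget is spent."""
    if maximum_interval is None:
        maximum_interval = 100 * initial_interval  # documented default
    elapsed = 0.0
    attempts = 0
    interval = initial_interval
    while elapsed < schedule_to_close:
        attempts += 1
        elapsed += attempt_duration  # the attempt runs and fails
        elapsed += interval          # wait before the next retry
        interval = min(interval * backoff_coefficient, maximum_interval)
    return attempts


# With a 60-second budget and 5-second failing attempts, attempts start at
# 0, 6, 13, 22, 35, and 56 seconds.
print(attempts_within_budget(60, 5))  # 6
```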

Heartbeat Timeout

A Heartbeat Timeout is the maximum time between Activity Heartbeats.

Heartbeat Timeout periods

If this timeout is reached, the Activity Execution changes to a Failed status, and will retry if a Retry Policy dictates it.
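
As a rough illustration of what "maximum time between Activity Heartbeats" means, the sketch below checks a list of heartbeat timestamps for a gap that exceeds the timeout. It is a plain-Python model, not SDK code: the function name is hypothetical, times are seconds, and measuring the first gap from the Activity start time is an assumption.

```python
from typing import Sequence


def heartbeat_timed_out(
    start: float,
    heartbeat_times: Sequence[float],
    now: float,
    heartbeat_timeout: float,
) -> bool:
    """Return True if any gap between heartbeats exceeds heartbeat_timeout."""
    last = start
    for t in sorted(heartbeat_times):
        if t - last > heartbeat_timeout:
            return True
        last = t
    # Also check the time elapsed since the most recent heartbeat.
    return now - last > heartbeat_timeout


print(heartbeat_timed_out(0, [5, 10, 15], 18, 10))  # False
print(heartbeat_timed_out(0, [5, 30], 31, 10))      # True: 25-second gap
```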

Retry Policy

A Retry Policy is a collection of attributes that instructs the Temporal Server how to retry a failure of a Workflow Execution or an Activity Task Execution.

  • When a Workflow Execution is invoked it is not associated with a default Retry Policy and thus does not retry by default. The intention is that a Workflow Definition should be written to never fail due to intermittent issues; an Activity is designed to handle such issues.

Note: Retry Policies do not apply to Workflow Task Executions, which, by default, retry indefinitely.

Default values for Retry Policy

Initial Interval     = 1 second
Backoff Coefficient  = 2.0
Maximum Interval     = 100 × Initial Interval
Maximum Attempts     = ∞
Non-Retryable Errors = []
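
These defaults can be captured in a small data structure. The class below is a hypothetical plain-Python model (`RetryPolicyModel` is not a Temporal SDK type); it assumes `None` means "derive Maximum Interval from Initial Interval" and uses `0` for unlimited attempts.

```python
from dataclasses import dataclass, field
from datetime import timedelta
from typing import List, Optional


@dataclass
class RetryPolicyModel:
    initial_interval: timedelta = timedelta(seconds=1)
    backoff_coefficient: float = 2.0
    maximum_interval: Optional[timedelta] = None  # None: 100 x Initial Interval
    maximum_attempts: int = 0                     # 0 models "unlimited"
    non_retryable_errors: List[str] = field(default_factory=list)

    def effective_maximum_interval(self) -> timedelta:
        """Resolve the Maximum Interval, applying the documented default."""
        if self.maximum_interval is None:
            return self.initial_interval * 100
        return self.maximum_interval


print(RetryPolicyModel().effective_maximum_interval())  # 0:01:40
```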

Initial Interval

  • Description: Amount of time that must elapse before the first retry occurs.
    • The default value is 1 second.
  • Use case: This is used as the base interval time for the Backoff Coefficient to multiply against.

Backoff Coefficient

  • Description: The value dictates how much the retry interval increases.
    • The default value is 2.0.
    • A backoff coefficient of 1.0 means that the retry interval always equals the Initial Interval.
  • Use case: Use this attribute to increase the interval between retries. By having a backoff coefficient greater than 1.0, the first few retries happen relatively quickly to overcome intermittent failures, but subsequent retries happen farther and farther apart to account for longer outages. Use the Maximum Interval attribute to prevent the coefficient from increasing the retry interval too much.

Maximum Interval

  • Description: Specifies the maximum interval between retries.
  • Use case: This attribute is useful for Backoff Coefficients that are greater than 1.0 because it prevents the retry interval from growing infinitely.
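
Taken together, Initial Interval, Backoff Coefficient, and Maximum Interval determine the wait before each retry. The sketch below assumes the usual exponential form, interval = Initial Interval × Backoff Coefficient^(retry number − 1), capped at Maximum Interval; the function name is hypothetical and values are in seconds.

```python
from typing import Optional


def retry_interval(
    retry_number: int,
    initial_interval: float = 1.0,
    backoff_coefficient: float = 2.0,
    maximum_interval: Optional[float] = None,
) -> float:
    """Return the wait in seconds before the given retry (1 = first retry)."""
    if retry_number < 1:
        raise ValueError("retry_number starts at 1")
    if maximum_interval is None:
        maximum_interval = 100 * initial_interval  # documented default
    try:
        interval = initial_interval * backoff_coefficient ** (retry_number - 1)
    except OverflowError:
        # Very large retry numbers overflow a float; the cap applies anyway.
        return maximum_interval
    return min(interval, maximum_interval)


print([retry_interval(n) for n in range(1, 10)])
# [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0, 100.0, 100.0]
```

With a Backoff Coefficient of 1.0, every call returns the Initial Interval, which matches the behavior described for that value.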

Maximum Attempts

  • Description: Specifies the maximum number of execution attempts that can be made in the presence of failures.
    • The default is unlimited.
    • If this limit is exceeded, the execution fails without retrying again. When this happens an error is returned.
    • Setting the value to 0 also means unlimited.
    • Setting the value to 1 means a single execution attempt and no retries.
    • Setting the value to a negative integer results in an error when the execution is invoked.
  • Use case: Use this attribute to ensure that retries do not continue indefinitely. However, in the majority of cases, we recommend relying on the Workflow Execution Timeout, in the case of Workflows, or Schedule-To-Close Timeout, in the case of Activities, to limit the total duration of retries instead of using this attribute.
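
The special values of Maximum Attempts listed above can be summarized in a short check. This is an illustrative plain-Python helper with a hypothetical name, not part of any Temporal SDK.

```python
def may_attempt_again(attempts_made: int, maximum_attempts: int = 0) -> bool:
    """Return True if another execution attempt is allowed after a failure."""
    if maximum_attempts < 0:
        # Negative values are rejected when the execution is invoked.
        raise ValueError("Maximum Attempts must not be negative")
    if maximum_attempts == 0:
        # 0 means unlimited attempts.
        return True
    return attempts_made < maximum_attempts


print(may_attempt_again(1, maximum_attempts=1))     # False: no retries
print(may_attempt_again(1000, maximum_attempts=0))  # True: unlimited
```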

Non-Retryable Errors

  • Description: Specifies errors that shouldn't be retried.
  • Use case: There may be errors that you know of that should not trigger a retry. In this case you can specify them such that if they occur, the given execution will not be retried.
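
A minimal sketch of the idea, assuming errors are identified by a type name string: the helper and the error names below are hypothetical and only illustrate the lookup, not how any SDK matches errors.

```python
from typing import Iterable


def should_retry(error_type: str, non_retryable_errors: Iterable[str] = ()) -> bool:
    """Return False if the error type is listed as non-retryable."""
    return error_type not in set(non_retryable_errors)


# Hypothetical error type names, for illustration only.
non_retryable = ["InvalidInputError", "PaymentDeclinedError"]
print(should_retry("ConnectionResetError", non_retryable))  # True
print(should_retry("InvalidInputError", non_retryable))     # False
```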
