Timeout settings and a Retry Policy provide fine controls over specific steps of Workflow Executions and Activity Executions.
Workflow Execution with a single Activity Execution: Timeout periods
Workflow Execution Timeout
A Workflow Execution Timeout is the maximum time that a Workflow Execution can be executing (have an Open status) including retries and any usage of Continue As New.
Workflow Execution Timeout period
The default value is ∞ (infinite). If this timeout is reached, the Workflow Execution changes to a Timed Out status. This timeout is different from the Workflow Run Timeout. This timeout is most commonly used for stopping the execution of a Temporal Cron Job after a certain amount of time has passed.
Workflow Run Timeout
A Workflow Run Timeout is the maximum amount of time that a single Workflow Run is restricted to.
Workflow Run Timeout period
If the Workflow Run Timeout is reached, the Temporal Server automatically Terminates the Workflow Execution.
Workflow Task Timeout
Workflow Task Timeout period
The default value is 10 seconds. This timeout is primarily available to recognize whether a Worker has gone down so that the Workflow Execution can be recovered on a different Worker. The main reason for increasing the default value would be to accommodate a Workflow Execution that has a very long Workflow Execution History that could take longer than 10 seconds for the Worker to load.
Schedule-To-Start Timeout period
A Retry Policy attached to an Activity Execution retries an Activity Task Execution. Thus the Schedule-To-Start Timeout is applied to each Activity Task Execution within an Activity Execution.
Start-To-Close Timeout period with retries
There are two primary use cases for the Schedule-To-Start Timeout:
- Detect whether an individual Worker has crashed.
- Detect whether the fleet of Workers polling the Task Queue is not able to keep up with the rate of Activity Tasks.
The default Schedule-To-Start Timeout is ∞ (infinity).
If this timeout is used, we recommend setting this timeout to the maximum time a Workflow Execution is willing to wait for an Activity Execution in the presence of all possible Worker outages, and have a concrete plan in place to reroute Activity Tasks to a different Task Queue. This timeout does not trigger any retries regardless of the Retry Policy, as a retry would place the Activity Task back into the same Task Queue. As a reminder, we do not recommend using this timeout unless you know what you are doing.
In most cases, we recommend monitoring the temporal_activity_schedule_to_start_latency metric to know when Workers are not picking up Activity Tasks, instead of setting this timeout.
A Start-To-Close Timeout is the maximum time allowed for a single Activity Task Execution.
The default Start-To-Close Timeout is the same as the default Schedule-To-Close Timeout.
An Activity Execution must have either this timeout (Start-To-Close) or the Schedule-To-Close Timeout set. We recommend always setting this timeout; however, make sure it is longer than the maximum time the Activity Execution could possibly take. For long-running Activity Executions, we recommend also using Activity Heartbeats and Heartbeat Timeouts.
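The rule above can be checked before an Activity is scheduled. The following is a minimal sketch in plain Python; `ActivityTimeouts` and its methods are hypothetical names used for illustration and are not part of any Temporal SDK.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class ActivityTimeouts:
    """Hypothetical container for the two timeouts discussed above."""
    start_to_close: Optional[timedelta] = None
    schedule_to_close: Optional[timedelta] = None

    def validate(self) -> None:
        # At least one of the two timeouts must be set.
        if self.start_to_close is None and self.schedule_to_close is None:
            raise ValueError("set Start-To-Close or Schedule-To-Close")

    def covers(self, max_expected_duration: timedelta) -> bool:
        # Start-To-Close should exceed the longest time one attempt can take.
        if self.start_to_close is None:
            return True
        return self.start_to_close > max_expected_duration
```

For example, `ActivityTimeouts(start_to_close=timedelta(seconds=30)).covers(timedelta(seconds=20))` is true, while an instance with neither timeout set fails `validate()`.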
The main use case for the Start-To-Close timeout is to detect when a Worker crashes after it has started executing an Activity Task.
A Retry Policy attached to an Activity Execution retries an Activity Task Execution. Thus the Start-To-Close Timeout is applied to each Activity Task Execution within an Activity Execution.
If the first Activity Task Execution returns an error and the Retry Policy dictates a retry, a second Activity Task Execution is scheduled with its own Start-To-Close Timeout.
If this timeout is reached, the following actions occur:
- An ActivityTaskTimedOut Event is written to the Workflow Execution's mutable state.
- If a Retry Policy dictates a retry, the Temporal Cluster schedules another Activity Task.
- The attempt count increments by 1 in the Workflow Execution's mutable state.
- The Start-To-Close Timeout timer is reset.
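The sequence above can be modeled with a small simulation. This is a sketch under simplifying assumptions: `run_attempts` is a hypothetical helper, the backoff wait between attempts is ignored, and each attempt is represented only by how long it would run.

```python
from typing import List, Tuple


def run_attempts(
    attempt_durations: List[float],
    start_to_close: float,
    maximum_attempts: int = 0,
) -> List[Tuple[int, str]]:
    """Return (attempt number, outcome) pairs for one Activity Execution.

    Every attempt is measured against a fresh Start-To-Close timer.
    maximum_attempts == 0 is treated as unlimited.
    """
    history: List[Tuple[int, str]] = []
    for attempt, duration in enumerate(attempt_durations, start=1):
        if duration <= start_to_close:
            history.append((attempt, "completed"))
            break
        # The timer fired: record the timeout, then retry if allowed.
        history.append((attempt, "timed_out"))
        if maximum_attempts and attempt >= maximum_attempts:
            break
    return history
```

With a 10 second timeout, `run_attempts([12.0, 11.0, 3.0], 10.0)` records two timeouts followed by a completion on the third attempt.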
A Schedule-To-Close Timeout is the maximum amount of time allowed for the overall Activity Execution, from when the first Activity Task is scheduled to when the last Activity Task, in the chain of Activity Tasks that make up the Activity Execution, reaches a Closed status.
Schedule-To-Close Timeout period
Example Schedule-To-Close Timeout period for an Activity Execution that has a chain of Activity Task Executions:
Schedule-To-Close Timeout period with a retry
The default Schedule-To-Close Timeout is ∞ (infinity).
An Activity Execution must have either this timeout (Schedule-To-Close) or Start-To-Close set. By default, an Activity Execution Retry Policy dictates that retries will occur for up to 10 years. This timeout can be used to limit the overall time spent on the Activity Execution, including retries, without altering the default Retry Policy.
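To get a feel for how a Schedule-To-Close Timeout bounds retries, the sketch below counts how many attempts can start before the deadline. It is a rough model, not Server behavior: `attempts_started_within` is a hypothetical helper, every attempt is assumed to fail after a fixed duration, and the wait between attempts is assumed to start at 1 second, double each time, and stop growing at 100 seconds.

```python
import math


def attempts_started_within(
    schedule_to_close: float,
    attempt_duration: float,
    initial_interval: float = 1.0,
    backoff_coefficient: float = 2.0,
    maximum_interval: float = 100.0,
) -> int:
    """Count attempts that start before the Schedule-To-Close deadline."""
    if not math.isfinite(schedule_to_close):
        raise ValueError("an infinite deadline allows unlimited attempts")
    if attempt_duration + initial_interval <= 0:
        raise ValueError("time must advance between attempts")
    elapsed, attempts, interval = 0.0, 0, initial_interval
    while elapsed < schedule_to_close:
        attempts += 1
        # The attempt runs and fails, then the retry interval passes.
        elapsed += attempt_duration + interval
        interval = min(interval * backoff_coefficient, maximum_interval)
    return attempts
```

Under these assumptions, a 60 second Schedule-To-Close Timeout with attempts that each fail after 1 second allows 6 attempts to start.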
A Heartbeat Timeout is the maximum time between Activity Heartbeats.
Heartbeat Timeout periods
If this timeout is reached, the Activity Execution changes to a Failed status, and will retry if a Retry Policy dictates it.
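The check behind a Heartbeat Timeout amounts to looking for a gap between consecutive heartbeats that exceeds the timeout. The sketch below illustrates this with plain timestamps in seconds; `first_heartbeat_timeout` is a hypothetical helper, and it assumes the timer starts when the Activity Task Execution starts.

```python
from typing import Optional, Sequence


def first_heartbeat_timeout(
    start: float,
    heartbeats: Sequence[float],
    heartbeat_timeout: float,
    end: float,
) -> Optional[float]:
    """Return the time the Heartbeat Timeout would fire, or None if it never does."""
    last = start
    # Treat the end of the attempt as the final checkpoint.
    for moment in list(heartbeats) + [end]:
        if moment - last > heartbeat_timeout:
            return last + heartbeat_timeout
        last = moment
    return None
```

For an attempt that starts at 0, heartbeats at 5, 10, and 30, and uses a 15 second Heartbeat Timeout, the function returns 25: the last heartbeat before the gap arrived at 10, so the timer expires 15 seconds later.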
The wait time before a retry is the retry interval. A retry interval is the smaller of two values:
- The Initial Interval multiplied by the Backoff Coefficient raised to the power of the number of retries so far.
- The Maximum Interval.
When a Workflow Execution is invoked, it is not associated with a default Retry Policy and thus does not retry by default. The intention is that a Workflow Definition should be written to never fail due to intermittent issues; an Activity is designed to handle such issues.
Retry Policies do not apply to Workflow Task Executions, which, by default, retry indefinitely.
A Retry Policy can be provided to a Workflow Execution when it is invoked, but only certain scenarios merit doing this, such as the following:
- A cron Workflow or some other stateless, always-running Workflow Execution that can benefit from retries.
- A file-processing or media-encoding Workflow Execution that downloads files to a host.
When an Activity Execution is spawned, it is associated with a default Retry Policy, and thus Activity Task Executions are retried by default. When an Activity Task Execution is retried, the Server places a new Activity Task into its respective Activity Task Queue, which results in a new Activity Task Execution.
Default values for Retry Policy
Initial Interval = 1 second
Backoff Coefficient = 2.0
Maximum Interval = 100 × Initial Interval
Maximum Attempts = ∞
Non-Retryable Errors = 
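The default values above can be turned into concrete wait times. The sketch below assumes the retry interval is the Initial Interval multiplied by the Backoff Coefficient raised to the number of retries already made, capped at the Maximum Interval. `RetryDefaults` is a hypothetical model for illustration, not an SDK type.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RetryDefaults:
    """Hypothetical model of the default Retry Policy values (seconds)."""
    initial_interval: float = 1.0
    backoff_coefficient: float = 2.0
    maximum_interval: Optional[float] = None  # None means 100 x initial_interval

    def interval_before_retry(self, retry_number: int) -> float:
        """Seconds to wait before the given retry (1 = first retry)."""
        if retry_number < 1:
            raise ValueError("retry_number starts at 1")
        cap = self.maximum_interval
        if cap is None:
            cap = 100 * self.initial_interval
        try:
            grown = self.initial_interval * self.backoff_coefficient ** (retry_number - 1)
        except OverflowError:
            # Very large exponents overflow a float; the cap applies anyway.
            grown = float("inf")
        return min(grown, cap)


policy = RetryDefaults()
intervals = [policy.interval_before_retry(n) for n in range(1, 10)]
```

With the defaults, the first nine retry intervals come out as 1, 2, 4, 8, 16, 32, 64, 100, and 100 seconds: the eighth would be 128 seconds, but the Maximum Interval caps it.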
Initial Interval
- Description: Amount of time that must elapse before the first retry occurs.
- The default value is 1 second.
- Use case: This is used as the base interval time for the Backoff Coefficient to multiply against.
Backoff Coefficient
- Description: The value dictates how much the retry interval increases.
- The default value is 2.0.
- A backoff coefficient of 1.0 means that the retry interval always equals the Initial Interval.
- Use case: Use this attribute to increase the interval between retries. By having a backoff coefficient greater than 1.0, the first few retries happen relatively quickly to overcome intermittent failures, but subsequent retries happen farther and farther apart to account for longer outages. Use the Maximum Interval attribute to prevent the coefficient from increasing the retry interval too much.
Maximum Interval
- Description: Specifies the maximum interval between retries.
- The default value is 100 times the Initial Interval.
- Use case: This attribute is useful for Backoff Coefficients that are greater than 1.0 because it prevents the retry interval from growing infinitely.
Maximum Attempts
- Description: Specifies the maximum number of execution attempts that can be made in the presence of failures.
- The default is unlimited.
- If this limit is exceeded, the execution fails without retrying again. When this happens an error is returned.
- Setting the value to 0 also means unlimited.
- Setting the value to 1 means a single execution attempt and no retries.
- Setting the value to a negative integer results in an error when the execution is invoked.
- Use case: Use this attribute to ensure that retries do not continue indefinitely. However, in the majority of cases, we recommend relying on the Workflow Execution Timeout, in the case of Workflows, or Schedule-To-Close Timeout, in the case of Activities, to limit the total duration of retries instead of using this attribute.
Non-Retryable Errors
- Description: Specifies errors that shouldn't be retried.
- Use case: There may be errors that you know of that should not trigger a retry. In this case you can specify them such that if they occur, the given execution will not be retried.
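The Maximum Attempts and Non-Retryable Errors rules can be combined into a single retry decision. The following is a minimal sketch: `should_retry` is a hypothetical helper, and matching errors by a type-name string is an assumption made for illustration.

```python
from typing import Collection


def should_retry(
    attempts_made: int,
    maximum_attempts: int,
    error_type: str,
    non_retryable: Collection[str] = (),
) -> bool:
    """Decide whether another attempt is allowed after a failure."""
    if maximum_attempts < 0:
        # A negative value is rejected when the execution is invoked.
        raise ValueError("Maximum Attempts must not be negative")
    if error_type in non_retryable:
        return False
    if maximum_attempts == 0:
        # Zero means unlimited attempts.
        return True
    return attempts_made < maximum_attempts
```

With `maximum_attempts=1`, the first failure already returns `False`, which matches the "single execution attempt and no retries" case described above.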