Fault-oblivious stateful Workflow code is the core abstraction of Temporal. However, because Workflows must execute deterministically, they are not allowed to call any external API directly. Instead, they orchestrate the execution of Activities. In its simplest form, a Temporal Activity is a function or an object method in one of the supported languages. Temporal does not recover Activity state in case of failures, so an Activity function is allowed to contain any code without restrictions.
Activities are invoked asynchronously through task queues. A task queue is essentially a queue used to store an Activity task until it is picked up by an available worker. The worker processes an Activity by invoking its implementation function. When the function returns, the worker reports the result back to the Temporal service which in turn notifies the Workflow about completion. It is possible to implement an Activity fully asynchronously by completing it from a different process.
Figure: Schedule-To-Start period
There are two primary use cases for the Schedule-To-Start Timeout:
- Detect whether an individual Worker has crashed.
- Detect whether the fleet of Workers polling the Task Queue is not able to keep up with the rate of Activity Tasks.
If this timeout is used, we recommend setting this timeout to the maximum time a Workflow Execution is willing to wait for an Activity Execution in the presence of all possible Worker outages, and have a concrete plan in place to reroute Activity Tasks to a different Task Queue. This timeout does not trigger any retries regardless of the Retry Policy, as a retry would place the Activity Task back into the same Task Queue. As a reminder, we do not recommend using this timeout unless you know what you are doing.
In most cases, we recommend monitoring the temporal_activity_schedule_to_start_latency metric to know when Workers are not picking up Activity Tasks, instead of setting this timeout.
A Start-To-Close Timeout is the maximum time allowed for a single Activity Task Execution.
Figure: Start-To-Close period
Figure: Start-To-Close period with retries
An Activity Execution must have either this timeout (Start-To-Close) or the Schedule-To-Close Timeout set. We recommend always setting this timeout; however, make sure that it is longer than the maximum time the Activity Execution could possibly take. For long-running Activity Executions, we also recommend using Activity Heartbeats and Heartbeat Timeouts.
The main use case for the Start-To-Close timeout is to detect when a Worker crashes after it has started executing an Activity Task.
If this timeout is reached the following takes place:
- An ActivityTaskTimedOut event is written to the Workflow Execution's mutable state.
- If there is a Retry Policy that dictates a retry, then the Temporal Server schedules another Activity Task.
- The attempt count increments by 1 in the Workflow Execution's mutable state.
- The Start-To-Close Timeout timer is reset.
A Schedule-To-Close Timeout is the maximum amount of time allowed for the overall Activity Execution, from when the first Activity Task is scheduled to when the last Activity Task, in the chain of Activity Tasks that make up the Activity Execution, reaches a Closed status.
Figure: Schedule-To-Close period with retries
An Activity Execution must have either this timeout (Schedule-To-Close) or the Start-To-Close Timeout set. By default, an Activity Execution Retry Policy dictates that retries will occur for up to 10 years. This timeout can be used to limit the overall duration of an Activity Execution, including all of its retries, without altering the default Retry Policy.
A Heartbeat Timeout is the maximum time between Activity Heartbeats.
Figure: Heartbeat Timeout periods
If this timeout is reached, the Activity Execution changes to a Failed status, and will retry if a Retry Policy dictates it.
The wait time before a retry is the retry interval. A retry interval is the smaller of two values:
- The Initial Interval multiplied by the Backoff Coefficient raised to the power of the number of retries.
- The Maximum Interval.
When a Workflow Execution is invoked, it is not associated with a default Retry Policy and thus does not retry by default. The intention is that a Workflow Definition should be written to never fail due to intermittent issues; an Activity is designed to handle such issues.
Retry Policies do not apply to Workflow Task Executions, which, by default, retry indefinitely.
A Retry Policy can be provided to a Workflow Execution when it is invoked, but only certain scenarios merit doing this, such as the following:
- A cron Workflow or some other stateless, always-running Workflow Execution that can benefit from retries.
- A file-processing or media-encoding Workflow Execution that downloads files to a host.
When an Activity Execution is invoked, it is associated with a default Retry Policy, and thus Activity Task Executions are retried by default. When an Activity Task Execution is retried, the Server places a new Activity Task into its respective Activity Task Queue, which results in a new Activity Task Execution.
Default values for Retry Policy
Initial Interval = 1 second
Backoff Coefficient = 2.0
Maximum Interval = 100 × Initial Interval
Maximum Attempts = ∞
Non-Retryable Errors = []
Initial Interval
- Description: Amount of time that must elapse before the first retry occurs.
- The default value is 1 second.
- Use case: This is used as the base interval time for the Backoff Coefficient to multiply against.
Backoff Coefficient
- Description: The value dictates how much the retry interval increases.
- The default value is 2.0.
- A backoff coefficient of 1.0 means that the retry interval always equals the Initial Interval.
- Use case: Use this attribute to increase the interval between retries. By having a backoff coefficient greater than 1.0, the first few retries happen relatively quickly to overcome intermittent failures, but subsequent retries happen farther and farther apart to account for longer outages. Use the Maximum Interval attribute to prevent the coefficient from increasing the retry interval too much.
Maximum Interval
- Description: Specifies the maximum interval between retries.
- The default value is 100 times the Initial Interval.
- Use case: This attribute is useful for Backoff Coefficients that are greater than 1.0 because it prevents the retry interval from growing infinitely.
Maximum Attempts
- Description: Specifies the maximum number of execution attempts that can be made in the presence of failures.
- The default is unlimited.
- If this limit is exceeded, the execution fails without retrying again. When this happens an error is returned.
- Setting the value to 0 also means unlimited.
- Setting the value to 1 means a single execution attempt and no retries.
- Setting the value to a negative integer results in an error when the execution is invoked.
- Use case: Use this attribute to ensure that retries do not continue indefinitely. However, in the majority of cases, we recommend relying on the Workflow Execution Timeout, in the case of Workflows, or Schedule-To-Close Timeout, in the case of Activities, to limit the total duration of retries instead of using this attribute.
Non-Retryable Errors
- Description: Specifies errors that shouldn't be retried.
- Use case: There may be errors that you know of that should not trigger a retry. In this case you can specify them such that if they occur, the given execution will not be retried.
For long-running Activities, we recommend that you specify a relatively short heartbeat timeout and heartbeat constantly. This way, worker failures can be handled in a timely manner even for very long-running Activities. An Activity that specifies the heartbeat timeout is expected to call the heartbeat method periodically from its implementation.
A heartbeat request can include application specific payload. This is useful to save Activity execution progress. If an Activity times out due to a missed heartbeat, the next attempt to execute it can access that progress and continue its execution from that point.
Long-running Activities can be used as a special case of leader election. Temporal timeouts use second resolution, so this approach is not a solution for real-time applications. But if it is acceptable to react to a process failure within a few seconds, then a heartbeating Temporal Activity is a good fit.
One common use case for such leader election is monitoring. An Activity executes an internal loop that periodically polls some API and checks for some condition. It also heartbeats on every iteration. If the condition is satisfied, the Activity completes, which lets its Workflow handle it. If the Activity worker dies, the Activity times out after the heartbeat interval is exceeded and is retried on a different worker. The same pattern works for polling for new files in Amazon S3 buckets or responses in REST or other synchronous APIs.
A Workflow can request to cancel an Activity.
When an Activity is cancelled, or its Workflow execution has completed or failed, the context passed into its function is cancelled, which closes the context's Done channel. An Activity can use that to perform any necessary cleanup and abort its execution.
Cancellation is only delivered to Activities that record heartbeats:
- The heartbeat request fails with a special error indicating that the Activity was cancelled. Heartbeats can also fail when the Workflow that invoked the Activity is in a completed state.
- The Activity should perform all necessary cleanup and report when it is done.
- The Workflow can decide if it wants to wait for the Activity cancellation confirmation or proceed without waiting.
Cancellations are not immediate
ctx.Done() is only signaled when a heartbeat is sent to the service.
Temporal's SDK throttles this so a heartbeat may not be sent to the service until 80% of the heartbeat timeout has elapsed.
For example, if your heartbeat timeout is 20 seconds, ctx.Done() will not be signaled until 80% of 20 seconds (~16 seconds) has elapsed.
To increase or decrease the cancellation delay, modify the heartbeat timeout defined for the Activity context.
Activities are dispatched to workers through task queues, which are queues that workers listen on. Task queues are highly dynamic and lightweight, and they don't need to be explicitly registered. It is fine to have one task queue per worker process. It is normal to have more than one Activity type invoked through a single task queue, and in some cases (like host routing) it is normal to invoke the same Activity type on multiple task queues.
Here are some use cases for employing multiple Activity task queues in a single Workflow:
- Flow control. A worker that consumes from a task queue asks for an Activity task only when it has available capacity. So workers are never overloaded by request spikes. If Activity executions are requested faster than workers can process them, they are backlogged in the task queue.
- Throttling. Each Activity worker can specify the maximum rate it is allowed to process Activities on a task queue. It does not exceed this limit even if it has spare capacity. There is also support for global task queue rate limiting. This limit works across all workers for the given task queue. It is frequently used to limit load on a downstream service that an Activity calls into.
- Deploying a set of Activities independently. Think about a service that hosts Activities and can be deployed independently from other Activities and Workflows. To send Activity tasks to this service, a separate task queue is needed.
- Workers with different capabilities. For example, workers on GPU boxes vs non-GPU boxes. Having two separate task queues in this case allows Workflows to pick which one to send an Activity execution request to.
- Routing an Activity to a specific host. For example, in the media encoding case the transform and upload Activities have to run on the same host as the download one.
- Routing an Activity to a specific process. For example, some Activities load a large data set and cache it in the process. The Activities that rely on this data set should be routed to the same process.
- Multiple priorities. One task queue per priority and having a worker pool per priority.
- Versioning. A new backwards incompatible implementation of an Activity might use a different task queue.
Asynchronous Activity Completion occurs when the final result of a computation, started by an Activity, is provided to the Temporal System from an external system.
By default, an Activity is a function or method (depending on the language) that completes as soon as the function or method returns. But in some cases an Activity implementation is asynchronous. For example, the action could be forwarded to an external system through a message queue, and the result could come through a different queue.
To support such use cases, Temporal allows Activity implementations that do not complete when the Activity function returns. A separate API should be used in this case to complete the Activity. This API can be called from any process, even one written in a different programming language than the original Activity worker used.
Although a Local Activity consumes fewer resources than a regular Activity Execution, it is limited to shorter durations and does not support rate limiting.
Some Activities are very short-lived and do not need the queuing semantics, flow control, rate limiting, and routing capabilities. For this case, Temporal supports a local Activity feature. Local Activities are executed in the same worker process as the Workflow that invoked them. Consider using local Activities for functions that:
- take no longer than a few seconds
- do not require global rate limiting
- do not require routing to specific workers or pools of workers
- can be implemented in the same binary as the Workflow that invokes them
The main benefit of local Activities is that they are much more efficient in utilizing Temporal service resources and have much lower latency overhead compared to the usual Activity invocation.