
What is a Temporal Activity?

This guide provides a comprehensive overview of Temporal Activities.

In day-to-day conversation, the term Activity denotes an Activity Definition, Activity Type, or Activity Execution. Temporal documentation aims to be explicit and differentiate between them.

An Activity is a normal function or method that executes a single, well-defined action (either short or long running), such as calling another service, transcoding a media file, or sending an email message. Activity code can be non-deterministic. We recommend that it be idempotent.

Workflow code orchestrates the execution of Activities and persists their results. If an Activity Function Execution fails, any future execution starts from the initial state (except for progress recorded through Heartbeats).

Activity Functions are executed by Worker Processes. When the Activity Function returns, the Worker sends the results back to the Temporal Cluster as part of the ActivityTaskCompleted Event. The Event is added to the Workflow Execution's Event History. For other Activity-related Events, see Activity Events.

What is an Activity Definition?

An Activity Definition is the code that defines the constraints of an Activity Task Execution.

The term 'Activity Definition' refers to the full set of primitives in any given language SDK that provides an access point to an Activity Function Definition, that is, the method or function that is invoked for an Activity Task Execution. Therefore, the terms Activity Function and Activity Method refer to the source of an instance of an execution.

Activity Definitions are named and referenced in code by their Activity Type.

Activity Definition


Idempotency

Temporal recommends that Activities be idempotent.

Idempotent means that performing an operation multiple times has the same result as performing it once. In the context of Temporal, Activities should be designed to be safely executed multiple times without causing unexpected or undesired side effects.

info

By design, completed Activities will not re-execute as part of a Workflow Replay. However, Activities won’t record to the Event History until they return or produce an error. If an Activity fails to report to the server at all, it will be retried. Designing for idempotence, especially if you have a Global Namespace, will improve reusability and reliability.

An Activity is idempotent if multiple Activity Task Executions do not change the state of the system beyond the first Activity Task Execution.

We recommend using idempotency keys for critical side effects.
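As a minimal sketch of the idempotency-key idea, the following uses a hypothetical in-memory `PaymentGateway` in place of a real external service. The key format (`workflow_id:activity_id`) and all names are assumptions made for illustration; the point is only that the key stays the same across retries of one Activity Execution, so a repeated call does not repeat the side effect.

```python
import uuid


class PaymentGateway:
    """Hypothetical in-memory stand-in for an external payment service."""

    def __init__(self):
        self._charges_by_key = {}

    def charge(self, idempotency_key: str, amount_cents: int) -> str:
        # A repeated key returns the original charge instead of charging again.
        if idempotency_key in self._charges_by_key:
            return self._charges_by_key[idempotency_key]
        charge_id = f"charge-{uuid.uuid4()}"
        self._charges_by_key[idempotency_key] = charge_id
        return charge_id

    def charge_count(self) -> int:
        return len(self._charges_by_key)


def charge_customer(gateway, workflow_id, activity_id, amount_cents):
    """Activity-style function that derives a key which is stable across retries."""
    # Assumption: both identifiers are unchanged when the Activity is retried.
    key = f"{workflow_id}:{activity_id}"
    return gateway.charge(key, amount_cents)


gateway = PaymentGateway()
first = charge_customer(gateway, "order-42", "charge-1", 1999)
retry = charge_customer(gateway, "order-42", "charge-1", 1999)  # simulated retry
```

Here the simulated retry returns the same charge as the first call, and the gateway records only one charge.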

The lack of idempotency might affect the correctness of your application but does not affect the Temporal Platform. In other words, lack of idempotency doesn't lead to a platform error.

In some cases, whether something is idempotent doesn't affect the correctness of an application. For example, if you have a monotonically incrementing counter, you might not care that retries increment the counter because you don’t care about the actual value, only that the current value is greater than a previous value.

Constraints

Activity Definitions are executed as normal functions.

In the event of failure, the function begins at its initial state when retried (except when Activity Heartbeats are established).

Therefore, an Activity Definition has no restrictions on the code it contains.

Parameters

An Activity Definition can support as many parameters as needed.

All values passed through these parameters are recorded in the Event History of the Workflow Execution. Return values are also captured in the Event History for the calling Workflow Execution.

Activity Definitions can make use of the following:

  • Context: an optional parameter that provides Activity context within multiple APIs.
  • Heartbeat: a notification from the Worker to the Temporal Cluster that the Activity Execution is progressing. Cancellations are allowed only if the Activity Definition permits Heartbeating.
  • Timeouts: intervals that control the execution and retrying of Activity Task Executions.

Other parameters, such as Retry Policies and return values, can be seen in the implementation guides, listed in the next section.

What is an Activity Type?

An Activity Type is the mapping of a name to an Activity Definition.

Activity Types are scoped through Task Queues.
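The name-to-definition mapping can be pictured as a small registry. This is a sketch only: `activity_registry`, `register_activity`, and `dispatch` are hypothetical names, not SDK APIs, and a real Worker does considerably more than a dictionary lookup.

```python
from typing import Callable, Dict

# Hypothetical registry: Activity Type name -> Activity Definition (a function).
activity_registry: Dict[str, Callable] = {}


def register_activity(name: str):
    """Decorator that records a function under an Activity Type name."""
    def decorator(fn):
        if name in activity_registry:
            raise ValueError(f"Activity Type {name!r} is already registered")
        activity_registry[name] = fn
        return fn
    return decorator


@register_activity("SendEmail")
def send_email(address: str) -> str:
    return f"sent to {address}"


def dispatch(activity_type: str, *args):
    # Look the definition up by its type name, then invoke it.
    return activity_registry[activity_type](*args)


result = dispatch("SendEmail", "user@example.com")
```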

What is an Activity Execution?

An Activity Execution is the full chain of Activity Task Executions.

Activity Execution


By default, an Activity Execution has no time limit. You can customize Activity Execution timeouts and retry policies.

If an Activity Execution fails (because it exhausted all retries, threw a non-retryable error, or was canceled), the error is returned to the Workflow, which decides how to handle it.

Cancellation

Activity Cancellation:

  • lets the Activity know it doesn't need to keep doing work, and
  • gives the Activity time to clean up any resources it has created.

Activities can receive Cancellation only if they emit Heartbeats. The exception is Local Activities in Core-based SDKs (TypeScript and Python), which don't Heartbeat but receive Cancellation anyway.

An Activity may receive Cancellation if:

  • The Activity was requested to be Cancelled. This can often cascade from Workflow Cancellation, but not always—SDKs have ways to stop Cancellation from cascading.
  • The Activity was considered failed by the Server because any of the Activity timeouts have triggered (for example, the Server didn't receive a heartbeat within the Activity's Heartbeat timeout). The Cancelled Failure that the Activity receives will have message: 'TIMED_OUT'.
  • The Workflow Run reached a Closed state, in which case the Cancelled Failure will have message: 'NOT_FOUND'.
  • In some SDKs:
    • The Worker is shutting down.
    • An Activity sends a Heartbeat but the Heartbeat details can't be converted by the Worker's configured Data Converter. This fails the Activity Task Execution with an Application Failure.

There are different ways to receive Cancellation depending on the SDK. An Activity may accept or ignore Cancellation:

  • To allow Cancellation to happen, let the Cancellation Failure propagate.
  • To ignore Cancellation, catch it and continue executing.
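The two choices above can be illustrated with plain `asyncio` tasks as an analogy. This sketch does not use a Temporal SDK; the function names are hypothetical, and how Cancellation is actually delivered depends on the SDK you use.

```python
import asyncio

events = []


async def cooperative_activity():
    try:
        await asyncio.sleep(10)  # stands in for long-running work
    except asyncio.CancelledError:
        events.append("cleanup")  # release resources first
        raise  # let the Cancellation propagate


async def stubborn_activity():
    try:
        await asyncio.sleep(10)
    except asyncio.CancelledError:
        events.append("ignored")  # catch the Cancellation and carry on
    return "finished anyway"


async def main():
    first = asyncio.create_task(cooperative_activity())
    second = asyncio.create_task(stubborn_activity())
    await asyncio.sleep(0)  # give both tasks a chance to start
    first.cancel()
    second.cancel()
    # return_exceptions=True collects the CancelledError instead of raising it.
    return await asyncio.gather(first, second, return_exceptions=True)


results = asyncio.run(main())
```

The first task ends as cancelled after its cleanup runs, while the second returns a normal result despite the cancel request.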

Some SDKs have ways to shield tasks from being stopped while still letting the Cancellation propagate.

The Workflow can also decide if it wants to wait for the Activity Cancellation to be accepted or to proceed without waiting.

Cancellation can only be requested a single time. If you try to cancel your Activity Execution more than once, it will not receive more than one Cancellation request.

What is an Activity Id?

The identifier for an Activity Execution. The identifier can be generated by the system, or it can be provided by the Workflow code that spawns the Activity Execution. The identifier is unique among the open Activity Executions of a Workflow Run. (A single Workflow Run may reuse an Activity Id if an earlier Activity Execution with the same Id has closed.)

An Activity Id can be used to complete the Activity asynchronously.

What is a Schedule-To-Start Timeout?

A Schedule-To-Start Timeout is the maximum amount of time that is allowed from when an Activity Task is scheduled (that is, placed in a Task Queue) to when a Worker starts (that is, picks up from the Task Queue) that Activity Task. In other words, it's a limit for how long an Activity Task can be enqueued.

The moment that the Task is picked by the Worker from the Task Queue is considered to be the start of the Activity Task for the purposes of the Schedule-To-Start Timeout and associated metrics. This definition of "Start" avoids issues that a clock difference between the Temporal Cluster and a Worker might create.

Schedule-To-Start Timeout period


"Schedule" has different frequency guarantees in Schedule-To-Start and in Schedule-To-Close.

The Schedule-To-Start Timeout is enforced for each Activity Task, whereas the Schedule-To-Close Timeout is enforced once per Activity Execution. Thus, "Schedule" in Schedule-To-Start refers to the scheduling moment of every Activity Task in the sequence of Activity Tasks that make up the Activity Execution, while "Schedule" in Schedule-To-Close refers to the first Activity Task in that sequence.

A Retry Policy attached to an Activity Execution retries an Activity Task.

Start-To-Close Timeout period with retries


The Schedule-To-Start Timeout has two primary use cases:

  1. Detect whether an individual Worker has crashed.
  2. Detect whether the fleet of Workers polling the Task Queue is not able to keep up with the rate of Activity Tasks.

The default Schedule-To-Start Timeout is ∞ (infinity).

If this timeout is used, we recommend setting this timeout to the maximum time a Workflow Execution is willing to wait for an Activity Execution in the presence of all possible Worker outages, and have a concrete plan in place to reroute Activity Tasks to a different Task Queue. This timeout does not trigger any retries regardless of the Retry Policy, as a retry would place the Activity Task back into the same Task Queue. We do not recommend using this timeout unless you know what you are doing.

In most cases, we recommend monitoring the temporal_activity_schedule_to_start_latency metric to know when Workers slow down picking up Activity Tasks, instead of setting this timeout.

What is a Start-To-Close Timeout?

A Start-To-Close Timeout is the maximum time allowed for a single Activity Task Execution.

The default Start-To-Close Timeout is the same as the default Schedule-To-Close Timeout.

An Activity Execution must have either this timeout (Start-To-Close) or the Schedule-To-Close Timeout set. We recommend always setting this timeout; however, make sure that Start-To-Close Timeout is always set to be longer than the maximum possible time for the Activity Execution to complete. For long running Activity Executions, we recommend also using Activity Heartbeats and Heartbeat Timeouts.

tip

We strongly recommend setting a Start-To-Close Timeout.

The Temporal Server doesn't detect failures when a Worker loses communication with the Server or crashes. Therefore, the Temporal Server relies on the Start-To-Close Timeout to force Activity retries.

The main use case for the Start-To-Close timeout is to detect when a Worker crashes after it has started executing an Activity Task.

Start-To-Close Timeout period


A Retry Policy attached to an Activity Execution retries an Activity Task Execution. Thus, the Start-To-Close Timeout is applied to each Activity Task Execution within an Activity Execution.

If the first Activity Task Execution returns an error, then the full Activity Execution might look like this:

Start-To-Close Timeout period with retries


If this timeout is reached, the following actions occur:

  • An ActivityTaskTimedOut Event is written to the Workflow Execution's mutable state.
  • If a Retry Policy dictates a retry, the Temporal Cluster schedules another Activity Task.
    • The attempt count increments by 1 in the Workflow Execution's mutable state.
    • The Start-To-Close Timeout timer is reset.

What is a Schedule-To-Close Timeout?

A Schedule-To-Close Timeout is the maximum amount of time allowed for the overall Activity Execution, from when the first Activity Task is scheduled to when the last Activity Task, in the chain of Activity Tasks that make up the Activity Execution, reaches a Closed status.

Schedule-To-Close Timeout period


Example Schedule-To-Close Timeout period for an Activity Execution that has a chain of Activity Task Executions:

Schedule-To-Close Timeout period with a retry


The default Schedule-To-Close Timeout is ∞ (infinity).

An Activity Execution must have either this timeout (Schedule-To-Close) or Start-To-Close set. This timeout can be used to control the overall duration of an Activity Execution in the face of failures (repeated Activity Task Executions), without altering the Maximum Attempts field of the Retry Policy.
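To make the difference between the per-attempt and overall limits concrete, here is a deliberately simplified model. It assumes zero queue time and zero retry backoff, and it treats the remaining Schedule-To-Close budget as a cap on each attempt; `simulate_activity_execution` and its return values are hypothetical and are not part of any SDK.

```python
import math


def simulate_activity_execution(attempts, start_to_close, schedule_to_close=math.inf):
    """attempts: list of (duration, succeeds) pairs, one per Activity Task Execution.

    Returns (outcome, attempt_number, elapsed_time).
    """
    elapsed = 0
    for number, (duration, succeeds) in enumerate(attempts, start=1):
        remaining = schedule_to_close - elapsed
        # Each attempt is bounded by Start-To-Close and by what is left overall.
        limit = min(start_to_close, remaining)
        if duration > limit:
            elapsed += limit
            if remaining <= start_to_close:
                # The overall budget ran out, so no further retry is possible.
                return ("schedule_to_close_timeout", number, elapsed)
            continue  # Start-To-Close timeout: this attempt is retried
        elapsed += duration
        if succeeds:
            return ("completed", number, elapsed)
    return ("retries_exhausted", len(attempts), elapsed)


# Two attempts hit the 10-unit Start-To-Close limit; the third finishes in time.
roomy = simulate_activity_execution([(15, True), (15, True), (3, True)], 10, 25)
# Same attempts, but the overall budget of 22 runs out during the third attempt.
tight = simulate_activity_execution([(15, True), (15, True), (3, True)], 10, 22)
```

In this model the timer for Start-To-Close resets on every attempt, while the Schedule-To-Close clock keeps running across the whole chain.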

tip

We strongly recommend setting a Start-To-Close Timeout.

The Temporal Server doesn't detect failures when a Worker loses communication with the Server or crashes. Therefore, the Temporal Server relies on the Start-To-Close Timeout to force Activity retries.

What is an Activity Heartbeat?

An Activity Heartbeat is a ping from the Worker that is executing the Activity to the Temporal Cluster. Each ping informs the Temporal Cluster that the Activity Execution is making progress and the Worker has not crashed.

Activity Heartbeats work in conjunction with a Heartbeat Timeout.

Activity Heartbeats are implemented within the Activity Definition. Custom progress information can be included in the Heartbeat which can then be used by the Activity Execution should a retry occur.

An Activity Heartbeat can be recorded as often as needed (e.g. once a minute or every loop iteration). It is often a good practice to Heartbeat on anything but the shortest Activity Function Execution. Temporal SDKs control the rate at which Heartbeats are sent to the Cluster.

Local Activities do not need to Heartbeat; Heartbeating from a Local Activity has no effect.

For long-running Activities, we recommend using a relatively short Heartbeat Timeout and a frequent Heartbeat. That way if a Worker fails it can be handled in a timely manner.

A Heartbeat can include an application layer payload that can be used to save Activity Execution progress. If an Activity Task Execution times out due to a missed Heartbeat, the next Activity Task can access and continue with that payload.
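A rough sketch of resuming from a Heartbeat payload follows. The functions `process_items` and `record`, the `WorkerCrash` exception, and the `last_heartbeat` dictionary are hypothetical stand-ins for what an SDK and the Cluster would provide; the only idea taken from the text is that the next attempt can read the last recorded payload and continue from there.

```python
class WorkerCrash(Exception):
    """Simulates a Worker failing partway through an attempt."""


def process_items(items, heartbeat_details, record_heartbeat, crash_at=None):
    # Resume from the index stored in the last Heartbeat, or from the start.
    start = heartbeat_details if heartbeat_details is not None else 0
    processed = []
    for index in range(start, len(items)):
        if index == crash_at:
            raise WorkerCrash(f"crashed before item {index}")
        processed.append(items[index].upper())
        record_heartbeat(index + 1)  # payload: the next index to process
    return processed


last_heartbeat = {"details": None}  # stands in for state kept by the Cluster


def record(details):
    last_heartbeat["details"] = details


items = ["a", "b", "c", "d", "e"]
try:
    process_items(items, last_heartbeat["details"], record, crash_at=3)
except WorkerCrash:
    pass  # the first attempt processed three items, then failed

second_attempt = process_items(items, last_heartbeat["details"], record)
```

The second attempt handles only the two remaining items rather than starting over.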

Activity Cancellations are delivered to Activities from the Cluster when they Heartbeat. Activities that don't Heartbeat can't receive a Cancellation. Heartbeat throttling may lead to Cancellation getting delivered later than expected.

Throttling

Heartbeats may not always be sent to the Cluster—they may be throttled by the Worker. The throttle interval is the smaller of the following:

  • If heartbeatTimeout is provided, heartbeatTimeout * 0.8; otherwise, defaultHeartbeatThrottleInterval
  • maxHeartbeatThrottleInterval

defaultHeartbeatThrottleInterval is 30 seconds by default, and maxHeartbeatThrottleInterval is 60 seconds by default. Each can be set in Worker options.
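The rule above reduces to a few lines of arithmetic. The function below is an illustrative sketch, not SDK code; its parameter names mirror the option names in the text, and values are in seconds.

```python
import math
from typing import Optional


def throttle_interval(
    heartbeat_timeout: Optional[float] = None,
    default_heartbeat_throttle_interval: float = 30.0,
    max_heartbeat_throttle_interval: float = 60.0,
) -> float:
    """Return the smaller of the timeout-derived interval and the maximum."""
    if heartbeat_timeout is not None:
        candidate = heartbeat_timeout * 0.8
    else:
        candidate = default_heartbeat_throttle_interval
    return min(candidate, max_heartbeat_throttle_interval)


short_timeout = throttle_interval(10)   # 80% of a 10-second Heartbeat Timeout
no_timeout = throttle_interval()        # falls back to the default interval
long_timeout = throttle_interval(120)   # 96 seconds would exceed the maximum
```

With the defaults, a 10-second Heartbeat Timeout gives an 8-second interval, no timeout gives 30 seconds, and a 120-second timeout is capped at 60 seconds.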

Throttling is implemented as follows:

  • After sending a Heartbeat, the Worker sets a timer for the throttle interval.
  • The Worker stops sending Heartbeats, but continues receiving Heartbeats from the Activity and remembers the most recent one.
  • When the timer fires, the Worker:
    • Sends the most recent Heartbeat.
    • Sets the timer again.
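The steps above can be sketched with an explicit clock instead of real timers. `HeartbeatThrottler` is a hypothetical class, and one detail is an assumption not stated in the text: when the timer fires with no pending Heartbeat, this sketch stops the timer so that the next Heartbeat is sent immediately.

```python
class HeartbeatThrottler:
    def __init__(self, interval, send):
        self.interval = interval
        self.send = send              # callable that delivers a Heartbeat
        self.timer_deadline = None    # None means no timer is running
        self.pending = None
        self.has_pending = False

    def heartbeat(self, details, now):
        """Called by the Activity each time it Heartbeats."""
        if self.timer_deadline is None:
            self.send(details)
            self.timer_deadline = now + self.interval
        else:
            # Throttled: remember only the most recent Heartbeat.
            self.pending = details
            self.has_pending = True

    def on_tick(self, now):
        """Called by the (hypothetical) clock; fires the timer when it is due."""
        if self.timer_deadline is None or now < self.timer_deadline:
            return
        if self.has_pending:
            self.send(self.pending)
            self.pending, self.has_pending = None, False
            self.timer_deadline = now + self.interval
        else:
            self.timer_deadline = None  # assumption: idle timer is not re-armed


sent = []
throttler = HeartbeatThrottler(interval=8, send=sent.append)
throttler.heartbeat(1, now=0)   # sent immediately, timer set for t=8
throttler.heartbeat(2, now=2)   # throttled
throttler.heartbeat(3, now=5)   # throttled, replaces 2
throttler.on_tick(now=8)        # timer fires: sends 3, timer set for t=16
throttler.on_tick(now=16)       # nothing pending: timer stops
throttler.heartbeat(4, now=20)  # sent immediately again
```

Heartbeat 2 is never sent because a newer one replaced it before the timer fired.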

Which Activities should Heartbeat?

Heartbeating is best thought about not in terms of time, but in terms of "How do you know you are making progress?" For short-term operations, progress updates are not a requirement. However, checking the progress and status of Activity Executions that run over long periods is almost always useful.

Consider the following when setting Activity Heartbeats:

  • Your underlying task must be able to report definite progress. Note that your Workflow cannot read this progress information while the Activity is still executing (or it would have to store it in Event History). You can report progress to external sources if you need it exposed to the user.

  • Your Activity Execution is long-running, and you need to verify whether the Worker that is processing your Activity is still alive and has not run out of memory or silently crashed.

For example, the following scenarios are suitable for Heartbeating:

  • Reading a large file from Amazon S3.
  • Running an ML training job on some local GPUs.

And the following scenarios are not suitable for Heartbeating:

  • Making a quick API call.
  • Reading a small file from disk.

What is a Heartbeat Timeout?

A Heartbeat Timeout is the maximum time between Activity Heartbeats.

Heartbeat Timeout periods


If this timeout is reached, the Activity Task fails and a retry occurs if a Retry Policy dictates it.
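As a small illustration, the check can be expressed over a list of Heartbeat timestamps. This sketch assumes the first interval is measured from the start of the Activity Task Execution; `heartbeat_timeout_at` is a hypothetical helper, and all times are in the same arbitrary unit.

```python
def heartbeat_timeout_at(started_at, heartbeat_times, heartbeat_timeout, finished_at):
    """Return the moment the Heartbeat Timeout would fire, or None if it never does."""
    last_seen = started_at
    for moment in sorted(heartbeat_times):
        if moment - last_seen > heartbeat_timeout:
            return last_seen + heartbeat_timeout
        last_seen = moment
    # Also check the gap between the last Heartbeat and the end of the attempt.
    if finished_at - last_seen > heartbeat_timeout:
        return last_seen + heartbeat_timeout
    return None


healthy = heartbeat_timeout_at(0, [5, 10, 15], heartbeat_timeout=10, finished_at=20)
stalled = heartbeat_timeout_at(0, [5, 10, 30], heartbeat_timeout=10, finished_at=35)
```

In the second call, the gap after the Heartbeat at time 10 exceeds the timeout, so the timeout fires at time 20.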

What is Asynchronous Activity Completion?

Asynchronous Activity Completion is a feature that enables an Activity Function to return without causing the Activity Execution to complete. The Temporal Client can then be used from anywhere to both Heartbeat Activity Execution progress and eventually complete the Activity Execution and provide a result.


When to use Async Completion

When an external system has the final result of a computation that is started by an Activity, there are three main ways of getting the result to the Workflow:

  1. The external system uses Async Completion to complete the Activity with the result.
  2. The Activity completes normally, without the result. Later, the external system sends a Signal to the Workflow with the result.
  3. A subsequent Activity polls the external system for the result.

If you don't have control over the external system—that is, you can't add Async Completion or a Signal to its code—then

  • you can poll (#3), or
  • if the external system can reliably call a webhook (and retry calling in the case of failure), you can write a webhook handler that sends a Signal to the Workflow (#2).

The decision between using #1 vs #2 involves a few factors. Use Async Completion if

  • the external system is unreliable and might fail to Signal, or
  • you want the external process to Heartbeat or receive Cancellation.

Otherwise, if the external system can reliably be trusted to do the task and Signal back with the result, and it doesn't need to Heartbeat or receive Cancellation, then you may want to use Signals.

The benefit to using Signals has to do with the timing of failure retries. For example, consider an external process that is waiting for a human to review something and respond, and they could take up to a week to do so. If you use Async Completion (#1), you would

  • set a Start-To-Close Timeout of one week on the Activity,
  • in the Activity, notify the external process you need the human review, and
  • have the external process Asynchronously Complete the Activity when the human responds.

If the Activity fails on the second step to notify the external system and doesn't throw an error (for example, if the Worker dies), then the Activity won't be retried for a week, when the Start-To-Close Timeout is hit.

If you use Signals, you would:

  • set a Start-To-Close Timeout of one minute on the Activity,
  • in the Activity, notify the external process you need the human review,
  • complete the Activity without the result, and
  • have the external process Signal the Workflow when the human responds.

If the Activity fails on the second step to notify the external system and doesn't throw an error, then the Activity will be retried in a minute.

In the second scenario, the failure is retried sooner. This is particularly helpful in scenarios like this in which the external process might take a long time.

What is a Task Token?

A Task Token is a unique identifier for an Activity Task Execution.

Asynchronous Activity Completion calls identify the Activity Execution either by its Task Token or by its Activity Id together with the Workflow Id (and optionally a Run Id).

What is a Local Activity?

A Local Activity is an Activity Execution that executes in the same process as the Workflow Execution that spawns it.

Some Activity Executions are very short-living and do not need the queuing semantic, flow control, rate limiting, and routing capabilities. For this case, Temporal supports the Local Activity feature.

The main benefit of Local Activities is that they use fewer Temporal Cluster resources (for example, fewer History events) and have much lower latency overhead (because there is no need to make a round trip to the Cluster) compared to normal Activity Executions. However, Local Activities are subject to shorter durations and a lack of rate limiting.

Consider using Local Activities for functions that:

  • can be implemented in the same binary as the Workflow that calls them.
  • do not require global rate limiting.
  • do not require routing to a specific Worker or Worker pool.
  • take no longer than a few seconds, inclusive of retries.

If a Local Activity takes longer than 80% of the Workflow Task Timeout (which is 10 seconds by default), the Worker will ask the Cluster to create a new Workflow Task to extend the "lease" for processing the Local Activity. The Worker will continue doing so until the Local Activity has completed. This is called Workflow Task Heartbeating. The drawbacks of long-running Local Activities are:

  • Each new Workflow Task results in 3 more Events in History.
  • The Workflow won't get notified of new events like Signals and completions until the next Workflow Task Heartbeat.
  • New Commands created by the Workflow concurrently with the Local Activity will not be sent to the Cluster until either the Local Activity completes or the next Workflow Task Heartbeat.
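For a rough sense of the History cost, the following back-of-the-envelope model counts the extra Events. It assumes every lease extension lasts another 80% of the Workflow Task Timeout, which is a simplification; both function names are hypothetical.

```python
import math


def workflow_task_heartbeats(local_activity_seconds, workflow_task_timeout=10.0):
    """Estimate how many extra Workflow Tasks a Local Activity of this length needs."""
    lease = 0.8 * workflow_task_timeout
    if local_activity_seconds <= lease:
        return 0
    # One extension each time the running time passes another multiple of the lease.
    return math.ceil(local_activity_seconds / lease) - 1


def extra_history_events(local_activity_seconds, workflow_task_timeout=10.0):
    # Per the text, each new Workflow Task results in 3 more Events in History.
    return 3 * workflow_task_heartbeats(local_activity_seconds, workflow_task_timeout)


short_run = extra_history_events(5)    # finishes within the first lease
long_run = extra_history_events(20)    # passes the 8-second and 16-second marks
```

Under these assumptions, a 5-second Local Activity adds no Events, while a 20-second one adds about 6.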

Using a Local Activity without understanding its limitations can cause various production issues. We recommend using regular Activities unless your use case requires very high throughput and large Activity fan outs of very short-lived Activities. More guidance in choosing between Local Activity vs Activity is available in our forums.