The definition code of a Temporal workflow must be deterministic because Temporal uses event sourcing to reconstruct the workflow state by replaying the saved history event data on the workflow definition code. This means that any incompatible update to the workflow definition code could cause a non-deterministic issue if not handled correctly.
Consider the following workflow definition:
Now let's say we have replaced ActivityA with ActivityC and deployed the updated code. If there is an existing workflow execution that was started by the original version of the workflow code, where ActivityA had already completed and its result was recorded to history, the new version of the workflow code will pick up that workflow execution and try to resume from there. However, the new code expects a result for ActivityC from the history data and instead gets the result for ActivityA, so the workflow fails with a non-deterministic error.
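The mismatch can be pictured with a small, self-contained model. This is a sketch only, not Temporal SDK code: replay and errNonDeterministic are hypothetical names, ActivityB is an assumed follow-up activity, and history is reduced to a list of activity names.

```go
package main

import (
	"errors"
	"fmt"
)

// errNonDeterministic is a hypothetical stand-in for the SDK's non-deterministic error.
var errNonDeterministic = errors.New("non-deterministic workflow")

// replay compares the activities the code schedules against the recorded
// history, in order. Scheduling more than history holds is fine: that is
// simply new progress.
func replay(history, scheduled []string) error {
	for i, recorded := range history {
		if i >= len(scheduled) {
			return fmt.Errorf("%w: history has %s, code scheduled nothing", errNonDeterministic, recorded)
		}
		if scheduled[i] != recorded {
			return fmt.Errorf("%w: history has %s, code scheduled %s", errNonDeterministic, recorded, scheduled[i])
		}
	}
	return nil
}

func main() {
	history := []string{"ActivityA"} // ActivityA completed under the original code

	// Original code: ActivityA, then the assumed follow-up ActivityB.
	fmt.Println(replay(history, []string{"ActivityA", "ActivityB"}))

	// Updated code: ActivityC replaces ActivityA, so replay diverges at the first step.
	fmt.Println(replay(history, []string{"ActivityC", "ActivityB"}))
}
```

The second call fails at the first recorded event, which is the situation that versioning is meant to prevent.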
Thus we use workflow.GetVersion() to guard the change. When workflow.GetVersion() is run for a new workflow execution, it records a marker in the workflow history so that all future calls to GetVersion for this changeID (Step1 in the example) on this workflow execution will always return the given version number, which is 1 in the example.
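The marker behavior for a new execution can be modeled in a few lines. This is a toy sketch, not the SDK implementation: the execution type and its getVersion method are hypothetical, and the model ignores minSupported and replay of pre-existing histories.

```go
package main

import "fmt"

// execution is a toy stand-in for a single workflow execution's history.
type execution struct {
	markers map[string]int // changeID -> version recorded as a marker
}

// getVersion models a new execution: the first call for a changeID records
// maxSupported as a marker, and every later call returns the recorded marker.
func (e *execution) getVersion(changeID string, maxSupported int) int {
	if v, ok := e.markers[changeID]; ok {
		return v
	}
	e.markers[changeID] = maxSupported
	return maxSupported
}

func main() {
	e := &execution{markers: map[string]int{}}

	// First call on this execution records the marker.
	fmt.Println(e.getVersion("Step1", 1))

	// Even if the code is later upgraded to maxSupported = 2,
	// this execution keeps returning its recorded version.
	fmt.Println(e.getVersion("Step1", 2))
}
```

Keying the markers by changeID is what lets each change point in a workflow be versioned independently.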
If you make an additional change, such as replacing ActivityC with ActivityD, you need to add some additional code:
Note that we have changed maxSupported from 1 to 2. A workflow that had already passed this GetVersion() call before it was introduced will return DefaultVersion. A workflow that was run with maxSupported set to 1 will return 1. New workflows will return 2.
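The three cases map onto three branches in the workflow code. The sketch below shows only the branching shape, using plain Go instead of SDK calls: activityForStep1 is a hypothetical helper, and defaultVersion is a placeholder constant standing in for DefaultVersion.

```go
package main

import "fmt"

// defaultVersion is a placeholder for DefaultVersion in this sketch.
const defaultVersion = -1

// activityForStep1 picks the activity to run at the Step1 change point
// based on the version returned for the current execution.
func activityForStep1(version int) string {
	switch version {
	case defaultVersion:
		return "ActivityA" // execution passed this point before versioning was added
	case 1:
		return "ActivityC" // execution started while maxSupported was 1
	default:
		return "ActivityD" // version 2: new executions
	}
}

func main() {
	for _, v := range []int{defaultVersion, 1, 2} {
		fmt.Printf("version %d -> %s\n", v, activityForStep1(v))
	}
}
```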
After you are sure that all of the workflow executions prior to version 1 have completed, you can remove the code for that version. It should now look like the following:
You'll note that minSupported has changed from DefaultVersion to 1. If an older version of the workflow execution history is replayed on this code, it will fail because the minimum expected version is 1. After you are sure that all of the workflow executions for version 1 have completed, you can remove the code for version 1 so that your code looks like the following:
Note that we have preserved the call to
GetVersion(). There are two reasons to preserve this call:
- This ensures that if there is a workflow execution still running for an older version, it will fail here and not proceed.
- If you need to make additional changes for Step1, such as changing ActivityD to ActivityE, you only need to update maxSupported from 2 to 3 and branch from there.
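The first reason can be illustrated with a toy model of the supported-range check. This is a sketch under stated assumptions, not SDK code: checkVersion is a hypothetical helper, and defaultVersion is a placeholder constant standing in for DefaultVersion.

```go
package main

import "fmt"

// defaultVersion is a placeholder for DefaultVersion in this sketch.
const defaultVersion = -1

// checkVersion models the guard that a preserved GetVersion() call provides:
// the version recorded for an execution must fall within
// [minSupported, maxSupported], otherwise the execution must not proceed.
func checkVersion(recorded, minSupported, maxSupported int) error {
	if recorded < minSupported || recorded > maxSupported {
		return fmt.Errorf("version %d is outside the supported range [%d, %d]",
			recorded, minSupported, maxSupported)
	}
	return nil
}

func main() {
	// Only version 2 remains supported (minSupported = maxSupported = 2).
	fmt.Println(checkVersion(defaultVersion, 2, 2)) // unversioned execution is rejected
	fmt.Println(checkVersion(1, 2, 2))              // version 1 execution is rejected
	fmt.Println(checkVersion(2, 2, 2))              // current executions proceed
}
```

Without the preserved call there is no such guard, and an old execution would run the new code path until it diverged from its history somewhere less obvious.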
You only need to preserve the first call to GetVersion() for each changeID. All subsequent calls to GetVersion() with the same changeID are safe to remove. If necessary, you can remove the first GetVersion() call, but you need to ensure the following:
- All executions with an older version are completed.
- You can no longer use Step1 for the changeID. If you need to make changes to that same part in the future, such as changing from ActivityD to ActivityE, you would need to use a different changeID, such as Step1-fix2, and start minSupported from DefaultVersion again.
The code would look like the following:
Upgrading a workflow is straightforward if you don't need to preserve your currently running workflow executions. You can simply terminate all of the currently running workflow executions and suspend the creation of new ones while you deploy the new version of your workflow code, which does not use GetVersion(), and then resume workflow creation. However, that is often not the case: you usually need to take care of the currently running workflow executions, and using GetVersion() to update your code is the way to do so.
However, if you want your currently running workflows to proceed based on the current workflow logic, but you want to ensure new workflows run on the new logic, you can define your workflow as a new WorkflowType, and change your start path (calls to StartWorkflow()) to start the new workflow type.
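The idea can be sketched as a registry that keeps both definitions available while the start path points at the new one. Everything here is a hypothetical model in plain Go: the registry, the type names MyWorkflow and MyWorkflowV2, startType, and run are illustrative and are not SDK APIs.

```go
package main

import "fmt"

// workflowFunc is a toy stand-in for a workflow definition.
type workflowFunc func(input string) string

// registry keeps both workflow types registered, so executions that were
// started with the old type keep running the old logic.
var registry = map[string]workflowFunc{
	"MyWorkflow":   func(in string) string { return "old logic: " + in },
	"MyWorkflowV2": func(in string) string { return "new logic: " + in },
}

// startType is the type the start path uses for new executions.
const startType = "MyWorkflowV2"

// run dispatches an execution to the definition registered for its type.
func run(workflowType, input string) (string, error) {
	wf, ok := registry[workflowType]
	if !ok {
		return "", fmt.Errorf("unknown workflow type %q", workflowType)
	}
	return wf(input), nil
}

func main() {
	fmt.Println(run("MyWorkflow", "order-1")) // existing execution, old type
	fmt.Println(run(startType, "order-2"))    // new execution, new type
}
```

The old type can be unregistered once every execution started with it has completed.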
The Temporal client SDK performs a sanity check to help prevent obvious incompatible changes. The sanity check verifies whether a decision made in replay matches the event recorded in history, in the same order. The decision is generated by calling any of the following methods:
Adding, removing, or reordering any of the above methods triggers the sanity check and results in a non-deterministic error.
The sanity check is not exhaustive. For example, it does not check the activity's input arguments or the timer duration. If the check were enforced on every property, it would become too restrictive and make the workflow code harder to maintain. For example, if you move your activity code from one package to another package, that changes the ActivityType, which technically becomes a different
activity. But, we don't want to fail on that change, so we only check the function name part of the