Monitoring and Periodic Execution

Motivation

There is often a need for monitoring and periodic maintenance of IT systems on top of infrastructure provisioning. Polling is executing a regular action to check for a state change, for example:

  • Pinging a host to make sure it's online and responsive.
  • Once-per-minute health checks of a production deployment.
  • Polling an API for a specific resource to become available.
  • Triggering and executing periodic backups.
  • Pushing configuration updates when they become available.
  • Failing over in an active-passive setup when the primary instance becomes unhealthy.

As monitoring is often an example of periodic execution of business logic, it can benefit from Temporal's distributed cron engine.

Benefits of Temporal

Temporal provides guaranteed execution with exactly-once semantics with automatic retries.

Polling configuration can be as straightforward or sophisticated as needed:

  • Workflows can run on a cron schedule with a single configuration setting.
  • Alternatively, you can manually control the delays between intervals with sleep commands. For example, you can switch to more frequent executions in case of detected downtime.

The history service provides visibility into history for periodic Workflow executions.

Scalability is another crucial advantage of using Temporal for periodic execution. Many use cases require periodic execution for a large number of entities. At Uber, some applications run recurring Workflows for each customer. Imagine 100s of millions parallel cron jobs that don't require a separate batch processing framework.

Temporal support for long-running Activities and unlimited retries also makes it a great fit for monitoring use cases.

Example: Cluster Lifecycle Workflow

Imagine a system that manages a large number of compute clusters. It monitors that a cluster is up and running, its CPU and network utilization, run backups and software upgrades.

You can model these operational Activities as one indefinitely long Workflow per cluster which can be alive anywhere from minutes to years, depending on the cluster lifetime. Each Workflow would run periodic Activities, react to user commands via signals, and coordinate multiple potentially conflicting operations.

Next Steps

Learn more about cron jobs in Temporal: