OpenMetrics API Reference
The Temporal Cloud OpenMetrics API provides actionable operational metrics about your Temporal Cloud deployment. This is a scrapable HTTP API that returns metrics in OpenMetrics format, suitable for ingestion by Prometheus-compatible monitoring systems.
Temporal Cloud OpenMetrics support is available in Public Preview.
Available Metrics Reference
Metric descriptions are also available programmatically via the /v1/descriptors endpoint. See the Metrics Reference for the full list of available metrics.
Authentication
Temporal uses API keys for integrating with the OpenMetrics endpoint. Applications must be authorized and authenticated before they can access metrics from Temporal Cloud.
An API key is owned by a Service Account and inherits the permissions granted to the owner.
Creating API Keys
API keys can be created using the Temporal Cloud UI:
- Navigate to Settings → Service Accounts
- Create a service account with the "Metrics Read-Only" Account Level Role
- Generate an API key within the service account
See the docs for more details on generating API keys.
Using API Keys
All API requests must be made over HTTPS. Calls made over plain HTTP will fail. API requests without authentication will also fail.
curl -H "Authorization: Bearer <API_KEY>" https://metrics.temporal.io/v1/metrics
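As an illustration only, the same authenticated request can be made from code. The following minimal Python sketch uses the requests library; the fetch_metrics helper and its error handling are assumptions for this example, not part of any Temporal SDK.

import requests

METRICS_URL = "https://metrics.temporal.io/v1/metrics"  # HTTPS is required

def fetch_metrics(api_key: str) -> str:
    """Perform a minimal authenticated scrape of the metrics endpoint.
    An authentication or authorization failure here usually means the API key
    is missing, expired, or its Service Account lacks metrics permissions."""
    response = requests.get(
        METRICS_URL,
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=10,
    )
    response.raise_for_status()  # raises on 4xx/5xx responses
    return response.text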
Object Model
The object model for the Metrics API follows the OpenMetrics standard.
Metrics
A metric is a numeric attribute measured at a specific point in time, labeled with contextual metadata gathered at the point of instrumentation.
Metric Types
All Temporal Cloud metrics are exposed as gauges in OpenMetrics format, but they represent different measurement types:
- Rate metrics: Pre-computed per-second rates with delta temporality (e.g., temporal_cloud_v1_workflow_success_count, the number of workflows completed per second)
- Value metrics: Current or instantaneous values (e.g., temporal_cloud_v1_approximate_backlog_count, the current number of tasks in a queue)
The list of metrics and their labels are available via the List Descriptors endpoint or in the Metrics Reference.
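To make the distinction between the two types concrete, here is a hedged Python sketch. The approx_count_from_rate helper and the 60-second window are assumptions for illustration only; the actual aggregation window is not specified in this reference.

# Assumption for illustration only: treat 60 seconds as the reporting window.
REPORTING_WINDOW_SECONDS = 60

def approx_count_from_rate(rate_per_second: float,
                           window_seconds: int = REPORTING_WINDOW_SECONDS) -> float:
    """Convert a pre-computed per-second rate (e.g. a
    temporal_cloud_v1_workflow_success_count sample of 42.0) into an
    approximate event count for one window: 42.0 * 60 ~= 2520 completions."""
    return rate_per_second * window_seconds

# Value metrics such as temporal_cloud_v1_approximate_backlog_count are
# point-in-time readings and should be used directly, not multiplied.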
Labels
A label is a key-value attribute associated with a metric data point. Labels can be used to filter or aggregate metrics.
Common labels include:
- temporal_namespace: The Temporal namespace
- temporal_account: The Temporal account
- region: The cloud region where the metric originated
- temporal_workflow_type: The workflow type (where applicable)
- temporal_task_queue: The task queue name (where applicable)
Each metric has its own set of applicable labels. See the Metrics Reference for complete details.
Metric Family
A Metric Family may have zero or more metrics. The set of metrics returned will vary based on actual system activity. Metrics only appear in a Metric Family if they were reported during the aggregation window.
Client Considerations
Rate Limiting
To protect the stability of the API and keep it available to all users, Temporal employs multiple safeguards.
When a rate limit is breached, an HTTP 429 Too Many Requests error is returned with the following headers:
Header | Description |
---|---|
Retry-After | The time in seconds until the rate limit window resets |
Rate Limit Scopes
Rate limit scopes are subject to change.
Scope | Limit |
---|---|
Account | 180 requests per hour |
Response Completeness
The X-Completeness
header indicates whether the response contains all available data:
complete
: The response contains all metrics requestedlimited
: Response truncated due to size limits (30k metric data points max). Use namespace or metric filtering to reduce the response size.unknown
: Completeness cannot be determined (possibly due to regional issues or timeouts). Clients are encouraged to retry.
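As a sketch of how a client might react to this header, assuming the requests library (the check_completeness helper is hypothetical, not part of any SDK):

import logging
import requests

def check_completeness(response: requests.Response) -> None:
    """React to the X-Completeness values documented above."""
    completeness = response.headers.get("X-Completeness", "unknown")
    if completeness == "limited":
        logging.warning("Metrics response truncated; add namespace or metric filters.")
    elif completeness == "unknown":
        logging.warning("Completeness unknown; consider retrying the scrape.")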
Retry Logic
Implement retry logic in your client to handle transient API failures gracefully. Use exponential backoff with jitter to avoid retry storms, and keep retry intervals long enough to stay within the rate limits.
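For example, here is a minimal Python sketch; the scrape_with_retries helper and its retry parameters are illustrative assumptions, not prescribed values.

import random
import time
import requests

def scrape_with_retries(url: str, api_key: str, max_attempts: int = 5) -> str:
    """Fetch the metrics endpoint, honoring Retry-After on HTTP 429 and
    backing off exponentially with jitter on transient server errors."""
    for attempt in range(max_attempts):
        response = requests.get(
            url, headers={"Authorization": f"Bearer {api_key}"}, timeout=10
        )
        if response.status_code == 429:
            # Rate limited: wait at least as long as the server asks.
            delay = float(response.headers.get("Retry-After", 60))
        elif response.status_code >= 500:
            # Transient failure: exponential backoff plus jitter.
            delay = min(2 ** attempt, 60) + random.uniform(0, 1)
        else:
            response.raise_for_status()  # surface other 4xx errors
            return response.text
        time.sleep(delay)
    raise RuntimeError(f"Giving up on {url} after {max_attempts} attempts")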
Data Latency
Metric data points are available for query within 2 minutes of their origination, which is in line with the freshest metrics available from any major service provider. Account for this latency when setting up monitoring alerts.
Endpoints
All endpoints are served from: metrics.temporal.io
Get Metrics
GET /v1/metrics
Returns metrics in OpenMetrics format suitable for scraping by Prometheus-compatible systems.
Timestamp Offset
To account for metric data latency, this endpoint returns metrics from the current timestamp minus a fixed offset. The current offset is 2 minutes, rounded down to the start of the minute. To accommodate this offset, the timestamps in the response should be honored when importing the metrics. For example, in Prometheus this can be controlled using the honor_timestamps flag.
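As an illustration of the offset arithmetic, here is a small Python sketch; the helper name is hypothetical and it reflects one reading of the offset described above.

from datetime import datetime, timedelta, timezone

def newest_expected_timestamp(now=None):
    """Approximate the newest timestamp /v1/metrics will return:
    the current time minus the 2-minute offset, rounded down to the minute."""
    now = now or datetime.now(timezone.utc)
    offset_time = now - timedelta(minutes=2)
    return offset_time.replace(second=0, microsecond=0)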
Query Parameters
Parameter | Type | Description |
---|---|---|
namespaces | string array | Filter to specific Namespaces. Supports wildcards (e.g., production-* ) |
metrics | string array | Filter to specific metrics |
Response Headers
Header | Description |
---|---|
X-Completeness | Indicates the response status: complete , limited , or unknown |
Content-Type | application/openmetrics-text |
Request:
curl -H "Authorization: Bearer <API_KEY>" \
"https://metrics.temporal.io/v1/metrics?namespaces=production-*"
Response:
# TYPE temporal_cloud_v1_workflow_success_count gauge
# HELP temporal_cloud_v1_workflow_success_count The number of successful workflows per second
temporal_cloud_v1_workflow_success_count{temporal_namespace="production",temporal_workflow_type="payment-processing",region="aws-us-west-2"} 42.0 1609459200000
temporal_cloud_v1_workflow_success_count{temporal_namespace="production",temporal_workflow_type="order-fulfillment",region="aws-us-west-2"} 128.0 1609459200000
# TYPE temporal_cloud_v1_approximate_backlog_count gauge
# HELP temporal_cloud_v1_approximate_backlog_count Approximate number of tasks in a task queue
temporal_cloud_v1_approximate_backlog_count{temporal_namespace="production",temporal_task_queue="critical-queue",task_type="workflow", region="aws-us-west-2"} 15.0 1609459200000
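Most users will let Prometheus or the OpenTelemetry Collector parse this format. To make the sample line structure concrete, here is a simplified Python sketch; it is an illustration only, not a full OpenMetrics parser.

import re

SAMPLE_RE = re.compile(r'^(?P<name>[\w:]+)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)\s+(?P<ts>\d+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')

def parse_samples(text: str):
    """Yield dicts for sample lines like the response above,
    skipping # TYPE and # HELP comment lines."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match:
            yield {
                "name": match.group("name"),
                "labels": dict(LABEL_RE.findall(match.group("labels"))),
                "value": float(match.group("value")),
                "timestamp": int(match.group("ts")),  # as emitted by the endpoint
            }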
Summary of Best Practices
- Honor timestamps: Set honor_timestamps: true in Prometheus
- Scrape interval: Use 30 or 60 second intervals
- Timeout: Increase the scrape timeout for large responses (the examples below use 30 seconds)
- Filtering: Use query parameters to reduce response size
List Metric Descriptors
GET /v1/descriptors
Lists all metric descriptors including metadata, data types, and available dimensions (a.k.a. labels).
Query Parameters
Parameter | Type | Description |
---|---|---|
limit | integer | Page size (1-100, default: 100) |
offset | integer | Page offset |
Request:
curl -H "Authorization: Bearer <API_KEY>" \
"https://metrics.temporal.io/v1/descriptors"
Response:
{
  "meta": {
    "pagination": {
      "total": 35,
      "limit": 100,
      "offset": 0
    }
  },
  "descriptors": [
    {
      "name": "temporal_cloud_v1_workflow_success_count",
      "help": "The number of successful workflows per second",
      "dimensions": [
        "temporal_namespace",
        "temporal_workflow_type",
        "temporal_task_queue",
        "region"
      ]
    }
  ]
}
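The limit and offset parameters page through the full list. Here is a hedged Python sketch; the list_descriptors helper is illustrative, not part of any SDK.

import requests

BASE_URL = "https://metrics.temporal.io"

def list_descriptors(api_key: str, limit: int = 100) -> list:
    """Page through /v1/descriptors using limit/offset until the total
    reported under meta.pagination has been fetched."""
    descriptors, offset = [], 0
    while True:
        response = requests.get(
            f"{BASE_URL}/v1/descriptors",
            headers={"Authorization": f"Bearer {api_key}"},
            params={"limit": limit, "offset": offset},
            timeout=10,
        )
        response.raise_for_status()
        body = response.json()
        descriptors.extend(body["descriptors"])
        offset += limit
        if offset >= body["meta"]["pagination"]["total"]:
            return descriptors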
Integrations
Prometheus + Grafana
- Add a new scrape job for the OpenMetrics endpoint with your API key.
scrape_configs:
  - job_name: 'temporal-cloud'
    scrape_interval: 60s
    scrape_timeout: 30s
    honor_timestamps: true
    scheme: https
    authorization:
      type: Bearer
      credentials: '<API_KEY>'
    static_configs:
      - targets: ['metrics.temporal.io']
    metrics_path: '/v1/metrics'
- Import the Grafana dashboard and configure your Prometheus datasource.
OpenTelemetry Collector Configuration
- Add a new prometheus receiver for the OpenMetrics endpoint with your API key.
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: 'temporal-cloud'
          scrape_interval: 60s
          scrape_timeout: 30s
          honor_timestamps: true
          scheme: https
          authorization:
            type: Bearer
            credentials_file: <API_KEY_FILE>
          static_configs:
            - targets: ['metrics.temporal.io']
          metrics_path: '/v1/metrics'
processors:
  batch:
exporters:
  otlphttp:
    endpoint: <ENDPOINT>
service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [batch]
      exporters: [otlphttp]
Examples for these integrations and more are here.
Managing High Cardinality
High-cardinality labels like temporal_task_queue and temporal_workflow_type can significantly increase metric volume and impact the performance of your monitoring system.
Cardinality Estimation
To estimate your metric cardinality and see if this is an issue:
Total series = Base metrics × Namespaces × Task queues × Workflow types
Example:
- 6 workflow metrics with both labels
- 10 namespaces
- 50 task queues
- 20 workflow types
- = 6 × 10 × 50 × 20 = 60,000 time series
The 60,000 time series in this example exceed the 30,000 data points per scrape limit.
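The same arithmetic as a tiny Python check; estimate_series is just a name for the formula above, and real cardinality is usually lower because only active label combinations are reported.

def estimate_series(base_metrics: int, namespaces: int,
                    task_queues: int, workflow_types: int) -> int:
    """Upper-bound estimate from: Total series = Base metrics x Namespaces
    x Task queues x Workflow types."""
    return base_metrics * namespaces * task_queues * workflow_types

# The example above: 6 x 10 x 50 x 20 = 60,000 series,
# which exceeds the 30,000 data points per scrape limit.
assert estimate_series(6, 10, 50, 20) == 60_000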
If the cardinality is too high or you are hitting API limits, consider the following strategies.
Filtering at Scrape Time
You can request only the metrics and Namespaces you need. For example, the following requests filter by appending query parameters to the metrics path.
# Only specific namespaces matching the wildcard pattern
/v1/metrics?namespaces=production-*
# Only specific metrics
/v1/metrics?metrics=temporal_cloud_v1_workflow_success_count
# Combined filtering
/v1/metrics?namespaces=prod-*&metrics=temporal_cloud_v1_approximate_backlog_count
In Prometheus, the params config can be set to match the same behavior as above.
scrape_configs:
  - job_name: 'temporal-cloud'
    ...
    static_configs:
      - targets: ['metrics.temporal.io']
    metrics_path: '/v1/metrics'
    params:
      namespaces: ['prod-*']
      metrics: ['temporal_cloud_v1_approximate_backlog_count']
Label Management
Prometheus
If you use Prometheus, you can drop metrics or rewrite label values to reduce cardinality. The following example keeps the original temporal_task_queue value only for critical-queue and payment-queue and relabels all other task queues as other.
metric_relabel_configs:
  # Consolidate non-critical task queues
  - source_labels: [temporal_task_queue]
    regex: '(critical-queue|payment-queue)'
    target_label: __tmp_keep_original
    replacement: 'true'
  - source_labels: [__tmp_keep_original]
    regex: ''
    target_label: temporal_task_queue
    replacement: 'other'
  - regex: '__tmp_keep_original'
    action: labeldrop
OpenTelemetry Collector
To accomplish the same in the OpenTelemetry Collector, a filter processor can be used alongside any other processors.
processors:
  filter:
    metrics:
      include:
        match_type: regexp
        expressions:
          # Only keep metrics with critical-queue or payment-queue
          - Label("temporal_task_queue") == nil or IsMatch(Label("temporal_task_queue"), "^(critical-queue|payment-queue)$")
Monitoring Cardinality
Cardinality can be monitored using PromQL queries such as the following.
# Count the total number of series
count({__name__=~"temporal_cloud_v1_.*"})
# Count the total number of series by metric
count({__name__=~"temporal_cloud_v1_.*"}) by (__name__)
API Limits
Limit | Impact | Mitigation |
---|---|---|
30k total datapoints per scrape | Response may be truncated | Use namespace/metric filtering |
180 requests per account per hour | HTTP 429 returned | Set appropriate scrape interval of 30-60s |