
Hello World with LiteLLM

Last updated Oct 3, 2025

LiteLLM is a library for calling LLMs from Python. It makes it easy to access, and switch between, many providers, including OpenAI, Anthropic, and Google.

This recipe mirrors the Basic Python recipe, but swaps the OpenAI SDK for LiteLLM. The workflow still delegates LLM calls to an Activity, letting Temporal coordinate retries and durability, while LiteLLM forwards those calls to your configured provider.

Key points:

  • A reusable Activity that wraps litellm.acompletion and keeps retries in Temporal.
  • The most common LiteLLM parameters are fields on LiteLLMRequest, giving you type checking and IDE completion. Everything else can be passed via the extra_options dictionary, which is forwarded as extra kwargs to litellm.acompletion.
  • The Activity returns the full LiteLLM response for processing by the workflow.

Create the Activity

activities/models.py

from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Type, Union


@dataclass
class LiteLLMRequest:
    model: str
    messages: List[Dict[str, Any]]
    temperature: Optional[float] = None
    max_tokens: Optional[int] = None
    timeout: Optional[Union[float, int]] = None
    response_format: Optional[Union[dict, Type[Any]]] = None
    extra_options: Dict[str, Any] = field(default_factory=dict)

    def to_acompletion_kwargs(self) -> Dict[str, Any]:
        kwargs = {
            "model": self.model,
            "messages": self.messages,
        }

        optional_values = {
            "temperature": self.temperature,
            "max_tokens": self.max_tokens,
            "timeout": self.timeout,
            "response_format": self.response_format,
        }

        for key, value in optional_values.items():
            if value is not None:
                kwargs[key] = value

        if self.extra_options:
            kwargs.update(self.extra_options)

        return kwargs
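As a quick sanity check of how a request flattens into acompletion kwargs, here is a standalone sketch. It uses a trimmed copy of the dataclass above so the snippet runs on its own (in the sample you would import it from activities.models), and the model name is purely illustrative:

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


# Trimmed copy of LiteLLMRequest so this sketch runs standalone.
@dataclass
class LiteLLMRequest:
    model: str
    messages: List[Dict[str, Any]]
    temperature: Optional[float] = None
    max_tokens: Optional[int] = None
    extra_options: Dict[str, Any] = field(default_factory=dict)

    def to_acompletion_kwargs(self) -> Dict[str, Any]:
        kwargs: Dict[str, Any] = {"model": self.model, "messages": self.messages}
        for key in ("temperature", "max_tokens"):
            value = getattr(self, key)
            if value is not None:
                kwargs[key] = value
        if self.extra_options:
            kwargs.update(self.extra_options)
        return kwargs


request = LiteLLMRequest(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "hi"}],
    temperature=0.2,
    # Parameters acompletion accepts but LiteLLMRequest does not model,
    # such as top_p, ride along in extra_options.
    extra_options={"top_p": 0.9},
)

kwargs = request.to_acompletion_kwargs()
assert kwargs["top_p"] == 0.9          # merged in from extra_options
assert "max_tokens" not in kwargs      # None fields are omitted
```

Because extra_options is merged last, it can also override a typed field if you pass the same key twice; keeping the typed fields authoritative is up to the caller.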

activities/litellm_completion.py

from typing import Any, Dict

import litellm
from temporalio import activity
from temporalio.exceptions import ApplicationError

from activities.models import LiteLLMRequest


@activity.defn(name="activities.litellm_completion.create")
async def create(request: LiteLLMRequest) -> Dict[str, Any]:
    kwargs = request.to_acompletion_kwargs()
    kwargs["num_retries"] = 0

    try:
        response = await litellm.acompletion(**kwargs)
    except (
        litellm.AuthenticationError,
        litellm.BadRequestError,
        litellm.InvalidRequestError,
        litellm.UnsupportedParamsError,
        litellm.JSONSchemaValidationError,
        litellm.ContentPolicyViolationError,
        litellm.NotFoundError,
    ) as exc:
        raise ApplicationError(
            str(exc),
            type=exc.__class__.__name__,
            non_retryable=True,
        ) from exc
    except litellm.APIError:
        raise

    return response

LiteLLM supports many providers. Configure credentials via environment variables (for example OPENAI_API_KEY) before running the Activity. For Google-hosted models (Vertex AI or Gemini), the sample relies on the google-cloud-aiplatform and google-auth dependencies included in pyproject.toml; set the usual Google application credentials (GOOGLE_APPLICATION_CREDENTIALS, GOOGLE_CLOUD_PROJECT, VERTEXAI_LOCATION, etc.) so LiteLLM can obtain an access token.
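For example, a shell setup might look like the following (the key values, project name, and region are placeholders; set only the variables your chosen provider needs):

```shell
# OpenAI-hosted models
export OPENAI_API_KEY="sk-..."

# Gemini via the Google AI Studio API
export GEMINI_API_KEY="..."

# Vertex AI-hosted models
export GOOGLE_APPLICATION_CREDENTIALS="$HOME/keys/service-account.json"
export GOOGLE_CLOUD_PROJECT="my-project"
export VERTEXAI_LOCATION="us-central1"
```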

Create the Workflow

workflows/hello_world_workflow.py

from datetime import timedelta

from temporalio import workflow

from activities.models import LiteLLMRequest


@workflow.defn
class HelloWorld:
    @workflow.run
    async def run(self, input: str) -> str:
        messages = [
            {"role": "system", "content": "You only respond in haikus."},
            {"role": "user", "content": input},
        ]
        response = await workflow.execute_activity(
            "activities.litellm_completion.create",
            LiteLLMRequest(
                # LiteLLM lets you keep the same code and swap models/providers.
                # model="gpt-4o-mini",
                model="gemini-2.5-flash-lite",
                messages=messages,
            ),
            start_to_close_timeout=timedelta(seconds=30),
        )
        message = response["choices"][0]["message"]["content"]
        if isinstance(message, list):
            message = "".join(
                part.get("text", "")
                for part in message
                if isinstance(part, dict)
            )
        return message
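The content-normalization step at the end of run exists because some providers return content as a list of parts rather than a plain string. Here is that logic as a standalone sketch against mocked responses (the response dicts are illustrative, showing only the fields the workflow touches):

```python
def extract_text(response: dict) -> str:
    # Mirrors the workflow's handling: accept either a plain string
    # or a list of content parts and return a single string.
    message = response["choices"][0]["message"]["content"]
    if isinstance(message, list):
        message = "".join(
            part.get("text", "")
            for part in message
            if isinstance(part, dict)
        )
    return message


# Plain-string content, as most providers return it.
plain = {"choices": [{"message": {"content": "Hello!"}}]}

# List-of-parts content, as some providers return it.
parts = {
    "choices": [
        {"message": {"content": [{"type": "text", "text": "Hel"},
                                 {"type": "text", "text": "lo!"}]}}
    ]
}

print(extract_text(plain))  # Hello!
print(extract_text(parts))  # Hello!
```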

Temporal manages Activity retries, so LiteLLM's retry helper is disabled via num_retries=0. Use the extra_options escape hatch on LiteLLMRequest if you need to surface additional LiteLLM parameters without editing the sample.

Create the Worker

worker.py

import asyncio

from temporalio.client import Client
from temporalio.contrib.pydantic import pydantic_data_converter
from temporalio.worker import Worker

from activities import litellm_completion
from workflows.hello_world_workflow import HelloWorld


async def main():
    client = await Client.connect(
        "localhost:7233",
        data_converter=pydantic_data_converter,
    )

    worker = Worker(
        client,
        task_queue="hello-world-python-task-queue",
        workflows=[
            HelloWorld,
        ],
        activities=[
            litellm_completion.create,
        ],
    )
    await worker.run()


if __name__ == "__main__":
    asyncio.run(main())

Create the Workflow Starter

start_workflow.py

import asyncio

from temporalio.client import Client
from temporalio.contrib.pydantic import pydantic_data_converter

from workflows.hello_world_workflow import HelloWorld


async def main():
    client = await Client.connect(
        "localhost:7233",
        data_converter=pydantic_data_converter,
    )

    result = await client.execute_workflow(
        HelloWorld.run,
        "Tell me about recursion in programming.",
        id="my-workflow-id",
        task_queue="hello-world-python-task-queue",
    )
    print(f"Result: {result}")


if __name__ == "__main__":
    asyncio.run(main())

Running

Start the Temporal Dev Server:

temporal server start-dev

Install dependencies:

uv sync

Set the appropriate environment variables before launching the worker (for example export OPENAI_API_KEY=... or export GEMINI_API_KEY=...) so LiteLLM can reach your chosen provider.

Run the worker:

uv run python -m worker

Start the workflow:

uv run python -m start_workflow