Hello World with LiteLLM
Last updated Oct 3, 2025
LiteLLM is a library for calling LLMs from Python. It makes it easy to access and switch between many providers, including OpenAI, Anthropic, and Google.
This recipe mirrors the Basic Python recipe, but swaps the OpenAI SDK for LiteLLM. The workflow still delegates LLM calls to an Activity, letting Temporal coordinate retries and durability, while LiteLLM forwards those calls to your configured provider.
Key points:
- A reusable Activity wraps litellm.acompletion and keeps retries in Temporal.
- The most common LiteLLM parameters are fields on LiteLLMRequest, ensuring type checking and IDE completion. Others may be passed via the extra_options dictionary, which functions as kwargs for litellm.acompletion.
- The Activity returns the full LiteLLM response for processing by the workflow.
Create the Activity
activities/models.py
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Type, Union


@dataclass
class LiteLLMRequest:
    model: str
    messages: List[Dict[str, Any]]
    temperature: Optional[float] = None
    max_tokens: Optional[int] = None
    timeout: Optional[Union[float, int]] = None
    response_format: Optional[Union[dict, Type[Any]]] = None
    extra_options: Dict[str, Any] = field(default_factory=dict)

    def to_acompletion_kwargs(self) -> Dict[str, Any]:
        # Required parameters are always forwarded.
        kwargs = {
            "model": self.model,
            "messages": self.messages,
        }
        # Optional fields are forwarded only when explicitly set.
        optional_values = {
            "temperature": self.temperature,
            "max_tokens": self.max_tokens,
            "timeout": self.timeout,
            "response_format": self.response_format,
        }
        for key, value in optional_values.items():
            if value is not None:
                kwargs[key] = value
        # extra_options passes through any other litellm.acompletion keyword.
        if self.extra_options:
            kwargs.update(self.extra_options)
        return kwargs
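For instance, a parameter that LiteLLMRequest does not model as a field can still reach litellm.acompletion through extra_options. A quick sketch (top_p is just one example of such a parameter):

from activities.models import LiteLLMRequest

request = LiteLLMRequest(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello!"}],
    temperature=0.2,
    # Not a first-class field, so it rides along in extra_options.
    extra_options={"top_p": 0.9},
)

# Merges required, optional, and extra values into one kwargs dict:
# {'model': 'gpt-4o-mini', 'messages': [...], 'temperature': 0.2, 'top_p': 0.9}
print(request.to_acompletion_kwargs())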
activities/litellm_completion.py
from typing import Any, Dict

import litellm
from temporalio import activity
from temporalio.exceptions import ApplicationError

from activities.models import LiteLLMRequest


@activity.defn(name="activities.litellm_completion.create")
async def create(request: LiteLLMRequest) -> Dict[str, Any]:
    kwargs = request.to_acompletion_kwargs()
    # Temporal owns the retry policy, so disable LiteLLM's built-in retries.
    kwargs["num_retries"] = 0
    try:
        response = await litellm.acompletion(**kwargs)
    except (
        litellm.AuthenticationError,
        litellm.BadRequestError,
        litellm.InvalidRequestError,
        litellm.UnsupportedParamsError,
        litellm.JSONSchemaValidationError,
        litellm.ContentPolicyViolationError,
        litellm.NotFoundError,
    ) as exc:
        # These failures will not succeed on retry; mark them non-retryable
        # so Temporal fails the Activity immediately.
        raise ApplicationError(
            str(exc),
            type=exc.__class__.__name__,
            non_retryable=True,
        ) from exc
    except litellm.APIError:
        # Transient provider errors propagate so Temporal can retry.
        raise
    return response
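To exercise the Activity outside a workflow, Temporal's ActivityEnvironment test helper can invoke it directly. A minimal sketch that stubs litellm.acompletion so no provider call is made (the fake response and model name are illustrative):

import asyncio
from unittest.mock import AsyncMock, patch

from temporalio.testing import ActivityEnvironment

from activities.litellm_completion import create
from activities.models import LiteLLMRequest


async def main():
    fake = {"choices": [{"message": {"content": "hello"}}]}
    # Replace the real provider call with a canned response.
    with patch("litellm.acompletion", new=AsyncMock(return_value=fake)):
        result = await ActivityEnvironment().run(
            create,
            LiteLLMRequest(
                model="gpt-4o-mini",
                messages=[{"role": "user", "content": "hi"}],
            ),
        )
    assert result == fake


asyncio.run(main())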
LiteLLM supports many providers. Configure credentials via environment variables (for example OPENAI_API_KEY) before running the Activity. For Google-hosted models (Vertex AI or Gemini), the sample relies on the google-cloud-aiplatform and google-auth dependencies included in pyproject.toml; set the usual Google application credentials (GOOGLE_APPLICATION_CREDENTIALS, GOOGLE_CLOUD_PROJECT, VERTEXAI_LOCATION, etc.) so LiteLLM can obtain an access token.
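Alternatively, LiteLLM accepts the Vertex project and location as per-request keywords, so they can also travel through extra_options instead of the environment. A sketch with placeholder values:

from activities.models import LiteLLMRequest

request = LiteLLMRequest(
    model="vertex_ai/gemini-2.5-flash-lite",
    messages=[{"role": "user", "content": "Hello!"}],
    extra_options={
        # LiteLLM keywords for Vertex AI calls; the values are placeholders.
        "vertex_project": "my-gcp-project",
        "vertex_location": "us-central1",
    },
)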
Create the Workflow
workflows/hello_world_workflow.py
from datetime import timedelta

from temporalio import workflow

from activities.models import LiteLLMRequest


@workflow.defn
class HelloWorld:
    @workflow.run
    async def run(self, input: str) -> str:
        messages = [
            {"role": "system", "content": "You only respond in haikus."},
            {"role": "user", "content": input},
        ]
        response = await workflow.execute_activity(
            "activities.litellm_completion.create",
            LiteLLMRequest(
                # LiteLLM lets you keep the same code and swap models/providers.
                # model="gpt-4o-mini",
                model="gemini-2.5-flash-lite",
                messages=messages,
            ),
            start_to_close_timeout=timedelta(seconds=30),
        )
        message = response["choices"][0]["message"]["content"]
        # Some providers return content as a list of parts; flatten to text.
        if isinstance(message, list):
            message = "".join(
                part.get("text", "")
                for part in message
                if isinstance(part, dict)
            )
        return message
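For reference, LiteLLM normalizes every provider's reply to the OpenAI chat-completion shape, which is why the workflow can index the response the same way regardless of model. An abridged sketch of the fields the workflow reads (real responses carry more, such as usage and model):

response = {
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": "...",  # a string, or a list of content parts
            },
        },
    ],
}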
Temporal manages Activity retries, so LiteLLM's retry helper is disabled via num_retries=0. Note that the workflow invokes the Activity by the string name registered in @activity.defn, so keep the two in sync. Use the extra_options escape hatch on LiteLLMRequest if you need to surface additional LiteLLM parameters without editing the sample.
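If the defaults need tuning, execute_activity also accepts an explicit RetryPolicy. A sketch of the call inside HelloWorld.run, with illustrative intervals and attempt counts (the non-retryable ApplicationErrors raised by the Activity still short-circuit these retries):

from temporalio.common import RetryPolicy

# Inside HelloWorld.run; the values shown are illustrative, not part of the sample.
response = await workflow.execute_activity(
    "activities.litellm_completion.create",
    LiteLLMRequest(model="gemini-2.5-flash-lite", messages=messages),
    start_to_close_timeout=timedelta(seconds=30),
    retry_policy=RetryPolicy(
        initial_interval=timedelta(seconds=1),
        backoff_coefficient=2.0,
        maximum_attempts=5,
    ),
)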
Create the Worker
worker.py
import asyncio

from temporalio.client import Client
from temporalio.contrib.pydantic import pydantic_data_converter
from temporalio.worker import Worker

from activities import litellm_completion
from workflows.hello_world_workflow import HelloWorld


async def main():
    client = await Client.connect(
        "localhost:7233",
        data_converter=pydantic_data_converter,
    )
    worker = Worker(
        client,
        task_queue="hello-world-python-task-queue",
        workflows=[
            HelloWorld,
        ],
        activities=[
            litellm_completion.create,
        ],
    )
    await worker.run()


if __name__ == "__main__":
    asyncio.run(main())
Create the Workflow Starter
start_workflow.py
import asyncio

from temporalio.client import Client
from temporalio.contrib.pydantic import pydantic_data_converter

from workflows.hello_world_workflow import HelloWorld


async def main():
    client = await Client.connect(
        "localhost:7233",
        data_converter=pydantic_data_converter,
    )
    result = await client.execute_workflow(
        HelloWorld.run,
        "Tell me about recursion in programming.",
        id="my-workflow-id",
        task_queue="hello-world-python-task-queue",
    )
    print(f"Result: {result}")


if __name__ == "__main__":
    asyncio.run(main())
Running
Start the Temporal Dev Server:
temporal server start-dev
Install dependencies:
uv sync
Set the appropriate environment variables before launching the worker (for example export OPENAI_API_KEY=... or export GEMINI_API_KEY=...) so LiteLLM can reach your chosen provider.
Run the worker:
uv run python -m worker
Start the workflow:
uv run python -m start_workflow