# Running All-in-One Llama Stack
## Goal

Learn how to quickly deploy and run the complete Llama Stack, with all features enabled, using our streamlined all-in-one setup.
## Overview

Want to try the complete Llama Stack quickly? Use our all-in-one setup, which provides a full-featured environment with safety, telemetry, and tools in minutes.
## Quick Start

Get the complete Llama Stack running with one command:

```shell
make all
```
This starts:

- 🤖 Llama Stack with safety shields (llama-guard)
- 🌤️ Weather tools via Model Context Protocol (MCP)
- 📊 Telemetry with Jaeger tracing
- 🎮 Web playground at http://localhost:8502
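On the first run the container images still have to be pulled, so the services can take a minute or two to come up. A small wait helper can tell you when the stack is ready. This is a sketch: it assumes `curl` is available and simply probes the API's root URL; your build may expose a dedicated health route instead.

```shell
# Sketch: poll the Llama Stack port until it answers, with a retry cap.
# Assumes curl; probes the root URL -- adjust if your build serves a
# dedicated health endpoint.
wait_for_stack() {
  url=$1
  attempts=${2:-60}
  for _ in $(seq 1 "$attempts"); do
    if curl -fsS --max-time 2 -o /dev/null "$url"; then
      echo "ready: $url"
      return 0
    fi
    sleep 1
  done
  echo "gave up waiting for $url" >&2
  return 1
}

# Example: wait_for_stack "http://localhost:8321" 120
```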
## Available Commands

The following commands are available for managing your Llama Stack deployment:
| Command | Description |
|---|---|
| `make all` | Start complete stack with playground |
| | Start Llama Stack services |
| | Start web playground |
| | Stop all containers (preserve data) |
| | Remove all data and containers |
| | Show configuration |
## Configuration

Customize your setup by modifying these variables in the Makefile:

```make
INFERENCE_MODEL = meta-llama/Llama-3.2-3B-Instruct
SAFETY_MODEL_ID = meta-llama/Llama-Guard-3-8B
OLLAMA_URL = http://host.containers.internal:11434
LLAMA_STACK_PORT = 8321
```
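Because these are ordinary Makefile variables, you can also override them per invocation instead of editing the file, using make's standard `VAR=value` command-line syntax. The values below are illustrative:

```shell
# Override Makefile variables on the command line (values are examples only;
# substitute your own model ID and port).
make all \
  INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct \
  LLAMA_STACK_PORT=8322
```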
## What's Included

The all-in-one setup provides:

- **Safety First**: Built-in content filtering with Llama Guard
- **Observability**: Complete telemetry with OpenTelemetry and Jaeger
- **Extensible**: Weather tools via MCP; more are easy to add
- **Production Ready**: Parameterized configs for different environments
- **Developer Friendly**: One-command setup, clean teardown
## Access Points

Once running, you can access:

- **Web Playground**: http://localhost:8502
- **Llama Stack API**: http://localhost:8321
- **Jaeger Telemetry**: http://localhost:16686
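A quick way to see which of these access points are already answering is to probe each one. This is a sketch assuming `curl`; the `probe` helper is hypothetical and just checks for an HTTP success status within two seconds.

```shell
# Sketch: report up/down for each access point listed above.
probe() {
  # "up" if the URL answers with a success status within 2 seconds
  if curl -fsS --max-time 2 -o /dev/null "$1"; then
    echo "up   $1"
  else
    echo "down $1"
  fi
}

for url in http://localhost:8502 http://localhost:8321 http://localhost:16686; do
  probe "$url"
done
```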