The Livepeer network supports three distinct categories of AI pipeline. Each category works differently at the protocol level: different connection models, different billing, different GPU requirements. Understanding which category fits your use case before building prevents rework. Constraint: Livepeer AI pipelines run on GPU capacity contributed by independent orchestrators. Availability and latency depend on the orchestrator set at any given time. The community gateway atDocumentation Index
Fetch the complete documentation index at: https://na-36-handover-docs-v2-into-docs-v2-dev-20260518.mintlify.app/llms.txt
Use this file to discover all available pages before exploring further.
dream-gateway.livepeer.cloud routes to the best available orchestrator for development; production applications use a self-hosted gateway or a gateway provider for routing control.
Pipeline categories at a glance
Batch AI pipelines
Batch AI pipelines follow a request-and-response model: your application sends a job to the network, an orchestrator processes it, and you receive the result. There is no persistent connection. The GPU is assigned to your job, completes the inference, and is released. Orchestrators keep one model per pipeline “warm” in GPU memory. Requesting a model that no orchestrator currently has warm still works, but the first response is slower while the model loads (30 seconds to 5 minutes depending on model size). Warm model availability per pipeline is listed on the model support page. Where to start: AI quickstartReal-time AI
Real-time AI on Livepeer is built around thelive-video-to-video pipeline type. Unlike batch pipelines, real-time AI maintains a persistent stream connection: video frames flow in continuously, inference runs on each frame, and transformed frames flow back out at sub-second latency.
The infrastructure model differs from batch processing in four ways:
- Connection: Persistent WebRTC or trickle stream, not request/response
- Billing: Per second of compute time (confirmed in the go-livepeer
LivePaymentSenderinterface) - GPU assignment: Dedicated to your stream for its full duration
- Output: Continuous frame-by-frame results, not a single returned asset
Developer tools for real-time AI
Three tools serve different real-time AI use cases: ComfyStream (livepeer/comfystream) is the primary tool for building real-time AI pipelines. It turns ComfyUI’s node-graph workflow editor into a real-time inference engine for live video. Supported models include StreamDiffusion, ControlNet, IPAdapter, FaceID, LoRA, Whisper (audio), Gemma (video understanding), and SuperResolution. See ComfyStream overview.
PyTrickle (livepeer/pytrickle) is the Python SDK for building custom real-time processing services outside ComfyUI. Subclass FrameProcessor, implement process_frame(), and PyTrickle handles the trickle protocol transport, session management, and frame serialisation. See PyTrickle overview.
ComfyUI-Stream-Pack (livepeer/ComfyUI-Stream-Pack) provides custom ComfyUI nodes for live video and audio input: LoadTensor and LoadAudioTensor nodes that feed real-time media into ComfyUI workflows. See Stream Pack overview.
VTuber and agent avatar infrastructure
VTuber avatar generation requires sub-100ms latency, face/body tracking input, and a real-time diffusion pipeline running at 20+ FPS. Livepeer’s real-time AI infrastructure supports this via ComfyStream. The Agent SPE (treasury-funded Special Purpose Entity, approved April 2025 with 30,000 LPT) built the first production VTuber and AI avatar pipeline on Livepeer, delivering:- A real-time agent avatar generation pipeline using ComfyStream and StreamDiffusion
- A Livepeer model provider plugin for the Eliza agent framework (ai16z), enabling Eliza agents to route LLM inference through the Livepeer network
- ComfyStream as the real-time inference engine
live-video-to-videopipeline type via the AI gateway- StreamDiffusion custom nodes from ComfyUI-Stream-Pack for diffusion-based avatar transformation
- GPU requirements: NVIDIA RTX 3090 or better; RTX 4090 recommended for 25 FPS
Real-time AI requires a dedicated GPU for the duration of the stream. At peak network load, orchestrator availability for
live-video-to-video is lower than for batch pipelines. Test under expected concurrency before production launch.LLM pipeline
The LLM pipeline brings text inference to the Livepeer network using an Ollama-based runner with an OpenAI-compatible API. From a developer’s perspective, it works like any OpenAI-compatible chat completions endpoint. Requests route to decentralised GPU orchestrators instead of a centralised cloud provider. The LLM pipeline is currently in beta. It runs on a wider range of GPU hardware than diffusion-based batch pipelines: an orchestrator needs as little as 8 GB of VRAM to serve LLM workloads.meta-llama/Meta-Llama-3.1-8B-Instruct (warm, 8 GB VRAM), mistralai/Mistral-7B-Instruct-v0.3, google/gemma-2-9b-it, and Qwen/Qwen2.5-7B-Instruct. Any Ollama-compatible model works; cold-start applies to models not currently loaded on any orchestrator.
The LLM SPE built and maintains this pipeline. The Cloud SPE provides managed gateway access to it for production use.
Where to start: AI quickstart for the LLM endpoint; Eliza Livepeer plugin tutorial for the agent integration path.