High-performance YOLO inference library written in Rust. This library provides a fast, safe, and efficient interface for running YOLO models using ONNX Runtime, with an API designed to match the Ultralytics Python package.
- 🚀 High Performance - Pure Rust implementation with zero-cost abstractions
- 🎯 Ultralytics API Compatible -
Results,Boxes,Masks,Keypoints,Probs, andSemanticMasktypes matching the Python API shape - 🔧 Multiple Backends - CPU, XNNPACK, CUDA, TensorRT, CoreML, OpenVINO, and more via ONNX Runtime
- 📦 Dual Use - Library for Rust projects + standalone CLI application
- 🏷️ Auto Metadata - Automatically reads class names, task type, and input size from ONNX models
- ⬇️ Auto Download - Downloads supported YOLO26, YOLO11, and YOLOv8 ONNX models (sizes: n/s/m/l/x) when not found locally
- 🖼️ Multiple Sources - Images, directories, glob patterns, video files, webcams, and streams
- 🪶 Lean Runtime - No PyTorch, TensorFlow, or Python runtime required
This crate runs YOLOv8, YOLO11, and YOLO26 ONNX models. They are pretrained on COCO for Detection, Segmentation, and Pose Estimation; on DOTA for OBB; on Cityscapes for Semantic Segmentation; and on ImageNet for Classification. All models download automatically from the latest Ultralytics release on first use.
- Rust 1.89+ (install via rustup)
- A YOLO ONNX model (export from Ultralytics:
yolo export model=yolo26n.pt format=onnx)
# Install CLI globally from crates.io
cargo install ultralytics-inference
# Install CLI globally with custom features
# Minimal build (no default features)
cargo install ultralytics-inference --no-default-features
# Enable video support
cargo install ultralytics-inference --features video
# Enable multiple accelerators
cargo install ultralytics-inference --features "cuda,tensorrt"# Install CLI directly from the git repository
cargo install --git https://github.com/ultralytics/inference.git ultralytics-inference
# Or clone, build, and install from source
git clone https://github.com/ultralytics/inference.git
cd inference
cargo build --release
# Install from local checkout
cargo install --path . --lockedcargo install places binaries in Cargo's default bin directory:
- macOS/Linux:
~/.cargo/bin - Windows:
%USERPROFILE%\\.cargo\\bin
Ensure this directory is in your PATH, then run from anywhere:
ultralytics-inference help# Using Ultralytics CLI
yolo export model=yolo26n.pt format=onnx# Or with Python
from ultralytics import YOLO
model = YOLO("yolo26n.pt")
model.export(format="onnx")# With defaults (auto-downloads yolo26n.onnx and sample images)
ultralytics-inference predict
# Select task: auto-downloads the nano model for that task
ultralytics-inference predict --task segment # downloads yolo26n-seg.onnx
ultralytics-inference predict --task pose # downloads yolo26n-pose.onnx
ultralytics-inference predict --task obb # downloads yolo26n-obb.onnx
ultralytics-inference predict --task classify # downloads yolo26n-cls.onnx
ultralytics-inference predict --task semantic # downloads yolo26n-sem.onnx (YOLO26 only)
# With explicit model (task is read from model metadata)
ultralytics-inference predict --model yolo26n.onnx --source image.jpg
# Auto-download any supported size (n/s/m/l/x) across YOLO26, YOLO11, and YOLOv8
ultralytics-inference predict --model yolo26l.onnx --source image.jpg
ultralytics-inference predict --model yolo11x-seg.onnx --source image.jpg
ultralytics-inference predict --model yolov8n.onnx --source image.jpg
# On a directory of images
ultralytics-inference predict --model yolo26n.onnx --source assets/
# With custom thresholds
ultralytics-inference predict -m yolo26n.onnx -s image.jpg --conf 0.5 --iou 0.45
# Filter by class IDs
ultralytics-inference predict --model yolo26n.onnx --source image.jpg --classes 0
ultralytics-inference predict --model yolo26n.onnx --source image.jpg --classes "0,1,2"
# With visualization and custom image size
ultralytics-inference predict --model yolo26n.onnx --source video.mp4 --show --imgsz 1280
# Save individual frames for video input
ultralytics-inference predict --model yolo26n.onnx --source video.mp4 --save-frames
# Rectangular inference
ultralytics-inference predict --model yolo26n.onnx --source image.jpg --rect
# Semantic segmentation: write per-image PNG class maps to runs/semantic/predictN/results/
ultralytics-inference predict --task semantic --source cityscapes/ --save-jsonultralytics-inference predictWARNING ⚠️ 'model' argument is missing. Using default '--model=yolo26n.onnx'.
WARNING ⚠️ 'source' argument is missing. Using default images: https://ultralytics.com/images/bus.jpg, https://ultralytics.com/images/zidane.jpg
Ultralytics Inference 0.0.21 🚀 Rust ONNX FP32 CPU
Using ONNX Runtime CPUExecutionProvider
YOLO26n summary: 80 classes, imgsz=(640, 640)
image 1/2 /home/ultralytics/inference/bus.jpg: 640x480 4 persons, 1 bus, 36.4ms
image 2/2 /home/ultralytics/inference/zidane.jpg: 384x640 2 persons, 1 tie, 28.6ms
Speed: 1.5ms preprocess, 32.5ms inference, 0.5ms postprocess per image at shape (1, 3, 384, 640)
Results saved to runs/detect/predict1
💡 Learn more at https://docs.ultralytics.com/modes/predict
With --task (auto-downloads the matching nano model):
ultralytics-inference predict --task segmentWARNING ⚠️ 'model' argument is missing. Using default '--model=yolo26n-seg.onnx'.
WARNING ⚠️ 'source' argument is missing. Using default images: https://ultralytics.com/images/bus.jpg, https://ultralytics.com/images/zidane.jpg
Ultralytics Inference 0.0.21 🚀 Rust ONNX FP32 CPU
Using ONNX Runtime CPUExecutionProvider
YOLO26n-seg summary: 80 classes, imgsz=(640, 640)
image 1/2 /home/ultralytics/inference/bus.jpg: 640x480 4 persons, 1 bus, 48.2ms
image 2/2 /home/ultralytics/inference/zidane.jpg: 384x640 2 persons, 1 tie, 38.1ms
Speed: 1.6ms preprocess, 44.3ms inference, 1.2ms postprocess per image at shape (1, 3, 384, 640)
Results saved to runs/segment/predict1
💡 Learn more at https://docs.ultralytics.com/modes/predict
# Show help
ultralytics-inference help
# Show version
ultralytics-inference version
# Run inference
ultralytics-inference predict --model <model.onnx> --source <source>--help and --version are also supported as standard flag aliases.
CLI Options:
| Option | Short | Description | Default |
|---|---|---|---|
--model |
-m |
Path to ONNX model file; auto-downloaded if a known YOLOv8/YOLO11/YOLO26 name | yolo26n.onnx |
--task |
Task type (detect, segment, pose, obb, classify, semantic*); selects nano model when --model is omitted |
detect |
|
--source |
-s |
Input source (image, directory, glob, video, webcam index, or URL) | Task-dependent Ultralytics URL assets |
--conf |
Confidence threshold | 0.25 |
|
--iou |
IoU threshold for NMS | 0.7 |
|
--max-det |
Maximum number of detections | 300 |
|
--imgsz |
Inference image size | Model metadata |
|
--rect |
Enable rectangular inference (minimal padding) | true |
|
--batch |
Batch size for inference | 1 |
|
--half |
Use FP16 half-precision inference | false |
|
--save |
Save annotated results to runs/<task>/predict | true |
|
--save-frames |
Save individual frames for video input (instead of video file) | false |
|
--save-json |
Save semantic segmentation class-map PNGs for external evaluation | false |
|
--show |
Display results in a window | false |
|
--device |
Device string, e.g. cpu, cuda:0, coreml, directml:0, openvino, tensorrt:0, rocm:0, xnnpack; additional providers selectable when their feature is enabled (see Features table) | cpu |
|
--verbose |
Show verbose output | true |
|
--classes |
Filter by class IDs, e.g. 0 or "0,1,2" or "[0, 1, 2]" |
all classes |
Task and Model Resolution:
| Invocation | Model used | Notes |
|---|---|---|
predict |
yolo26n.onnx |
Default detect model, auto-downloaded |
predict --task segment |
yolo26n-seg.onnx |
Nano seg model, auto-downloaded |
predict --task pose |
yolo26n-pose.onnx |
Nano pose model, auto-downloaded |
predict --task obb |
yolo26n-obb.onnx |
Nano OBB model, auto-downloaded |
predict --task classify |
yolo26n-cls.onnx |
Nano classify model, auto-downloaded |
predict --task semantic |
yolo26n-sem.onnx* |
Nano semantic segmentation model, auto-downloaded (YOLO26 only) |
predict --model yolo26l-seg.onnx |
yolo26l-seg.onnx |
Task read from model metadata |
predict --task segment --model yolo26l-seg.onnx |
yolo26l-seg.onnx |
--task matches metadata, proceeds normally |
predict --task segment --model yolo26n.onnx |
error | --task conflicts with model metadata (detect), exits with error |
* semantic (semantic segmentation) is YOLO26-only.
Auto-downloadable models:
YOLOv8, YOLO11, and YOLO26 ONNX models in sizes n / s / m / l / x are supported for auto-download across the standard task variants. YOLO26 also includes -sem for semantic segmentation:
| Family | Variants |
|---|---|
| YOLO26 | yolo26{n,s,m,l,x}.onnx, yolo26{n,s,m,l,x}-seg.onnx, -pose, -obb, -cls, -sem* |
| YOLO11 | yolo11{n,s,m,l,x}.onnx, yolo11{n,s,m,l,x}-seg.onnx, -pose, -obb, -cls |
| YOLOv8 | yolov8{n,s,m,l,x}.onnx, yolov8{n,s,m,l,x}-seg.onnx, -pose, -obb, -cls |
* -sem (semantic segmentation) is YOLO26-only.
Source Options:
| Source Type | Example Input | Description |
|---|---|---|
| Image | image.jpg |
Single image file |
| Directory | images/ |
Directory of images |
| Glob | images/*.jpg |
Glob pattern for images |
| Video | video.mp4 |
Video file |
| Webcam | 0,1 |
Webcam index (0 = default webcam) |
| URL | https://example.com/image.jpg |
Remote image URL |
Add to your Cargo.toml (choose one):
# Stable release from crates.io
[dependencies]
ultralytics-inference = "0.0.21"# Development version (latest unreleased code from GitHub)
[dependencies]
ultralytics-inference = { git = "https://github.com/ultralytics/inference.git" }Basic Usage:
use ultralytics_inference::{YOLOModel, InferenceConfig};
fn main() -> Result<(), Box<dyn std::error::Error>> {
// Load model - metadata (classes, task, imgsz) is read automatically
let mut model = YOLOModel::load("yolo26n.onnx")?;
// Run inference
let results = model.predict("image.jpg")?;
// Process results
for result in &results {
if let Some(ref boxes) = result.boxes {
println!("Found {} detections", boxes.len());
for i in 0..boxes.len() {
let cls = boxes.cls()[i] as usize;
let conf = boxes.conf()[i];
let name = result.names.get(&cls).map(|s| s.as_str()).unwrap_or("unknown");
println!(" {} {:.2}", name, conf);
}
}
}
Ok(())
}With Custom Configuration:
use ultralytics_inference::{YOLOModel, InferenceConfig};
fn main() -> Result<(), Box<dyn std::error::Error>> {
let config = InferenceConfig::new()
.with_confidence(0.5)
.with_iou(0.45)
.with_max_det(300);
let mut model = YOLOModel::load_with_config("yolo26n.onnx", config)?;
let results = model.predict("image.jpg")?;
Ok(())
}Accessing Detection Data:
if let Some(ref boxes) = result.boxes {
// Bounding boxes in different formats
let xyxy = boxes.xyxy(); // [x1, y1, x2, y2]
let xywh = boxes.xywh(); // [x_center, y_center, width, height]
let xyxyn = boxes.xyxyn(); // Normalized [0-1]
let xywhn = boxes.xywhn(); // Normalized [0-1]
// Confidence scores and class IDs
let conf = boxes.conf(); // Confidence scores
let cls = boxes.cls(); // Class IDs
}Selecting a Device:
use ultralytics_inference::{Device, InferenceConfig, YOLOModel};
fn main() -> Result<(), Box<dyn std::error::Error>> {
// Select a device (e.g., CUDA, CoreML, CPU)
let device = Device::Cuda(0);
// Configure the model to use this device
let config = InferenceConfig::new().with_device(device);
let mut model = YOLOModel::load_with_config("yolo26n.onnx", config)?;
let results = model.predict("image.jpg")?;
Ok(())
}inference/
├── src/
│ ├── lib.rs # Library entry point and public exports
│ ├── main.rs # CLI application
│ ├── model.rs # YOLOModel - ONNX session and inference
│ ├── results.rs # Results, Boxes, Masks, Keypoints, Probs, Obb, SemanticMask
│ ├── preprocessing.rs # Image preprocessing (letterbox, normalize, SIMD)
│ ├── postprocessing.rs # Post-processing for all tasks (NMS/decode for detection, argmax for semantic segmentation)
│ ├── metadata.rs # ONNX model metadata parsing
│ ├── source.rs # Input source handling (images, video, webcam)
│ ├── task.rs # Task enum (Detect, Segment, Pose, Classify, Obb, Semantic)
│ ├── inference.rs # InferenceConfig
│ ├── batch.rs # Batch processing pipeline
│ ├── device.rs # Device enum (CPU, CUDA, CoreML, etc.)
│ ├── download.rs # Model and asset downloading
│ ├── annotate.rs # Image annotation (bounding boxes, instance masks, keypoints, semantic overlay)
│ ├── io.rs # Result saving (images, videos)
│ ├── logging.rs # Logging macros
│ ├── error.rs # Error types
│ ├── utils.rs # Utility functions (NMS, IoU)
│ ├── cli/ # CLI module
│ │ ├── mod.rs # CLI module exports
│ │ ├── args.rs # CLI argument parsing
│ │ └── predict.rs # Predict command implementation
│ └── visualizer/ # Real-time visualization (minifb)
├── tests/
│ └── integration_test.rs # Integration tests
├── assets/ # Test images
│ ├── boats.jpg
│ ├── bus.jpg
│ └── zidane.jpg
├── Cargo.toml # Rust dependencies and features
├── LICENSE # AGPL-3.0 License
├── README.md # English README
└── README.zh-CN.md # Simplified Chinese README
Enable hardware acceleration by adding features to your build:
# NVIDIA GPU (CUDA)
cargo build --release --features cuda
# NVIDIA TensorRT
cargo build --release --features tensorrt
# NVIDIA GPU preprocessing + zero-copy TensorRT input (fastest; needs CUDA toolkit)
cargo build --release --features cuda-preprocess
# Apple CoreML (macOS/iOS)
cargo build --release --features coreml
# Intel OpenVINO
cargo build --release --features openvino
# Multiple features
cargo build --release --features "cuda,tensorrt"NVIDIA setup, requirements, and the GPU preprocessing fast path are documented in
docs/CUDA.md.
Available Features:
Default features (enabled unless --no-default-features is passed): annotate, visualize.
| Feature | Description |
|---|---|
annotate |
Image annotation for --save (default) |
visualize |
Real-time window display for --show (default) |
video |
Video file decoding/encoding (requires FFmpeg) |
cuda |
NVIDIA CUDA support |
tensorrt |
NVIDIA TensorRT optimization |
cuda-preprocess |
GPU preprocessing + zero-copy TensorRT input (needs CUDA toolkit; see docs/CUDA.md) |
coreml |
Apple CoreML (macOS/iOS) |
openvino |
Intel OpenVINO |
onednn |
Intel oneDNN |
rocm |
AMD ROCm |
migraphx |
AMD MIGraphX |
directml |
DirectML (Windows) |
nnapi |
Android Neural Networks API |
qnn |
Qualcomm Neural Networks |
xnnpack |
XNNPACK (cross-platform) |
acl |
ARM Compute Library |
armnn |
ARM NN |
tvm |
Apache TVM |
rknpu |
Rockchip NPU |
cann |
Huawei CANN |
webgpu |
WebGPU |
azure |
Azure |
nvidia |
Convenience: CUDA + TensorRT |
amd |
Convenience: ROCm + MIGraphX |
intel |
Convenience: OpenVINO + oneDNN |
mobile |
Convenience: NNAPI + CoreML + QNN |
all |
Convenience: annotate + visualize + video |
The same engine runs in the browser on WebGPU, compiled to WebAssembly. The
forward pass executes on the official ONNX Runtime Web build, bridged through
ort-web, while the shared Rust
preprocessing and postprocessing run in wasm, so results match the native path.
It ships as the @ultralytics/yolo npm package:
import { YOLO } from "@ultralytics/yolo";
const model = await YOLO.load("yolo26n.onnx");
const results = await model.predict("bus.jpg");
console.log(results.boxes); // [{ x1, y1, x2, y2, conf, cls, name, color }, ...]Pass { device: "webgpu" | "cpu" } to pick the accelerator ("auto" is the
default), and read model.device to see what actually ran.
The browser bindings live in crates/web (the
ultralytics-inference-web cdylib); the JS/TS wrapper and build instructions are
in web/. A WebGPU-capable browser and a secure context
(https/localhost) are required.
One of the key benefits of this library is a Rust/ONNX Runtime stack with no PyTorch, TensorFlow, or Python runtime required.
| Crate | Purpose |
|---|---|
ort |
ONNX Runtime bindings |
ndarray |
N-dimensional arrays |
image |
Image loading/decoding |
jpeg-decoder |
JPEG decoding |
fast_image_resize |
SIMD-optimized resizing |
half |
FP16 support |
lru |
LRU cache for preprocessing LUT |
wide |
SIMD for fast preprocessing |
| Crate | Purpose |
|---|---|
imageproc |
Drawing boxes and shapes |
ab_glyph |
Text rendering (embedded font) |
| Crate | Purpose |
|---|---|
minifb |
Window creation and buffer display |
video-rs |
Video decoding/encoding (ffmpeg) |
Video features require FFmpeg (7 or 8) installed on your system:
# macOS
brew install ffmpeg
# Ubuntu/Debian
apt-get install -y ffmpeg libavutil-dev libavformat-dev libavfilter-dev libavdevice-dev libclang-dev
# Build with video support
cargo build --release --features videoTo build without annotation and visualization support (smaller binary):
cargo build --release --no-default-features# Run all tests
cargo test
# Run with output
cargo test -- --nocapture
# Run specific test
cargo test test_boxes_creationBenchmarks on Apple M4 MacBook Pro (CPU, ONNX Runtime):
| Precision | Model Size | Preprocess | Inference | Postprocess | Total |
|---|---|---|---|---|---|
| FP32 | 10.2 MB | ~9ms | ~21ms | <1ms | ~31ms |
| FP16 | 5.2 MB | ~9ms | ~24ms | <1ms | ~34ms |
Key findings:
- FP16 models are ~50% smaller (5.2 MB vs 10.2 MB)
- FP32 is slightly faster on CPU (~21ms vs ~24ms) due to CPU's native FP32 support
- FP16 requires upcasting to FP32 for computation on most CPUs, adding overhead
- Use FP32 for CPU inference, FP16 for GPU (where it provides speedup)
ONNX Runtime threading is set to auto (num_threads: 0) which lets ORT choose optimal thread count:
- Manual threading (4 threads): ~40ms inference
- Auto threading (0 = ORT decides): ~21ms inference
- Detection, Segmentation, Pose, Classification, OBB, and Semantic Segmentation inference
- ONNX model metadata parsing (auto-detect classes, task, imgsz)
- Hardware acceleration support (CUDA, TensorRT, CoreML, OpenVINO, XNNPACK)
- Ultralytics-compatible Results API (
Boxes,Masks,Keypoints,Probs,Obb,SemanticMask) - Multiple input sources (images, directories, globs, URLs)
- Video file support and webcam/RTSP streaming
- Image annotation and visualization
- FP16 half-precision inference
- Batch inference support
- Rectangular inference support and optimization
- Class filtering support
- Auto-download all YOLO26, YOLO11, and YOLOv8 ONNX models (all sizes n/s/m/l/x, all tasks)
-
--taskCLI flag: selects and auto-downloads the matching nano model when--modelis omitted; errors on task/model metadata conflict - WebAssembly (WASM) browser inference on WebGPU (npm
@ultralytics/yolo)
- Python bindings (PyO3)
Ultralytics thrives on community collaboration, and we deeply value your contributions! Whether it's reporting bugs, suggesting features, or submitting code changes, your involvement is crucial.
- Report Issues: Found a bug? Open an issue
- Feature Requests: Have an idea? Share it
- Pull Requests: Read our Contributing Guide first
- Feedback: Take our Survey
A heartfelt thank you 🙏 goes out to all our contributors! Your efforts help make Ultralytics tools better for everyone.
Ultralytics offers two licensing options to suit different needs:
- AGPL-3.0 License: This OSI-approved open-source license is perfect for students, researchers, and enthusiasts. It encourages open collaboration and knowledge sharing. See the LICENSE file for full details.
- Ultralytics Enterprise License: Designed for commercial use, this license allows for the seamless integration of Ultralytics software and AI models into commercial products and services, bypassing the open-source requirements of AGPL-3.0. If your use case involves commercial deployment, please contact us via Ultralytics Licensing.
- GitHub Issues: Bug reports and feature requests
- Discord: Join our community
- Documentation: docs.ultralytics.com








