Foundation Models API for llama.cpp.
Leverages the LanguageModel and LanguageModelExecutor protocols introduced at WWDC 2026 to offer the same API for llama.cpp models downloaded from Hugging Face. Built on the experimental LlamaKit.
import FoundationModels
import LlamaLanguageModels
let model = LlamaLanguageModel(
    modelIdentifier: "Qwen/Qwen2.5-0.5B-Instruct-GGUF:Q4_K_M"
)
let session = LanguageModelSession(model: model)
let response = try await session.respond(to: "Who are you?")
print(response.content)

Sources/fm_llama/ is a minimal REPL with streaming.
swift run fm_llama ggml-org/gemma-4-26B-A4B-it-GGUF:Q4_K_M "Respond in verse"

- Swift 6.4+
- macOS 27+ / iOS 27+ / Xcode 27.0 beta
- Default model generation parameters
- Faithful token counts
- Tool calling
- Reasoning
- Constrained generation
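
The model identifiers used above (for example `Qwen/Qwen2.5-0.5B-Instruct-GGUF:Q4_K_M`) appear to follow llama.cpp's `<owner>/<repo>:<quantization>` convention for Hugging Face models. As a rough sketch of how such a string breaks down, the snippet below splits one into its repository and quantization tag. `ModelReference` is a hypothetical helper written for illustration; it is not part of LlamaLanguageModels or LlamaKit, and the package may parse identifiers differently.

```swift
/// Hypothetical helper (not part of LlamaLanguageModels): splits an identifier
/// of the assumed form "<owner>/<repo>:<quantization>" into its parts.
struct ModelReference: Equatable {
    let repository: String      // e.g. "Qwen/Qwen2.5-0.5B-Instruct-GGUF"
    let quantization: String?   // e.g. "Q4_K_M"; nil when no tag is given

    init?(_ identifier: String) {
        let repo: Substring
        let tag: Substring?
        // Everything after the first colon is treated as the quantization tag.
        if let colon = identifier.firstIndex(of: ":") {
            repo = identifier[..<colon]
            tag = identifier[identifier.index(after: colon)...]
        } else {
            repo = identifier[...]
            tag = nil
        }
        // Require exactly "owner/name", with both parts non-empty.
        let segments = repo.split(separator: "/", omittingEmptySubsequences: false)
        guard segments.count == 2, segments.allSatisfy({ !$0.isEmpty }) else {
            return nil
        }
        // Reject a trailing colon with nothing after it.
        if let tag, tag.isEmpty { return nil }
        self.repository = String(repo)
        self.quantization = tag.map { String($0) }
    }
}

if let ref = ModelReference("Qwen/Qwen2.5-0.5B-Instruct-GGUF:Q4_K_M") {
    print(ref.repository, ref.quantization ?? "unspecified")
}
```

Validating the string up front like this lets a caller reject a malformed identifier before attempting any download.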