On Device LLM
MESUT CAN YAGCI
On Device LLM is a local-first AI workbench for your iPhone. Open-source language and vision models run entirely on-device via Apple's MLX framework: no servers, no API keys, no telemetry, no accounts.

LIVE CAMERA CAPTION
Point your camera at anything. SmolVLM describes the scene every few seconds and streams the caption right onto the viewfinder. Tune the refresh interval from 500 ms to 30 s, or tap the refresh icon for an on-demand caption between cycles.

VOICE CONVERSATION
Hands-free chat with an on-device LLM. Industry-standard voice activity detection ignores TV chatter and background noise. The app speaks back through the same audio session it listens on, so there are no clipped first words and no delayed replies.

CODE ASSISTANT
Pick from Qwen 2.5 Coder, Llama 3.2, Phi-3.5, Gemma 2, Qwen3, and dozens of other MLX-converted models on Hugging Face. The app auto-selects a model that fits your device's RAM (1.5B on older iPhones, up to 7B on Pro Max). Output streams token by token, with code-block syntax highlighting and conversation export to Markdown.

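For readers curious how the RAM-based pick under CODE ASSISTANT might work, here is a minimal, language-neutral sketch in Python. It is not the app's actual code: the RAM thresholds, the intermediate 3B tier, and the names `ModelTier` and `pick_model_size` are assumptions made for illustration. Only the 1.5B and 7B endpoints come from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    min_ram_gb: float  # hypothetical RAM threshold, not taken from the app
    params: str        # parameter-count class of the model to load


# Hypothetical tiers, ordered from largest to smallest.
# Only "1.5B" and "7B" appear in the listing; the thresholds and the
# "3B" middle tier are assumed for the sake of the example.
TIERS = [
    ModelTier(min_ram_gb=8.0, params="7B"),
    ModelTier(min_ram_gb=6.0, params="3B"),
    ModelTier(min_ram_gb=0.0, params="1.5B"),
]


def pick_model_size(device_ram_gb: float) -> str:
    """Return the largest model class whose RAM threshold the device meets."""
    for tier in TIERS:
        if device_ram_gb >= tier.min_ram_gb:
            return tier.params
    # Unreachable while the last tier has a 0 GB threshold; kept as a safe default.
    return TIERS[-1].params
```

Walking the tiers from largest to smallest means the first match is always the biggest model the device can hold, and the smallest tier acts as a catch-all for older phones.
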
POWER TOOLS
• Document scanner: capture multi-page code with the native four-corner document camera
• Live OCR: text on a screen, sign, or whiteboard is recognized in real time
• A/B compare: run two models side by side and watch the speed/quality tradeoff
• Benchmark: measure tokens per second, time to first token, peak memory, and thermal impact
• Macros: chain prompts together (lint → refactor → test)
• Snippets: save reusable prompt templates
• Persona memory: each persona remembers its own set of facts
• Mac Bridge: pair your Mac and use the iPhone as an MLX inference server

PRIVACY YOU CAN VERIFY
• Every model runs on the device, so your prompts never leave the phone
• A network-activity indicator surfaces every outbound request; the only ones are model downloads
• Conversations are encrypted at rest
• One-tap "wipe all on-device data" in Settings
• No analytics SDKs, no sign-up, no account required

OPTIMIZED FOR YOUR DEVICE
• First-launch device-tier auto-pick for both the chat model and the camera VLM
• SmolVLM 2.2B (bf16) on Pro Max; SmolVLM 4-bit on entry-tier iPhones
• Automatic fallback chain when a model mirror is unreachable
• Thermal awareness: response length is clamped only when the thermal state reaches .critical (in line with peer on-device apps)
• In-flight Metal work is cancelled when the app backgrounds, so iOS won't kill it

REQUIREMENTS
• iOS 18.0 or later
• Recommended: iPhone 12 or newer
• Wi-Fi recommended for the initial model download (≈1.5 GB minimum)

Open source models. Open standards. None of your data on someone else's server.