March 31, 2026
Ollama Taps Apple's MLX Framework for Faster Local AI on Mac

Ollama has launched a preview build of its local artificial intelligence runner that integrates with Apple's MLX machine learning framework. The update delivers its fastest performance yet on Apple silicon, accelerating demanding local workloads like personal assistants and autonomous coding agents.

Moving to the MLX framework allows the software to take full advantage of Apple's unified memory architecture. On machines equipped with the latest M5, M5 Pro, and M5 Max chips, Ollama now uses the Neural Accelerators built into the GPU to improve both time to first token and overall generation speed.
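
Both metrics are easy to observe from client code. As a rough illustration, the sketch below times a streamed request with the official ollama Python client; the model tag is a placeholder, and counting streamed chunks as tokens is an approximation.

```python
import time

import ollama

start = time.perf_counter()
first_token_at = None
chunks = 0

for chunk in ollama.chat(
    model="llama3.2",  # placeholder tag; any locally pulled model works
    messages=[{"role": "user", "content": "Explain unified memory briefly."}],
    stream=True,
):
    if first_token_at is None:
        first_token_at = time.perf_counter()
    chunks += 1  # each streamed chunk carries roughly one token of text

elapsed = time.perf_counter() - start
print(f"time to first token: {first_token_at - start:.2f}s")
print(f"throughput: ~{chunks / elapsed:.1f} chunks/s over {elapsed:.2f}s")
```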


The 0.19 update also improves how caching works to make iterative and agentic tasks more efficient. Ollama can now reuse its cache across conversations, reducing memory usage and increasing cache hits when branching with coding agents like Codex, OpenCode, or Claude Code. Intelligent checkpoints store snapshots at key points in a prompt to reduce processing time, while updated eviction policies keep shared prefixes in memory longer as older branches are cleared.
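
Prefix reuse pays off when several requests share the same opening context, which is exactly what branching agents produce. The sketch below, again using the ollama Python client with a placeholder model tag, issues two requests that differ only in their final message; the caching itself happens server-side, so nothing special is required in client code.

```python
import ollama

MODEL = "llama3.2"  # placeholder tag

# A long shared prefix: system prompt plus prior conversation turns.
prefix = [
    {"role": "system", "content": "You are a careful coding assistant."},
    {"role": "user", "content": "Here is my module: ... (large context) ..."},
    {"role": "assistant", "content": "Understood. What should I do with it?"},
]

# Two branches that diverge only at the last message. With cross-conversation
# prefix caching, the server should only re-process the final user turn.
branch_a = prefix + [{"role": "user", "content": "Add unit tests."}]
branch_b = prefix + [{"role": "user", "content": "Refactor for readability."}]

for branch in (branch_a, branch_b):
    reply = ollama.chat(model=MODEL, messages=branch)
    print(reply["message"]["content"][:120], "...")
```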

Another major change is support for NVIDIA's NVFP4 format, which maintains model accuracy while reducing memory bandwidth and storage requirements for inference workloads. As more inference providers adopt NVFP4 at scale, bringing that support to local development helps developers better match production environments. It also enables Ollama to run models optimized with NVIDIA's Model Optimizer.
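
To get an intuition for why a 4-bit float with per-block scales can preserve accuracy, here is a toy round-trip of the idea in Python. This is a conceptual sketch only: it snaps values to the signed E2M1 grid with one scale per block, whereas the published NVFP4 format stores FP8 (E4M3) block scales plus a per-tensor scale and runs in optimized kernels, not NumPy.

```python
import numpy as np

# Magnitudes representable by a 4-bit E2M1 float (sign, 2 exponent, 1 mantissa bits).
E2M1 = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
LEVELS = np.concatenate([-E2M1[:0:-1], E2M1])  # signed value grid

def quantize_block(block, levels=LEVELS):
    """Snap one block of values to the E2M1 grid under a shared scale."""
    m = float(np.abs(block).max())
    scale = m / 6.0 if m > 0 else 1.0  # map the largest magnitude to +/-6
    idx = np.abs(block[:, None] / scale - levels[None, :]).argmin(axis=1)
    return levels[idx], scale

def roundtrip(x, block_size=16):
    """Quantize then dequantize, block by block, to inspect the error."""
    out = np.empty_like(x)
    for i in range(0, len(x), block_size):
        q, s = quantize_block(x[i:i + block_size])
        out[i:i + block_size] = q * s
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal(64).astype(np.float32)
print(f"mean abs round-trip error: {np.abs(x - roundtrip(x)).mean():.4f}")
```

Each 16-value block collapses to sixteen 4-bit codes plus one scale, which is where the memory bandwidth and storage savings come from.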

This preview release is built to accelerate the new Alibaba Qwen3.5-35B-A3B model, with sampling parameters tuned for coding tasks in assistants like OpenClaw. Due to the model's size and complexity, running this preview requires a Mac with more than 32GB of unified memory.
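
Trying the preview looks like any other Ollama request; only the model changes. In the sketch below, the model tag and the sampling overrides are hypothetical placeholders rather than the tuned defaults the preview ships with, so check the model library for the exact tag before running.

```python
import ollama

# Hypothetical tag for the preview model; verify the exact name with
# `ollama list` or the model library before running.
MODEL = "qwen3.5:35b-a3b"

response = ollama.generate(
    model=MODEL,
    prompt="Write a Python function that parses an ISO 8601 date.",
    options={
        # Placeholder sampling values for a coding task; the preview ships
        # its own tuned defaults, so overriding is optional.
        "temperature": 0.2,
        "top_p": 0.9,
    },
)
print(response["response"])
```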

Ollama says it is working to support additional models and architectures, with plans to introduce a simpler way for developers to import fine-tuned custom models in future updates.

