March 31, 2026
Ollama Taps Apple's MLX Framework for Faster Local AI on Mac

Ollama has launched a preview build of its local artificial intelligence runner that integrates with Apple's MLX machine learning framework. The update delivers its fastest performance yet on Apple silicon, accelerating demanding local workloads like personal assistants and autonomous coding agents.

Moving to the MLX framework allows the software to take full advantage of Apple's unified memory architecture. On machines equipped with the latest M5, M5 Pro, and M5 Max chips, Ollama now uses the Neural Accelerators built into the GPU to improve both time to first token and overall generation speed.
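
Both metrics are easy to observe from client code. As a rough illustration, the sketch below times a streamed request with the official ollama Python client; the model tag is a placeholder, and counting streamed chunks as tokens is an approximation.

```python
import time

import ollama

start = time.perf_counter()
first_token_at = None
chunks = 0

for chunk in ollama.chat(
    model="llama3.2",  # placeholder tag; any locally pulled model works
    messages=[{"role": "user", "content": "Explain unified memory briefly."}],
    stream=True,
):
    if first_token_at is None:
        first_token_at = time.perf_counter()
    chunks += 1  # each streamed chunk carries roughly one token of text

elapsed = time.perf_counter() - start
print(f"time to first token: {first_token_at - start:.2f}s")
print(f"throughput: ~{chunks / elapsed:.1f} chunks/s over {elapsed:.2f}s")
```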


The 0.19 update also improves how caching works to make iterative and agentic tasks more efficient. Ollama can now reuse its cache across conversations, reducing memory usage and increasing cache hits when branching with coding agents like Codex, OpenCode, or Claude Code. Intelligent checkpoints store snapshots at key points in a prompt to reduce processing time, while updated eviction policies keep shared prefixes in memory longer as older branches are cleared.
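
Prefix reuse pays off when several requests share the same opening context, which is exactly what branching agents produce. The sketch below, again using the ollama Python client with a placeholder model tag, issues two requests that differ only in their final message; the caching itself happens server-side, so nothing special is required in client code.

```python
import ollama

MODEL = "llama3.2"  # placeholder tag

# A long shared prefix: system prompt plus prior conversation turns.
prefix = [
    {"role": "system", "content": "You are a careful coding assistant."},
    {"role": "user", "content": "Here is my module: ... (large context) ..."},
    {"role": "assistant", "content": "Understood. What should I do with it?"},
]

# Two branches that diverge only at the last message. With cross-conversation
# prefix caching, the server should only re-process the final user turn.
branch_a = prefix + [{"role": "user", "content": "Add unit tests."}]
branch_b = prefix + [{"role": "user", "content": "Refactor for readability."}]

for branch in (branch_a, branch_b):
    reply = ollama.chat(model=MODEL, messages=branch)
    print(reply["message"]["content"][:120], "...")
```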

Another major change is support for NVIDIA's NVFP4 format, which maintains model accuracy while reducing memory bandwidth and storage requirements for inference workloads. As more inference providers adopt NVFP4 at scale, bringing that support to local development helps developers better match production environments. It also enables Ollama to run models optimized with NVIDIA's Model Optimizer.
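
To get an intuition for why a 4-bit float with per-block scales can preserve accuracy, here is a toy round-trip of the idea in Python. This is a conceptual sketch only: it snaps values to the signed E2M1 grid with one scale per block, whereas the published NVFP4 format stores FP8 (E4M3) block scales plus a per-tensor scale and runs in optimized kernels, not NumPy.

```python
import numpy as np

# Magnitudes representable by a 4-bit E2M1 float (sign, 2 exponent, 1 mantissa bits).
E2M1 = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
LEVELS = np.concatenate([-E2M1[:0:-1], E2M1])  # signed value grid

def quantize_block(block, levels=LEVELS):
    """Snap one block of values to the E2M1 grid under a shared scale."""
    m = float(np.abs(block).max())
    scale = m / 6.0 if m > 0 else 1.0  # map the largest magnitude to +/-6
    idx = np.abs(block[:, None] / scale - levels[None, :]).argmin(axis=1)
    return levels[idx], scale

def roundtrip(x, block_size=16):
    """Quantize then dequantize, block by block, to inspect the error."""
    out = np.empty_like(x)
    for i in range(0, len(x), block_size):
        q, s = quantize_block(x[i:i + block_size])
        out[i:i + block_size] = q * s
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal(64).astype(np.float32)
print(f"mean abs round-trip error: {np.abs(x - roundtrip(x)).mean():.4f}")
```

Each 16-value block collapses to sixteen 4-bit codes plus one scale, which is where the memory bandwidth and storage savings come from.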

This preview release is built to accelerate the new Alibaba Qwen3.5-35B-A3B model, with sampling parameters tuned for coding tasks in assistants like OpenClaw. Due to the model's size and complexity, running this preview requires a Mac with more than 32GB of unified memory.
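
Trying the preview looks like any other Ollama request; only the model changes. In the sketch below, the model tag and the sampling overrides are hypothetical placeholders rather than the tuned defaults the preview ships with, so check the model library for the exact tag before running.

```python
import ollama

# Hypothetical tag for the preview model; verify the exact name with
# `ollama list` or the model library before running.
MODEL = "qwen3.5:35b-a3b"

response = ollama.generate(
    model=MODEL,
    prompt="Write a Python function that parses an ISO 8601 date.",
    options={
        # Placeholder sampling values for a coding task; the preview ships
        # its own tuned defaults, so overriding is optional.
        "temperature": 0.2,
        "top_p": 0.9,
    },
)
print(response["response"])
```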

Ollama says it is working to support additional models and architectures, with plans to introduce a simpler way for developers to import fine-tuned custom models in future updates.

