MLX
5 mentions across all digests
MLX is Apple's machine learning framework for Apple Silicon. Recent digests highlight its use in running large models such as Qwen3.5-397B locally via SSD weight streaming, with optimized Objective-C and Metal code generated through agentic, AI-driven experimentation.
Autoresearching Apple's "LLM in a Flash" to run Qwen 397B locally
Apple's "LLM in a Flash" technique enables a 397B-parameter Qwen model to run on a MacBook M3 Max at 5.5 tokens/sec by streaming 4-bit quantized weights from SSD, keeping only 5.5 GB resident in RAM.
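The core idea above can be sketched in a few lines: quantized weights live on SSD and only the layer currently executing is paged into RAM. This is an illustrative toy (plain NumPy, a temp file standing in for the model on disk), not Apple's or the article's actual implementation; the 4-bit packing scheme and layer layout here are assumptions for demonstration.

```python
import numpy as np
import tempfile, os

def pack_4bit(q):
    """Pack integer values in [0, 15] two per byte."""
    q = q.astype(np.uint8)
    return (q[0::2] << 4) | q[1::2]

def unpack_4bit(packed):
    """Invert pack_4bit: one byte back into two 4-bit values."""
    out = np.empty(packed.size * 2, dtype=np.uint8)
    out[0::2] = packed >> 4
    out[1::2] = packed & 0x0F
    return out

def dequantize(q, scale, zero_point):
    """Map 4-bit codes back to approximate float weights."""
    return (q.astype(np.float32) - zero_point) * scale

# Build a fake 2-layer "model" of 4-bit weights on disk (the stand-in SSD).
rng = np.random.default_rng(0)
layers = [rng.integers(0, 16, size=1024) for _ in range(2)]
path = os.path.join(tempfile.mkdtemp(), "weights.bin")
with open(path, "wb") as f:
    for q in layers:
        f.write(pack_4bit(q).tobytes())

# Memory-map the file: the OS pages in only the bytes actually touched,
# so RAM holds just the layer being executed, not the whole model.
packed = np.memmap(path, dtype=np.uint8, mode="r")
bytes_per_layer = 1024 // 2  # two 4-bit weights per byte

def load_layer(i, scale=0.1, zero_point=8):
    block = np.asarray(packed[i * bytes_per_layer:(i + 1) * bytes_per_layer])
    return dequantize(unpack_4bit(block), scale, zero_point)

w0 = load_layer(0)
```

At real scale the same pattern applies per transformer layer, which is how a 397B model can run with only a few GB resident while the rest streams from SSD on demand.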
Ternary Bonsai: Top Intelligence at 1.58 Bits
PrismML's 1.58-bit Ternary Bonsai models achieve 9x memory compression while outperforming their 1-bit predecessors, bringing extreme quantization and edge inference to Apple devices.
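The "1.58 bits" figure comes from log2(3) ≈ 1.585: each weight takes one of three states {-1, 0, +1}. PrismML's actual Ternary Bonsai method is not described in the digest, so the sketch below only shows the arithmetic, in the style of BitNet-b1.58-type ternary quantization; the threshold rule and the 5-digits-per-byte packing are assumptions for illustration.

```python
import numpy as np

def ternary_quantize(w, threshold=0.5):
    """Map float weights to {-1, 0, +1} plus a per-tensor scale (toy rule)."""
    scale = np.mean(np.abs(w)) + 1e-12
    q = np.zeros_like(w, dtype=np.int8)
    q[w > threshold * scale] = 1
    q[w < -threshold * scale] = -1
    return q, scale

def pack_ternary(q):
    """Pack 5 ternary digits per byte (3**5 = 243 <= 256): 1.6 bits/weight."""
    t = (q.astype(np.int16) + 1).astype(np.uint8)  # {-1,0,1} -> {0,1,2}
    pad = (-len(t)) % 5
    t = np.concatenate([t, np.zeros(pad, dtype=np.uint8)])
    groups = t.reshape(-1, 5).astype(np.uint16)
    powers = 3 ** np.arange(5, dtype=np.uint16)
    return (groups @ powers).astype(np.uint8), len(q)

def unpack_ternary(packed, n):
    """Invert pack_ternary: base-3 digits back to {-1, 0, +1}."""
    digits, vals = [], packed.astype(np.uint16)
    for _ in range(5):
        digits.append(vals % 3)
        vals //= 3
    t = np.stack(digits, axis=1).reshape(-1)[:n]
    return t.astype(np.int8) - 1

rng = np.random.default_rng(1)
w = rng.normal(size=1000).astype(np.float32)
q, scale = ternary_quantize(w)
packed, n = pack_ternary(q)
roundtrip = unpack_ternary(packed, n)

# fp16 spends 16 bits/weight; this packing spends 8/5 = 1.6 bits/weight,
# i.e. 10x smaller. Real schemes carry scales and metadata, which lands
# them near the ~9x compression the digest cites.
ratio = 16 / (8 / 5)
```

The point of the packing step is that 1.58-bit storage is only realized if ternary digits share bytes; storing each weight in its own int8 would still cost 8 bits.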
Gemma 4 audio with MLX
Google's Gemma 4 now transcribes audio locally on macOS via MLX, bringing multimodal AI inference to Apple silicon without cloud dependencies.
[AINews] Gemma 4 crosses 2 million downloads
Gemma 4's 2M-download debut signals market acceleration toward on-device inference and local-first open models over centralized cloud APIs.
Welcome Gemma 4: Frontier multimodal intelligence on device
Google releases Gemma 4, an open-source multimodal model family (2B–27B parameters) that scores at the performance frontier while being optimized for on-device deployment out of the box, with no fine-tuning required.