llama.cpp
12 mentions across all digests
llama.cpp is an open-source inference runtime for efficient local execution of large language models; like vLLM and Ollama, it shipped day-0 support for new releases such as Google's Gemma 4.
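As a rough illustration of what that local execution looks like in practice, here is a minimal sketch using the llama-cpp-python bindings; the model path and settings are placeholders, not anything taken from the digest.

```python
# Minimal local-inference sketch using the llama-cpp-python bindings
# (pip install llama-cpp-python); the GGUF path below is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/some-model-Q4_K_M.gguf",  # any locally downloaded GGUF checkpoint
    n_ctx=4096,       # context window
    n_threads=8,      # CPU threads to use for inference
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize llama.cpp in one sentence."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```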
Universal Claude.md – cut Claude output tokens by 63%
A configurable CLAUDE.md template cuts Claude output tokens by 63% via behavioral optimization, reducing API costs in automation pipelines without code changes.
[AINews] The Last 4 Jobs in Tech
Anthropic's Claude Code adds computer use capability, enabling closed-loop verification (code → run → inspect UI → fix). The article emphasizes that harness quality, tooling, and orchestration now create larger practi...
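The closed loop can be pictured as a small harness that keeps regenerating until a UI check passes; the sketch below is purely illustrative, with the model call, runner, and UI inspector passed in as hypothetical stand-ins rather than anything Claude Code actually exposes.

```python
# Illustrative closed-loop verification sketch (code -> run -> inspect UI -> fix).
# All callables are hypothetical stand-ins for a real agent harness.
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class UIReport:
    ok: bool            # did the UI check pass?
    details: str = ""   # what the inspector saw (e.g. screenshot findings)

def closed_loop(
    task: str,
    generate: Callable[[str, Optional[str], Optional[UIReport]], str],  # model call
    run_app: Callable[[str], None],       # launch the generated code
    inspect_ui: Callable[[], UIReport],   # e.g. screenshot + check via computer use
    max_iters: int = 5,
) -> str:
    """Repeat code -> run -> inspect UI -> fix until the UI check passes."""
    code = generate(task, None, None)     # first attempt
    for _ in range(max_iters):
        run_app(code)
        report = inspect_ui()
        if report.ok:
            return code                   # verified; stop iterating
        code = generate(task, code, report)  # ask the model to fix based on the report
    return code                           # best effort after max_iters
```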
My first impressions on ROCm and Strix Halo
A developer validates AMD's Strix Halo APU as a viable platform for local LLM inference, running Qwen 3.6 efficiently via ROCm and llama.cpp on Ubuntu.
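For context, getting llama.cpp to use the APU's GPU mostly comes down to building its ROCm/HIP backend and offloading layers; the sketch below assumes the llama-cpp-python bindings were installed against such a build, and the model path and sizes are placeholders.

```python
# GPU-offload sketch, assuming llama-cpp-python was built against a llama.cpp
# build with its ROCm/HIP backend enabled; paths and sizes are placeholders.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/some-qwen-model-q4_k_m.gguf",  # any local GGUF checkpoint
    n_gpu_layers=-1,   # offload all layers to the GPU (here, via ROCm)
    n_ctx=8192,
)
print(llm("Q: What does ROCm provide?\nA:", max_tokens=64)["choices"][0]["text"])
```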
Stop Using Ollama
The article argues that Ollama, the dominant local LLM platform, systematically violated MIT license terms, abandoned open-source principles in pursuit of VC funding, and let performance degrade after forking away from upstream llama.cpp, which benchmarks 1.8× faster.
The M×N problem of tool calling and open-source models
Each of M open-source inference frameworks (vLLM, SGLang, TensorRT-LLM) must independently reverse-engineer and maintain tool-calling parsers for N incompatible model formats, creating an unsustainable M×N maintenance burden that standardized declarative specs could eliminate.
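One way to picture the proposed fix: each model family ships a small declarative description of how its tool calls are delimited and encoded, and every framework reuses one generic parser. The format below is invented for illustration, not an existing standard.

```python
# Illustrative declarative tool-call spec: a model ships a description of how its
# tool calls look, and a single generic parser handles any model. Field names and
# values are invented for this sketch.
import json
import re
from dataclasses import dataclass

@dataclass
class ToolCallSpec:
    start: str              # text/token that opens a tool call in model output
    end: str                 # text/token that closes it
    payload: str = "json"    # how arguments are encoded between the delimiters

    def parse(self, text: str) -> list[dict]:
        """Extract tool calls from raw model output using only the spec."""
        pattern = re.escape(self.start) + r"(.*?)" + re.escape(self.end)
        calls = []
        for chunk in re.findall(pattern, text, flags=re.DOTALL):
            if self.payload == "json":
                calls.append(json.loads(chunk))
        return calls

# One spec per model family, one parser for all of them:
hermes_like = ToolCallSpec(start="<tool_call>", end="</tool_call>")
output = 'Sure. <tool_call>{"name": "get_weather", "arguments": {"city": "Oslo"}}</tool_call>'
print(hermes_like.parse(output))  # [{'name': 'get_weather', 'arguments': {'city': 'Oslo'}}]
```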