Glossary
262 entities tracked across all stories
OpenAI
OpenAI is an AI research company founded by Elon Musk and Sam Altman that develops and commercializes large language models and AI systems, including GPT models and APIs for developers building AI applications.
- Introducing GPT-5.4 in Microsoft Foundry
- Thoughts on slowing the fuck down
- Claude Code can now take over your computer to complete tasks
- Last Week in AI #339 - DLSS 5, OpenAI Superapp, MiniMax M2.7
- Ilya Sutskever — We're moving from the age of scaling to the age of research
Anthropic
Anthropic is an AI safety company founded by Dario Amodei that develops Claude, a large language model used for a range of applications including agentic coding tasks, which has captured an estimated 40% of enterprise AI market share.
- Show HN: A plain-text cognitive architecture for Claude Code
- [AINews] The Biggest Claude Launch of All Time
- OpenClaw Agents Can Be Guilt-Tripped Into Self-Sabotage
- Pentagon’s ‘Attempt to Cripple’ Anthropic Is Troubling, Judge Says
- Anthropic’s Claude can now control your Mac, escalating the fight to build AI agents that actually do work
Claude Code
Claude Code is an AI-powered coding agent developed by Anthropic that enables developers to perform complex software tasks directly within their editor, including orchestrating machine learning workflows like model fine-tuning through integration with external services like Hugging Face.
- Show HN: A plain-text cognitive architecture for Claude Code
- 90% of Claude-linked output going to GitHub repos w <2 stars
- AI supply chain attacks don’t even require malware…just post poisoned documentation
- Building a coding agent in Swift from scratch
- Claude Code can now take over your computer to complete tasks
Claude
Claude is an AI assistant developed by Anthropic that specializes in coding, reasoning, and agentic tasks, competing with models like GPT-4 and Google's Gemini in the large language model space.
- OpenClaw Agents Can Be Guilt-Tripped Into Self-Sabotage
- Anthropic’s Claude can now control your Mac, escalating the fight to build AI agents that actually do work
- Building a coding agent in Swift from scratch
- Claude Can Now Take Control of Your Mac
- Show HN: Cq – Stack Overflow for AI coding agents
Hugging Face
Hugging Face is an AI platform that hosts machine learning models, datasets, and applications, serving over 13 million users and maintaining more than 2 million public models and 500,000 public datasets as of Spring 2026.
- State of Open Source on Hugging Face: Spring 2026
- Holotron-12B - High Throughput Computer Use Agent
- GGML and llama.cpp join HF to ensure the long-term progress of Local AI
- Train AI models with Unsloth and Hugging Face Jobs for FREE
- Custom Kernels for All from Codex and Claude
Google is a multinational technology company headquartered in Mountain View, California, known for its search engine, advertising platform, and artificial intelligence products including the Gemini family of large language models and generative AI tools.
- [AINews] Anthropic @ $19B ARR, Qwen team leaves, Gemini and GPT bump up fast models
- Welcome EmbeddingGemma, Google's new efficient embedding model
- Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM
- The latest AI news we announced in February
- Start building with Gemini 3
Codex
Codex is OpenAI's AI coding model that interprets natural language commands and generates code to power AI-assisted software development applications and workflows.
- GPT 5.4 is a big step for Codex
- Agents Over Bubbles
- [AINews] GPT 5.4: SOTA Knowledge Work -and- Coding -and- CUA Model, OpenAI is so very back
- Train AI models with Unsloth and Hugging Face Jobs for FREE
- Codex is Open Sourcing AI models
Simon Willison
Simon Willison is a developer and technical writer known for documenting AI-assisted development practices, including experiments with Claude AI coding tools, JavaScript sandboxing research, and Git workflows for working with coding agents.
- Thoughts on slowing the fuck down
- Streaming experts
- Experimenting with Starlette 1.0 with Claude skills
- JavaScript Sandboxing Research
- Using Git with coding agents
ChatGPT
ChatGPT is a conversational AI assistant developed by OpenAI that uses large language models to generate human-like responses and is available as a web interface, mobile app, and API with over 900 million weekly active users.
- Defending against Prompt Injection with Structured Queries (StruQ) and Preference Optimization (SecAlign)
- Show HN: Cq – Stack Overflow for AI coding agents
- AI agents are 'gullible' and easy to turn into your minions
- Last Week in AI #339 - DLSS 5, OpenAI Superapp, MiniMax M2.7
- Agents Over Bubbles
GPT-5
GPT-5 is a large language model developed by OpenAI that demonstrates advanced capabilities in scientific reasoning, mathematics, and code generation, with variants like GPT-5.3-Codex-Spark optimized for specific domains.
- Introducing GPT-5.3-Codex-Spark
- Inside OpenAI’s in-house data agent
- Evaluating AI’s ability to perform scientific research tasks
- GPT-5 and the future of mathematical discovery
- Early experiments in accelerating science with GPT-5
Cursor
Cursor is an AI-powered code editor that uses large language models to assist with software development tasks, competing with Claude Code and other agentic IDEs.
- Thoughts on slowing the fuck down
- LiteLLM Compromised by Credential Stealer
- AI agents are 'gullible' and easy to turn into your minions
- [AINews] The high-return activity of raising your aspirations for LLMs
- Cursor's Third Era: Cloud Agents
NVIDIA
NVIDIA is a technology company founded by Jensen Huang that designs and manufactures GPUs and AI accelerators used for computing, machine learning, and data center inference workloads, and also develops software platforms like CUDA.
- Olmo Hybrid and future LLM architectures
- Latest open artifacts (#17): NVIDIA, Arcee, Minimax, DeepSeek, Z.ai and others close an eventful year on a high note
- An Interview with Nvidia CEO Jensen Huang About Accelerated Computing
- [AINews] NVIDIA GTC: Jensen goes hard on OpenClaw, Vera CPU, and announces $1T sales backlog in 2027
- NVIDIA's AI Engineers: Agent Inference at Planetary Scale and "Speed of Light" — Nader Khalil (Brev), Kyle Kranen (Dynamo)
Meta
Meta is a technology company headquartered in Menlo Park, California, run by founder Mark Zuckerberg, known for the social media platforms Facebook and Instagram, and increasingly for developing and open-sourcing large language models and AI systems, particularly the Llama family of models.
- Meta Superintelligence – Leadership Compute, Talent, and Data
- Building the Open Agent Ecosystem Together: Introducing OpenEnv
- Welcome Llama 4 Maverick & Scout on Hugging Face
- Llama can now see and run on your device - welcome Llama 3.2
- Llama 3.1 - 405B, 70B & 8B with multilinguality and long context
Google DeepMind
Google DeepMind is an AI research division of Google that develops large language models like the Gemini family, which achieve state-of-the-art performance on reasoning, programming, and mathematical benchmarks, and deploys AI agents for tasks such as code security and computer control.
- Gemini 3 Deep Think: Advancing science, research and engineering
- Gemini 3 Flash: frontier intelligence built for speed
- Advanced version of Gemini with Deep Think officially achieves gold-medal standard at the International Mathematical Olympiad
- Gemini achieves gold-medal level at the International Collegiate Programming Contest World Finals
- Introducing CodeMender: an AI agent for code security
HuggingFace
HuggingFace is an open-source AI platform and model repository that provides tools, libraries, and pre-trained transformer models for building and deploying machine learning applications, and serves as a hub for distributing and collaborating on AI models and datasets.
- Mixture of Experts (MoEs) in Transformers
- We Got Claude to Build CUDA Kernels and teach open models!
- One Year Since the “DeepSeek Moment”
- Smol2Operator: Post-Training GUI Agents for Computer Use
- Tricks from OpenAI gpt-oss YOU 🫵 can use with transformers
Microsoft
Microsoft is a multinational technology company headquartered in Redmond, Washington, founded by Bill Gates and Paul Allen in 1975, known for its software products including Windows and Office, cloud computing services through Azure, and recent investments in artificial intelligence including partnerships with OpenAI and development of AI infrastructure and models.
- Introducing GPT-5.4 in Microsoft Foundry
- Thoughts on slowing the fuck down
- Announcing TypeScript 6.0
- Satya Nadella — How Microsoft is preparing for AGI
- Differential Transformer V2
LLM
An LLM (Large Language Model) is a neural network trained on vast amounts of text data to predict and generate human language, with capabilities that improve predictably as model size and training compute scale up.
- Andrej Karpathy — AGI is still a decade away
- Personal Copilot: Train Your Own Coding Assistant
- Fine-tuning 20B LLMs with RLHF on a 24GB consumer GPU
- Improving instruction hierarchy in frontier LLMs
- OpenAI to acquire Promptfoo
prompt injection
Prompt injection is a security attack where an attacker embeds malicious instructions within user input or external content to manipulate an AI model's behavior, bypassing its original instructions through social engineering or string-based overrides, with defenses including instruction hierarchy training and system-level constraints.
- AI supply chain attacks don’t even require malware…just post poisoned documentation
- "Disregard that!" attacks
- Claude Can Now Take Control of Your Mac
- AI agents are 'gullible' and easy to turn into your minions
- Designing AI agents to resist prompt injection
GPT-5.2
GPT-5.2 is an OpenAI large language model known for demonstrating reasoning capability in theoretical physics and serving as the foundation for specialized variants including GPT-5.2-Codex, which is optimized for agentic coding tasks.
- GPT-5.2 derives a new result in theoretical physics
- GPT-5.3-Codex System Card
- Introducing GPT-5.3-Codex
- Addendum to GPT-5.2 System Card: GPT-5.2-Codex
- Introducing GPT-5.2-Codex
OpenClaw
OpenClaw is an agent framework benchmark and agentic AI platform used to evaluate and develop multi-agent AI systems, known for establishing performance standards in the agentic coding and knowledge work automation space.
- Thoughts on slowing the fuck down
- OpenClaw Agents Can Be Guilt-Tripped Into Self-Sabotage
- Claude Code can now take over your computer to complete tasks
- [AINews] MiniMax 2.7: GLM-5 at 1/3 cost SOTA Open Model
- [AINews] Claude Cowork Dispatch: Anthropic's Answer to OpenClaw
GitHub
GitHub is a web-based platform owned by Microsoft that provides version control, code hosting, and collaboration tools for software development, and is the home of the Copilot AI coding assistant.
- 90% of Claude-linked output going to GitHub repos w <2 stars
- How I'm Productive with Claude Code
- Trace any Copilot coding agent commit to its session logs
- Copilot coding agent now starts work 50% faster
- Introducing upgrades to Codex
AGI
AGI (Artificial General Intelligence) is a theoretical AI system capable of understanding and performing intellectual tasks across any domain at a human or superhuman level, without domain-specific training.
- Arm Is Now Making Its Own Chips
- ARC-AGI-3
- Dario Amodei — "We are near the end of the exponential"
- Thoughts on AI progress (Dec 2025)
- Thoughts on AI progress (Dec 2025)
DeepSeek
DeepSeek is a Chinese AI company known for developing efficient open-source large language models and reasoning models, particularly the DeepSeek R1, which achieved competitive performance with proprietary models while using novel architectural innovations to reduce computational costs.
- Latest open artifacts (#17): NVIDIA, Arcee, Minimax, DeepSeek, Z.ai and others close an eventful year on a high note
- 2025 Open Models Year in Review
- Meta Superintelligence – Leadership Compute, Talent, and Data
- One Year Since the “DeepSeek Moment”
- On the Shifting Global Compute Landscape
MCP
MCP (Model Context Protocol) is an open standard that enables AI agents and language models to dynamically discover and invoke tools from servers, facilitating integration of external capabilities into AI applications and agentic workflows.
- AI supply chain attacks don’t even require malware…just post poisoned documentation
- Claude Code Cheat Sheet
- Generate Images with Claude and Hugging Face
- Tiny Agents in Python: a MCP-powered agent in ~70 lines of code
- Tiny Agents: an MCP-powered agent in 50 lines of code
AWS
AWS (Amazon Web Services) is a cloud computing platform run by Amazon that provides on-demand infrastructure services including compute, storage, and AI/ML tools, and hosts major AI companies like Anthropic and OpenAI.
- Thoughts on slowing the fuck down
- Amazon’s AI Resurgence: AWS & Anthropic’s Multi-Gigawatt Trainium Expansion
- Introducing the Stateful Runtime Environment for Agents in Amazon Bedrock
- AWS and OpenAI announce multi-year strategic partnership
- OpenAI's smart speaker 📢, Apple visual intelligence 👀, Code Mode 🧑💻
GPT-4
GPT-4 is a large language model developed by OpenAI that can process and generate text, with multimodal capabilities to handle images, and is known for advanced reasoning and natural language understanding tasks.
- A new era of intelligence with Gemini 3
- Evaluating AI’s ability to perform scientific research tasks
- First look at GPT-5
- Introducing GPT-4.5
- Finding GPT-4’s mistakes with GPT-4
Gemini
Gemini is a large language model developed by Google DeepMind known for advanced reasoning capabilities, particularly through its "Deep Think" feature that enables extended chain-of-thought problem-solving, as demonstrated by gold-medal performance at the International Mathematical Olympiad and International Collegiate Programming Contest.
- Show HN: Cq – Stack Overflow for AI coding agents
- Advanced version of Gemini with Deep Think officially achieves gold-medal standard at the International Mathematical Olympiad
- Gemini achieves gold-medal level at the International Collegiate Programming Contest World Finals
- AlphaEvolve: A Gemini-powered coding agent for designing advanced algorithms
- Gemini 2.5: Our most intelligent AI model
Andrej Karpathy
Andrej Karpathy is an AI researcher known for developing the "autoresearch" framework that enables autonomous AI agents to run experiments and improve large language models, and for his recent work on agentic AI systems and predictions about AGI timelines.
- Delve did the security compliance on LiteLLM, an AI project hit by malware
- Thoughts on slowing the fuck down
- Autoresearch on an old research idea
- Andrej Karpathy — AGI is still a decade away
- [AINews] Autoresearch: Sparks of Recursive Self Improvement
GPT-5.4
GPT-5.4 is OpenAI's frontier language model that combines reasoning, coding, and agentic capabilities with native computer-use support, a 1M token context window, and a "Thinking" mode for visible planning steps, available through ChatGPT, the API, and Codex.
- Introducing GPT-5.4 in Microsoft Foundry
- [AINews] Autoresearch: Sparks of Recursive Self Improvement
- [AINews] GPT 5.4: SOTA Knowledge Work -and- Coding -and- CUA Model, OpenAI is so very back
- [AINews] Is Harness Engineering real?
- Introducing GPT-5.4 mini and nano
Claude Opus 4.6
Claude Opus 4.6 is a large language model developed by Anthropic that excels at agentic tasks and long-horizon reasoning, recently achieving benchmarks for autonomous AI agents working on multi-hour engineering tasks.
- Slopification and its Discontents
- Epoch confirms GPT5.4 Pro solved a frontier math open problem
- Opus 4.6, Codex 5.3, and the post-benchmark era
- [AINews] Is Harness Engineering real?
- Last Week in AI #335 - Opus 4.6, Codex 5.3, Gemini 3 Deep Think, GLM 5, Seedance 2.0
Apple
Apple Inc. is a technology company founded by Steve Jobs, Steve Wozniak, and Ronald Wayne that designs and manufactures consumer electronics including iPhones, iPads, Macs, and related services, and is headquartered in Cupertino, California.
- iPhone 17 Pro Demonstrated Running a 400B LLM
- Sam & Jony
- Last Week in AI #332 - Apple + Gemini, OpenAI + Cerebras, Claude Cowork
- Autoresearching Apple's "LLM in a Flash" to run Qwen 397B locally
- Coding After Coders: The End of Computer Programming as We Know It
System Card
A System Card is a technical documentation document published by OpenAI that discloses a model's safety evaluations, capability assessments, performance benchmarks, deployment constraints, and known failure modes.
- GPT-5.3-Codex System Card
- Addendum to GPT-5.2 System Card: GPT-5.2-Codex
- GPT-5.1-Codex-Max System Card
- GPT-5 System Card
- ChatGPT agent System Card
GPT-4o
GPT-4o is a multimodal AI model developed by OpenAI that processes text, images, and other inputs to generate responses, and is used as a production AI API available to developers.
- We’re expanding our Gemini 2.5 family of models
- OpenAI and Anthropic share findings from a joint safety evaluation
- GPT-5 and the new era of work
- New tools and features in the Responses API
- Sycophancy in GPT-4o: what happened and what we’re doing about it
Gemini 3.1 Pro
Gemini 3.1 Pro is Google's AI model released in February that offers approximately 2x reasoning performance improvement over Gemini 3 Pro and is designed for complex problem-solving tasks, available to developers through API access and consumer platforms.
- Epoch confirms GPT5.4 Pro solved a frontier math open problem
- The latest AI news we announced in February
- LWiAI Podcast #235 - Sonnet 4.6, Deep-thinking tokens, Anthropic vs Pentagon
- Last Week in AI #336 - Sonnet 4.6, Gemini 3.1 Pro, Anthropic vs Pentagon
- The Sashiko patch-review system
OpenAI Codex
OpenAI Codex is an AI coding agent developed by OpenAI that assists developers with code generation and software engineering tasks, having reached over 2 million weekly active users as of 2026.
- [AINews] NVIDIA GTC: Jensen goes hard on OpenClaw, Vera CPU, and announces $1T sales backlog in 2027
- Custom Kernels for All from Codex and Claude
- OpenAI Codex and Figma launch seamless code-to-design experience
- Use subagents and custom agents in Codex
- Coding agents for data analysis
o1
o1 is OpenAI's reasoning model that uses extended chain-of-thought processing to solve complex problems in coding, mathematics, and science by working through problems step-by-step before generating responses.
- Agents Over Bubbles
- Introducing OpenAI o3 and o4-mini
- Detecting misbehavior in frontier reasoning models
- OpenAI o1 and new tools for developers
- OpenAI o1 System Card
coding agents
Coding agents are autonomous AI systems trained to write, review, and modify software code, often with access to version control systems, documentation, and development tools, used by engineers to accelerate development workflows while requiring monitoring for behavioral misalignment and safety.
- Using Git with coding agents
- How we monitor internal coding agents for misalignment
- GPT-5 bio bug bounty call
- Agent bio bug bounty call
- Judgment and creativity are all you need
Amazon
Amazon is a multinational technology and e-commerce company headquartered in Seattle that operates cloud computing services (AWS), runs an e-commerce marketplace, and invests in AI infrastructure and partnerships including a recent $50 billion investment in OpenAI.
- Training code generation models to debug their own outputs
- Scaling AI for everyone
- OpenAI and Amazon announce strategic partnership
- Coding After Coders: The End of Computer Programming as We Know It
- Amazon vs USPS 📪, Google's vibe designer 🧑🎨, how Codex works 🤖
xAI
xAI is an AI company founded by Elon Musk that develops AI models and coding assistants, including the Grok model, and is working on integration projects with Tesla.
- Evolving OpenAI’s structure
- LWiAI Podcast #237 - Nemotron 3 Super, xAI reborn, Anthropic Lawsuit, Research!
- Last Week in AI #338 - Anthropic sues Trump, xAI starting over, Iran AI Fakes
- LWiAI Podcast #231 - Claude Cowork, Anthropic $10B, Deep Delta Learning
- Meta 20% layoffs 💼, Musk rebuilds xAI 🧠, Travis Kalanick's robots 🤖
o3
o3 is an AI reasoning model developed by OpenAI that combines advanced reasoning capabilities with integrated tool use including web browsing, Python execution, and image analysis, and is designed for complex problem-solving tasks like code review and multi-step reasoning.
- OpenAI and Anthropic share findings from a joint safety evaluation
- Shipping code faster with o3, o4-mini, and GPT-4.1
- New tools and features in the Responses API
- Addendum to o3 and o4-mini system card: Codex
- OpenAI o3 and o4-mini System Card
GPT
GPT is a series of large language models developed by OpenAI that can generate human-like text, answer questions, and support function calling to return structured outputs for tool-use applications.
- [AINews] Anthropic @ $19B ARR, Qwen team leaves, Gemini and GPT bump up fast models
- Mixture of Experts (MoEs) in Transformers
- Introducing gpt-oss
- Introducing OpenAI o1
- Function calling and other API updates
DeepMind
DeepMind is an AI research laboratory owned by Google that develops reinforcement learning systems and advanced AI models, known for creating AlphaGo, AlphaFold, and AlphaProof, and for powering agentic AI systems like AlphaEvolve.
- From games to biology and beyond: 10 years of AlphaGo’s impact
- Start building with Gemini 3
- A new era of intelligence with Gemini 3
- AlphaEvolve: A Gemini-powered coding agent for designing advanced algorithms
- Introducing Gemini 2.5 Flash
reinforcement learning
Reinforcement learning is a machine learning approach where an AI agent learns to make decisions by receiving rewards or penalties for its actions, enabling it to optimize behavior through trial and error, as demonstrated by DeepMind's AlphaGo system that mastered the game of Go.
- Training code generation models to debug their own outputs
- Thoughts on AI progress (Dec 2025)
- RL is even more information inefficient than you thought
- Andrej Karpathy — AGI is still a decade away
- From games to biology and beyond: 10 years of AlphaGo’s impact
Gemini 2.5 Pro
Gemini 2.5 Pro is Google DeepMind's AI model family available in stable and experimental versions, designed for production use with enhanced reasoning capabilities, coding optimization, and support for extended context windows and multi-modal agent workflows.
- We’re expanding our Gemini 2.5 family of models
- Gemini 2.5: Updates to our family of thinking models
- Our vision for building a universal AI assistant
- Build rich, interactive web apps with an updated Gemini 2.5 Pro
- Gemini 2.5: Our most intelligent AI model
Responses API
Responses API is an open-source standard led by Hugging Face that standardizes response formats for agentic AI workloads, supporting tool use, multi-step reasoning, and stateful loops as a community-owned alternative to OpenAI's Chat Completions API.
- Open Responses: What you need to know
- From model to agent: Equipping the Responses API with a computer environment
- Introducing AgentKit, new Evals, and RFT for agents
- Introducing upgrades to Codex
- New tools and features in the Responses API
agentic coding
Agentic coding is a development approach where AI agents autonomously write, debug, and test code with minimal or no human intervention, using large language models to reason about programming tasks and generate working implementations across multiple programming languages.
- Does Computer Science still exist?
- [AINews] Every Lab serious enough about Developers has bought their own Devtools
- VibeGame: Exploring Vibe Coding Games
- Making a web app generator with open ML models
- Cisco and OpenAI redefine enterprise engineering with AI agents
Astral
Astral is a Python developer tooling company best known for creating uv (a package manager), Ruff (a linter), and Ty, and was acquired by OpenAI in 2025 to integrate its tools into OpenAI's Codex coding agent.
- [AINews] Every Lab serious enough about Developers has bought their own Devtools
- OpenAI to acquire Astral
- Thoughts on OpenAI acquiring Astral and uv/ruff/ty
- OpenAI tries to build its coding cred, acquires Python toolmaker Astral
- OpenAI is acquiring open source Python tool-maker Astral
Pentagon
The Pentagon is the United States Department of Defense headquarters that oversees military operations, defense policy, and procurement contracts for the armed forces.
- Last Week in AI #336 - Sonnet 4.6, Gemini 3.1 Pro, Anthropic vs Pentagon
- Anthropic sues Pentagon ⚖️, Siri delays Apple products 🖥️, Claude Code Review 👨💻
- GPT-5.4 🤖, Anthropic's leaked memo 📝, Claude Code auto mode 🧑💻
- Anthropic vs Pentagon 🤖, SpaceX eyes March IPO 💰, lessons building Claude Code 🧑💻
- Meta's $100B deal 💰, Pentagon threatens Anthropic 🏛️, chinese vibe coders 🧑💻
Transformers
Transformers is a neural network architecture that uses attention mechanisms to process sequential data in parallel, widely used as the foundation for large language models and developed at Google (originally introduced in the 2017 "Attention Is All You Need" paper).
- Mixture of Experts (MoEs) in Transformers
- GGML and llama.cpp join HF to ensure the long-term progress of Local AI
- Tricks from OpenAI gpt-oss YOU 🫵 can use with transformers
- Tool Use, Unified
- Our vision for building a universal AI assistant
Bun
Bun is a JavaScript runtime and toolkit that serves as an alternative to Node.js, acquired by Anthropic in 2024 as part of the AI lab's strategy to own developer infrastructure.
- [AINews] Every Lab serious enough about Developers has bought their own Devtools
- Transformers.js v4 Preview: Now Available on NPM!
- Thoughts on OpenAI acquiring Astral and uv/ruff/ty
- OpenAI tries to build its coding cred, acquires Python toolmaker Astral
- OpenAI is acquiring open source Python tool-maker Astral
Model Context Protocol
Model Context Protocol is an open standard developed by Anthropic that enables AI agents and language models to discover, connect to, and invoke external tools and services dynamically.
- LiteLLM Compromised by Credential Stealer
- [AINews] The high-return activity of raising your aspirations for LLMs
- Generate Images with Claude and Hugging Face
- Tiny Agents in Python: a MCP-powered agent in ~70 lines of code
- Tiny Agents: an MCP-powered agent in 50 lines of code
Python
Python is a high-level, general-purpose programming language known for its simple syntax and wide adoption in web development, data science, and artificial intelligence applications.
- Experimenting with Starlette 1.0 with Claude skills
- Tiny Agents in Python: a MCP-powered agent in ~70 lines of code
- OpenAI to acquire Astral
- Coding agents for data analysis
- ‘Software Bonkers’
Stripe
Stripe is a financial services company that provides payment processing software and APIs, allowing businesses and developers to accept payments online.
- Mozilla dev's "Stack Overflow for agents" targets a key weakness in coding AI
- Dreamer: the Personal Agent OS — David Singleton
- Perhaps not Boring Technology after all
- OpenAI's pivot 🤖, Nvidia space GPUs 🛰️, Stripe's internal dev agent 👨💻
- OpenAI's smart speaker 📢, Apple visual intelligence 👀, Code Mode 🧑💻
uv
uv is a modern Python package manager and project management tool developed by Astral (acquired by OpenAI in 2025) that achieves approximately 126 million downloads per month and is widely used by Python developers for dependency resolution and project workflows.
- [AINews] Every Lab serious enough about Developers has bought their own Devtools
- OpenAI to acquire Astral
- Thoughts on OpenAI acquiring Astral and uv/ruff/ty
- OpenAI tries to build its coding cred, acquires Python toolmaker Astral
- OpenAI is acquiring open source Python tool-maker Astral
ruff
Ruff is a fast Python linter and code formatter developed by Astral, a Python developer tooling company acquired by OpenAI in 2025.
- [AINews] Every Lab serious enough about Developers has bought their own Devtools
- OpenAI to acquire Astral
- Thoughts on OpenAI acquiring Astral and uv/ruff/ty
- OpenAI tries to build its coding cred, acquires Python toolmaker Astral
- OpenAI is acquiring open source Python tool-maker Astral
GPT-5.3-Codex
GPT-5.3-Codex is OpenAI's agentic coding model that combines frontier coding capabilities with reasoning abilities, running 25% faster than its predecessor and achieving state-of-the-art performance on software engineering benchmarks like SWE-Bench Pro.
- Opus 4.6, Codex 5.3, and the post-benchmark era
- [AINews] GPT 5.4: SOTA Knowledge Work -and- Coding -and- CUA Model, OpenAI is so very back
- GPT-5.3-Codex System Card
- Introducing GPT-5.3-Codex
- LWiAI Podcast #234 - Opus 4.6, GPT-5.3-Codex, Seedance 2.0, GLM-5
Ars Technica
Ars Technica is a technology news and analysis publication that covers topics including artificial intelligence, space exploration, cybersecurity, and other tech industry developments.
- Ars Technica Fires Reporter Benj Edwards After He Published Story With AI-Fabricated Quotes
- Kagi Translate's AI answers the question "What would horny Margaret Thatcher say?"
- Musk’s tactic of blaming users for Grok sex images may be foiled by EU law
- A private space company has a radical new plan to bag an asteroid
- Federal cyber experts called Microsoft's cloud a "pile of shit," approved it anyway
Cloudflare
Cloudflare is a web infrastructure and security company that provides content delivery, DDoS protection, DNS, and API services to websites and applications, operating a global network of data centers.
- Cloudflare’s new Dynamic Workers ditch containers to run AI agent code 100x faster
- Meta buys Moltbook 🦞, YouTube passes Disney 📈, Cloudflare crawling 👨💻
- Jane Street vs Bitcoin 🪙, AGI career decisions 💼, Vercel Chat SDK 🤖
- OpenAI's smart speaker 📢, Apple visual intelligence 👀, Code Mode 🧑💻
- Cloudflare appeals Piracy Shield fine, hopes to kill Italy's site-blocking law
o4-mini
o4-mini is an OpenAI reasoning model designed for cost and latency-efficient tasks, capable of tool use within chain-of-thought reasoning and used in applications like code review and autonomous coding agents.
- Shipping code faster with o3, o4-mini, and GPT-4.1
- New tools and features in the Responses API
- Addendum to o3 and o4-mini system card: Codex
- OpenAI o3 and o4-mini System Card
- Introducing OpenAI o3 and o4-mini
RLHF
RLHF (Reinforcement Learning from Human Feedback) is a training technique that uses human evaluations of AI model outputs to fine-tune and improve model behavior through reinforcement learning.
- Fine-tuning 20B LLMs with RLHF on a 24GB consumer GPU
- Sycophancy in GPT-4o: what happened and what we’re doing about it
- Finding GPT-4’s mistakes with GPT-4
- Weak-to-strong generalization
agentic workflows
Agentic workflows are AI-powered processes where autonomous agents perform sequential tasks or decision-making steps, commonly used in applications like AI-powered development tools and agent frameworks that require fast reasoning and low latency.
- Gemini 3 Flash: frontier intelligence built for speed
- From model to agent: Equipping the Responses API with a computer environment
- Introducing GPT-5
- How to make sense of AI
chain-of-thought reasoning
Chain-of-thought reasoning is a technique where AI models work through problem-solving steps sequentially before providing a final answer, improving accuracy on complex tasks.
- Gemini 2.5: Our most intelligent AI model
- GPT-5.4 Thinking System Card
- Thinking with images
- OpenAI o1 System Card
Claude Opus 4.5
Claude Opus 4.5 is an AI language model developed by Anthropic that is designed for agentic task completion, enabling autonomous code generation and agent orchestration through integrated tools and reasoning capabilities.
- Agents Over Bubbles
- We Got Claude to Build CUDA Kernels and teach open models!
- Import AI 443: Into the mist: Moltbook, agent ecologies, and the internet in transition
- Import AI 439: AI kernels; decentralized training; and universal representations
RAG
RAG (Retrieval-Augmented Generation) is a technique that augments large language models with external retrieval systems to fetch relevant documents or data before generating responses, improving accuracy and enabling access to up-to-date information beyond a model's training data.
- Mozilla dev's "Stack Overflow for agents" targets a key weakness in coding AI
- Retrieval After RAG: Hybrid Search, Agents, and Database Design — Simon Hørup Eskildsen of Turbopuffer
- Sentence Transformers is joining Hugging Face!
- Welcome EmbeddingGemma, Google's new efficient embedding model
large language models
Large language models are neural networks trained on vast amounts of text data that predict and generate human language by processing sequential tokens, with examples including Meta's Llama and OpenAI's GPT series that are widely used for AI-powered applications.
- Welcome Llama 3 - Meta's new open LLM
- The Hallucinations Leaderboard, an Open Effort to Measure Hallucinations in Large Language Models
- Why language models hallucinate
- Extracting Concepts from GPT-4
GPT-5.4 mini
GPT-5.4 mini is OpenAI's efficient small language model designed for high-volume, latency-sensitive workloads including coding assistants, subagents, and multimodal applications, running 2x faster than GPT-5 mini while approaching GPT-5.4 performance on benchmarks.
- [AINews] Claude Cowork Dispatch: Anthropic's Answer to OpenClaw
- Introducing GPT-5.4 mini and nano
- GPT-5.4 mini and GPT-5.4 nano, which can describe 76,000 photos for $52
- Amazon 1hr delivery 🚚, GPT-5.4 mini 🤖, how Anthropic uses Skills 👨💻
GPT-5.4 nano
GPT-5.4 nano is OpenAI's smallest and lowest-cost large language model from the GPT-5.4 family, priced at $0.20/$1.25 per million tokens, designed for high-volume latency-sensitive workloads including coding assistants, subagents, and multimodal applications with a 400k context window.
- [AINews] Claude Cowork Dispatch: Anthropic's Answer to OpenClaw
- Introducing GPT-5.4 mini and nano
- GPT-5.4 mini and GPT-5.4 nano, which can describe 76,000 photos for $52
- Amazon 1hr delivery 🚚, GPT-5.4 mini 🤖, how Anthropic uses Skills 👨💻
autoresearch
Autoresearch is an agentic coding pattern where an AI system autonomously designs and runs numerous experiments to optimize software performance or train machine learning models, with notable implementations by Andrej Karpathy (improving LLM training efficiency) and Shopify (accelerating their Liquid template engine).
- Autoresearch on an old research idea
- [AINews] Autoresearch: Sparks of Recursive Self Improvement
- Shopify/liquid: Performance: 53% faster parse+render, 61% fewer allocations
- Scaling Karpathy's Autoresearch: What Happens When the Agent Gets a GPU Cluster
Opus 4.6
Opus 4.6 is Anthropic's large language model featuring a 1M-token context window, designed for multi-agent tasks and autonomous agent systems, and notable for reliably executing extended agent loops.
- [AINews] Autoresearch: Sparks of Recursive Self Improvement
- LWiAI Podcast #234 - Opus 4.6, GPT-5.3-Codex, Seedance 2.0, GLM-5
- ImportAI 449: LLMs training other LLMs; 72B distributed training run; computer vision is harder than generative text
- 1M context is now generally available for Opus 4.6 and Sonnet 4.6
Groq
Groq was an AI inference chip startup that developed specialized processors (LPUs) optimized for fast token generation, which Nvidia acquired for approximately $20 billion in 2025.
- An Interview with Nvidia CEO Jensen Huang About Accelerated Computing
- LWiAI Podcast #230 - 2025 Retrospective, Nvidia buys Groq, GLM 4.7, METR
- Last Week in AI #330 - Groq->Nvidia , ChatGPT Apps, US AI Genesis Mission
- Decoding Nvidia's Groq-powered LPX and the rest of its new rack systems
Opus 4.5
Opus 4.5 is a large language model developed by Anthropic that powers Claude Code and is widely used for agentic AI tasks, complex problem-solving, and coding assistance, having been released in November 2025.
- Get Good at Agents
- Claude Code Hits Different
- Import AI 442: Winners and losers in the AI economy; math proof automation; and industrialization of cyber espionage
- Import AI 438: Silent sirens, flashing for us all
agentic AI
Agentic AI refers to autonomous artificial intelligence systems that can independently plan and execute multi-step tasks toward defined goals, rather than simply responding to individual user queries, with examples including OpenAI's ChatGPT agent and Anthropic's Claude Code that can take actions like browsing, coding, and research without requiring user direction for each step.
- Get Good at Agents
- Introducing ChatGPT agent
- New funding to build towards AGI
- Deep research System Card
Dario Amodei
Dario Amodei is the CEO of Anthropic, an AI safety company that develops the Claude language model, and is known for advocating the "Big Blob of Compute Hypothesis" regarding AI scaling laws and making predictions about AI's economic impact.
- Dario Amodei — "We are near the end of the exponential"
- GPT-5.4 🤖, Anthropic's leaked memo 📝, Claude Code auto mode 🧑💻
- Nano Banana 2 🍌, Netflix loses WB bid 🎬, Block's AI layoff 💼
- Claim Chowder: Anthropic CEO Dario Amodei on the Percentage of Code Being Generated by AI Today
Grok
Grok is an AI chatbot developed by xAI (Elon Musk's company) that generates text responses to user prompts, known for occasionally producing inaccurate information and generating explicit images from certain prompts.
- Last Week in AI #331 - Nvidia announcements, Grok bikini prompts, RAISE Act
- iPhone Fold layout 📱, xAI + Tesla 🤖, fixing Javascript time 🧑💻
- Water company wasted $200k on bad answers from an AI model – so built its own slop filtering system
- Musk’s tactic of blaming users for Grok sex images may be foiled by EU law
Daring Fireball
Daring Fireball is a technology blog run by John Gruber that focuses on Apple, Mac software, web design, and tech industry commentary, known for its critical analysis and use of formats like "claim chowder" to document bold predictions for future accountability.
- Ars Technica Fires Reporter Benj Edwards After He Published Story With AI-Fabricated Quotes
- NYT: ‘Meta Delays Rollout of New AI Model After Performance Concerns’
- Claim Chowder: Anthropic CEO Dario Amodei on the Percentage of Code Being Generated by AI Today
- ‘Grief and the AI Split’
Moltbook
Moltbook is an AI agent social network built on OpenClaw that was acquired by Meta and represents a large-scale agent ecology with tens of thousands of AI agents.
- OpenClaw Agents Can Be Guilt-Tripped Into Self-Sabotage
- Last Week in AI #334 - Kimi K2.5 & Code, Genie 3, OpenClaw & Moltbook
- Import AI 443: Into the mist: Moltbook, agent ecologies, and the internet in transition
- Meta buys Moltbook 🦞, YouTube passes Disney 📈, Cloudflare crawling 👨💻
AI agents
AI agents are autonomous software systems that use large language models to perceive their environment, make decisions, and take actions toward defined goals, often by interacting with external tools or data sources while managing memory and handling security challenges like prompt injection attacks.
- Anthropic’s Claude can now control your Mac, escalating the fight to build AI agents that actually do work
- "Disregard that!" attacks
- Designing AI agents to resist prompt injection
- How do you want to remember?
SWE-bench
SWE-bench is a benchmark that evaluates the capability of AI models and autonomous agents to resolve software engineering tasks, particularly bug fixes and code modifications on real-world GitHub repositories.
- Why we no longer evaluate SWE-bench Verified
- Introducing the SWE-Lancer benchmark
- LWiAI Podcast #237 - Nemotron 3 Super, xAI reborn, Anthropic Lawsuit, Research!
- Last Week in AI #336 - Sonnet 4.6, Gemini 3.1 Pro, Anthropic vs Pentagon
GPT-5.2-Codex
GPT-5.2-Codex is an agentic coding model developed by OpenAI, optimized for long-horizon coding tasks such as large refactors and migrations, with enhanced Windows support and cybersecurity capabilities.
- Introducing GPT-5.3-Codex
- Introducing GPT-5.2-Codex
- Addendum to GPT-5.2 System Card: GPT-5.2-Codex
- Introducing GPT-5.2-Codex
chain-of-thought
Chain-of-thought is a technique that prompts AI models to break down complex reasoning tasks by generating intermediate steps or explanations before producing a final answer, which improves performance on multi-step problems and enables better evaluation of model reasoning processes.
- Evaluating chain-of-thought monitorability
- Understanding neural networks through sparse circuits
- Introducing OpenAI o1
- Learning to reason with LLMs
Sam Altman
Sam Altman is the CEO of OpenAI, an AI research company, who leads development of large language models and has advocated for government regulation of advanced AI systems.
- AI progress and recommendations
- Sam & Jony
- Evolving OpenAI’s structure
- Thinking with images
Function calling
Function calling is a capability that allows large language models to receive descriptions of functions or tools and return structured arguments to invoke those functions, enabling AI systems to interact with external APIs and tools rather than just generating text.
- Tool Use, Unified
- OpenAI o1 and new tools for developers
- Function calling and other API updates
scaling laws
Scaling laws are empirical relationships, established through research like OpenAI's landmark studies, that describe how neural language model performance improves predictably with increases in model size, dataset size, and computational resources, following power-law patterns across many orders of magnitude.
- Scaling laws for neural language models
- Gemini 3.1 Pro 🤖, OpenAI's strategic issues 💡, building AI eng culture 👨💻
- NanoGPT Slowrun: 10x Data Efficiency with Infinite Compute
Gemini 3 Flash
Gemini 3 Flash is a large language model developed by Google DeepMind that combines advanced reasoning capabilities with fast response times and lower costs for AI agent and automation applications.
- Gemini 3 Flash: frontier intelligence built for speed
- LWiAI Podcast #229 - Gemini 3 Flash, ChatGPT Apps, Nemotron 3
- Import AI 446: Nuclear LLMs; China's big AI benchmark; measurement and AI policy
Gemini 2.5 Flash
Gemini 2.5 Flash is a hybrid reasoning AI model developed by Google DeepMind that allows developers to toggle extended thinking on and off with configurable thinking budgets to balance performance, cost, and latency.
- We’re expanding our Gemini 2.5 family of models
- Gemini 2.5: Updates to our family of thinking models
- Introducing Gemini 2.5 Flash
Gemini API
Google's API that provides developers access to Gemini AI models, including Gemini 2.0 Flash and other variants, for building AI applications with capabilities in text generation, multimodal outputs, tool use, and agentic reasoning.
- Gemini 3 Deep Think: Advancing science, research and engineering
- Gemini 2.0 is now available to everyone
- Introducing Gemini 2.0: our new AI model for the agentic era
Gemini 3 Deep Think
Gemini 3 Deep Think is Google's AI model designed for science and engineering tasks involving messy or complex data, available to Ultra subscribers in the Gemini app and to researchers and enterprises via API early access.
- The latest AI news we announced in February
- Gemini 3 Deep Think: Advancing science, research and engineering
- LWiAI Podcast #234 - Opus 4.6, GPT-5.3-Codex, Seedance 2.0, GLM-5
Mixture of Experts (MoE)
Mixture of Experts (MoE) is an architecture design for large language models that routes each token through only a small subset of specialized neural network components called experts, allowing sparse models to decouple computational capacity from inference cost while maintaining or exceeding baseline performance.
- Mixture of Experts (MoEs) in Transformers
- Differential Transformer V2
- iPhone 17e 📱, SpaceX tower catch plan 🚀, how to save SaaS 💼`
llama.cpp
llama.cpp is an open-source inference runtime created by Georgi Gerganov and the GGML team that enables efficient local execution of large language models on personal computers and servers.
- GGML and llama.cpp join HF to ensure the long-term progress of Local AI
- New in llama.cpp: Model Management
- Transformers v5: Simple model definitions powering the AI ecosystem
CUDA
CUDA is a parallel computing platform and API developed by NVIDIA that enables software developers to use graphics processing units (GPUs) for general-purpose processing by writing specialized code kernels that run on NVIDIA GPUs.
- An Interview with Nvidia CEO Jensen Huang About Accelerated Computing
- Custom Kernels for All from Codex and Claude
- We Got Claude to Build CUDA Kernels and teach open models!
GPT-OSS
GPT-OSS is an open-source language model developed by OpenAI that is optimized for agentic reinforcement learning training, supporting multi-step tool use and inference through frameworks like the transformers library.
- Unlocking Agentic RL Training for GPT-OSS: A Practical Retrospective
- Tricks from OpenAI gpt-oss YOU 🫵 can use with transformers
- Introducing gpt-oss
Gemini CLI
Gemini CLI is a command-line interface tool compatible with Hugging Face's HF-skills repository that enables coding agents to perform machine learning tasks such as model fine-tuning, reinforcement learning alignment, checkpoint evaluation, and model publishing through natural language commands.
- Codex is Open Sourcing AI models
- What is agentic engineering?
- Google offers ‘vibe design’ tool that you can shout at to create a UI
Deep Research
Deep Research is an AI research agent developed by Tavily that automatically conducts in-depth investigations and analysis tasks, designed to scale with future model improvements rather than rely on hand-crafted heuristics.
- Building Deep Research: How we Achieved State of the Art
- Deep research System Card
- Introducing deep research
MiniMax
MiniMax is a Chinese AI company that develops open-source large language models, including the M2.7 series, which achieve performance competitive with state-of-the-art models like GLM-5 and Claude Sonnet while reducing computational costs.
- [AINews] MiniMax 2.7: GLM-5 at 1/3 cost SOTA Open Model
- Aligning to What? Rethinking Agent Generalization in MiniMax M2
- Anthropic data harvested 🤖, AI memo crashes stocks 📉, don't quit your job 💼
vibe coding
Vibe coding is a software development approach that uses AI language models as high-level programming tools to translate developer intent directly into code, rather than writing code through traditional manual syntax.
- Reports of code's death are greatly exaggerated
- VibeGame: Exploring Vibe Coding Games
- What is agentic engineering?
Agentic AI Systems
Agentic AI systems are AI applications that autonomously perceive their environment, plan actions, and use external tools to accomplish goals with minimal human intervention, raising key concerns about safety, alignment, and human oversight.
- How to Build an MCP Server with Gradio
- Toward understanding and preventing misalignment generalization
- The Anthropic Institute
Llama
Llama is an open-weight large language model family developed by Meta that includes multimodal capabilities and is designed for on-device and local deployment across various hardware platforms.
- Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM
- Llama can now see and run on your device - welcome Llama 3.2
- Import AI 439: AI kernels; decentralized training; and universal representations
Copilot
Copilot is Microsoft's AI-powered coding assistant that uses large language models to help developers write, review, and debug code faster.
- Copilot coding agent now starts work 50% faster
- Satya Nadella — How Microsoft is preparing for AGI
- Personal Copilot: Train Your Own Coding Assistant
Claude Cowork
Claude Cowork is an agentic AI workspace tool developed by Anthropic that orchestrates multiple Claude Code instances to handle multi-step workflows and complex projects for both technical and non-technical users.
- [AINews] Claude Cowork Dispatch: Anthropic's Answer to OpenClaw
- Why Anthropic Thinks AI Should Have Its Own Computer — Felix Rieseberg of Claude Cowork & Claude Code Desktop
- Import AI 441: My agents are working. Are yours?
Agentic systems
Agentic systems are AI applications where language models autonomously execute tasks by planning actions, calling tools, and processing their outputs to achieve user-defined goals, with security considerations including prompt injection defense and safe handling of untrusted tool outputs.
- [AINews] Context Drought
- Improving instruction hierarchy in frontier LLMs
- Keeping your data safe when an AI agent clicks a link
Jevons Paradox
The Jevons Paradox is an economic principle stating that improvements in efficiency of resource use tend to increase overall consumption of that resource, as lower costs stimulate greater demand.
- [AINews] AI Engineer will be the LAST job
- Coding After Coders: The End of Computer Programming as We Know It
- Fixing Claude with Claude: Anthropic reports on AI site reliability engineering
Gemini 3.1 Flash-Lite
Gemini 3.1 Flash-Lite is a lightweight language model developed by Google that features adjustable thinking levels and is designed for high-volume, cost-efficient workloads.
- [AINews] Is Harness Engineering real?
- [AINews] Anthropic @ $19B ARR, Qwen team leaves, Gemini and GPT bump up fast models
- GPT-5.4 mini and GPT-5.4 nano, which can describe 76,000 photos for $52
Qwen
Qwen is a family of large language models developed by Alibaba that supports hybrid architectures mixing RNN/SSM layers with attention mechanisms for improved long-context performance and inference efficiency.
- Olmo Hybrid and future LLM architectures
- [AINews] Anthropic @ $19B ARR, Qwen team leaves, Gemini and GPT bump up fast models
- Alibaba has made 470,000 AI chips, admits they’re inferior and may always be
Superintelligence
Superintelligence is a hypothetical artificial intelligence system that would possess intellectual capabilities far exceeding those of humans across virtually all domains, including scientific research, strategy, and complex reasoning.
- Ilya Sutskever — We're moving from the age of scaling to the age of research
- Meta Superintelligence – Leadership Compute, Talent, and Data
- AI progress and recommendations
GitHub Copilot
GitHub Copilot is an AI-powered code completion tool developed by GitHub that uses machine learning models to suggest code snippets and generate code in real-time within developer environments.
- Trace any Copilot coding agent commit to its session logs
- Introducing GPT-5.3-Codex-Spark
- Embedding AI into developer software
agentic coding workflows
Agentic coding workflows are development processes where AI coding agents autonomously generate, modify, and commit code changes while providing traceability through session logs and integration with version control systems.
- Trace any Copilot coding agent commit to its session logs
- Addendum to GPT-5.2 System Card: GPT-5.2-Codex
- The Agent Skills Directory
Stanford
Stanford University is a private research university located in Stanford, California, known for its engineering and computer science programs, artificial intelligence research, and economics department.
- The displacement of cognitive labor and what comes after
- Import AI 442: Winners and losers in the AI economy; math proof automation; and industrialization of cyber espionage
- Import AI 435: 100k training runs; AI systems absorb human power; intelligence per watt
React
React is a JavaScript library developed and maintained by Meta for building user interfaces using reusable components and a declarative programming model.
- Introducing GPT-5.2-Codex
- We rewrote our Rust WASM parser in TypeScript and it got faster
- I turned Markdown into a protocol for generative UI
Elon Musk
Elon Musk is a South African-born entrepreneur and engineer who founded and leads Tesla (electric vehicles and energy storage), SpaceX (commercial spaceflight and rockets), and more recently xAI (an artificial intelligence company building the Grok chatbot).
- Meta 20% layoffs 💼, Musk rebuilds xAI 🧠, Travis Kalanick's robots 🤖
- iPhone Fold layout 📱, xAI + Tesla 🤖, fixing Javascript time 🧑💻
- Musk’s tactic of blaming users for Grok sex images may be foiled by EU law
The Verge
The Verge is a technology and culture news website that covers consumer electronics, software, science, and digital culture, founded in 2011 by Vox Media.
- OpenAI is planning a desktop ‘superapp’
- Marc Andreessen is a philosophical zombie
- Google reveals its solution for true Android sideloading: a mandatory waiting period
Vercel
Vercel is a cloud platform for deploying and hosting web applications, known for its integration with Next.js and support for serverless functions, with recent expansion into AI tooling including a Chat SDK for building bots across communication platforms.
- Build knowledge agents without embeddings
- Jane Street vs Bitcoin 🪙, AGI career decisions 💼, Vercel Chat SDK 🤖
- Anthropic's Hidden Vercel Competitor "Antspace"
Y Combinator
Y Combinator is a startup accelerator founded in 2005 by Paul Graham that invests in and mentors early-stage technology companies, selecting cohorts of startups twice per year for intensive three-month programs in exchange for equity.
- Delve did the security compliance on LiteLLM, an AI project hit by malware
- Launch HN: Canary (YC W26) – AI QA that understands your code
- Claim Chowder: Anthropic CEO Dario Amodei on the Percentage of Code Being Generated by AI Today
SQLite
SQLite is a lightweight, open-source relational database engine that stores data in self-contained files and requires minimal configuration, widely used in applications from embedded systems to web development frameworks.
- Coding agents for data analysis
- Production query plans without production data
- ‘Software Bonkers’
Amazon Bedrock
Amazon Bedrock is an AWS managed service that provides access to foundation models and enables organizations to build and deploy AI applications, including agents with stateful runtime environments for multi-step workflows and context retention.
- OpenAI and Amazon announce strategic partnership
- Introducing the Stateful Runtime Environment for Agents in Amazon Bedrock
- GOV.UK chatbot gets smarter but slower as LLMs improve
Sashiko
Sashiko is a Rust-based AI code review system built by Google that analyzes Linux kernel patches using large language models to identify bugs and provide feedback to maintainers, achieving a 53% detection rate on recent bugs that human reviewers missed.
- Sashiko: AI code review system for the Linux kernel spots bugs humans miss
- The Sashiko patch-review system
- Amazon vs USPS 📪, Google's vibe designer 🧑🎨, how Codex works 🤖
Disney
Disney is an American entertainment conglomerate known for creating and distributing films, television programs, theme parks, and consumer products, and recently invested $1 billion in OpenAI for licensed character content generation capabilities.
- LWiAI Podcast #228 - GPT 5.2, Scaling Agents, Weird Generalization
- Last Week in AI #329 - GPT 5.2, GenAI.mil, Disney in Sora
- Meta buys Moltbook 🦞, YouTube passes Disney 📈, Cloudflare crawling 👨💻
Claude Sonnet 4
Claude Sonnet 4 is a large language model developed by Anthropic that has been evaluated for safety in cross-lab assessments and used in research on AI alignment, including nuclear deterrence simulations and adversarial fine-tuning detection systems.
- OpenAI and Anthropic share findings from a joint safety evaluation
- Import AI 446: Nuclear LLMs; China's big AI benchmark; measurement and AI policy
- Import AI 433: AI auditors; robot dreams; and software for helping an AI run a lab
Jack Clark
Jack Clark is a co-founder of Anthropic who writes about AI developments and has been documenting practical applications of advanced AI language models like Claude in research, software development, and agentic workflows.
- Import AI 441: My agents are working. Are yours?
- Import AI 438: Silent sirens, flashing for us all
- Import AI 431: Technological Optimism and Appropriate Fear
Moonshot AI
Moonshot AI is a Chinese AI company known for developing Kimi, an open-source multimodal language model trained on large-scale mixed visual and text data, designed to support agentic AI applications and reasoning tasks.
- OpenClaw Agents Can Be Guilt-Tripped Into Self-Sabotage
- Quoting Kimi.ai @Kimi_Moonshot
- Last Week in AI #334 - Kimi K2.5 & Code, Genie 3, OpenClaw & Moltbook
Kimi K2.5
Kimi K2.5 is an open-source multimodal large language model developed by Moonshot AI, trained on 15 trillion mixed visual and text tokens, and designed for agentic applications including code generation.
- Streaming experts
- Quoting Kimi.ai @Kimi_Moonshot
- Last Week in AI #334 - Kimi K2.5 & Code, Genie 3, OpenClaw & Moltbook
Trivy
Trivy is an open-source vulnerability scanner developed and maintained by Aqua Security that detects security vulnerabilities in container images, filesystems, and other artifacts.
- Trivy under attack again: Widespread GitHub Actions tag compromise secrets
- Trivy Compromised a Second Time - Malicious v0.69.4 Release
- Widely used Trivy scanner compromised in ongoing supply-chain attack
SoftBank
SoftBank is a Japanese multinational conglomerate led by founder Masayoshi Son that invests in technology companies and infrastructure projects, including recent major commitments to AI infrastructure through partnerships with OpenAI, Oracle, and MGX.
- Scaling AI for everyone
- New funding to build towards AGI
- Announcing The Stargate Project
GPT-5-Codex
GPT-5-Codex is an OpenAI model optimized for agentic coding tasks that performs code review and autonomous software engineering work across multiple platforms including terminal, IDE, GitHub, and ChatGPT, available via API at the same price as GPT-5.
- Codex is now generally available
- Introducing upgrades to Codex
- Addendum to GPT-5 system card: GPT-5-Codex
foundation model
A foundation model is a large-scale machine learning model trained on broad data that can be adapted for a wide range of downstream tasks and applications.
- First look at GPT-5
- gpt-oss-120b & gpt-oss-20b Model Card
- New funding to build towards AGI
GPT-4.1
GPT-4.1 is a foundation model released by OpenAI via API that features a 1M token context window and is used for tasks such as code summarization, quality assurance, and tool usage across various applications and services.
- Shipping code faster with o3, o4-mini, and GPT-4.1
- New tools and features in the Responses API
- Introducing GPT-4.1 in the API
LiteLLM
- Delve did the security compliance on LiteLLM, an AI project hit by malware
- Tell HN: Litellm 1.82.7 and 1.82.8 on PyPI are compromised
- LiteLLM Compromised by Credential Stealer
Code Interpreter
Code Interpreter is OpenAI's sandboxed Python execution environment that allows developers to run and execute code safely through the Assistants API.
- New tools and features in the Responses API
- New models and developer products announced at DevDay
JSON
JSON (JavaScript Object Notation) is a lightweight, text-based data format that uses human-readable key-value pairs and arrays to structure and exchange data between applications and systems.
- Function calling and other API updates
- I turned Markdown into a protocol for generative UI
AlphaGo
AlphaGo is an artificial intelligence system developed by DeepMind that uses reinforcement learning and Monte Carlo tree search to play the game of Go, achieving superhuman performance and defeating world champion Lee Sedol in 2016.
- From games to biology and beyond: 10 years of AlphaGo’s impact
- Our vision for building a universal AI assistant
Gemini 3 Pro
Gemini 3 Pro is an AI model developed by Google DeepMind that performs complex reasoning tasks and problem-solving, available to developers through API access and consumer platforms.
- The latest AI news we announced in February
- Gemini 3 Flash: frontier intelligence built for speed
Gemini 3
Gemini 3 is Google's latest flagship AI model featuring advanced reasoning, multimodal capabilities, and agentic features designed for complex task execution and developer collaboration.
- Start building with Gemini 3
- A new era of intelligence with Gemini 3
Gemini 2.5 Flash-Lite
Gemini 2.5 Flash-Lite is Google DeepMind's lightweight language model released in preview, designed as their most cost-efficient and fastest option with a 1-million-token context window, tool use capabilities, and improved performance over its predecessor.
- We’re expanding our Gemini 2.5 family of models
- Gemini 2.5: Updates to our family of thinking models
Replit
Replit is an online development platform that provides AI-assisted coding tools and has expanded into a broader knowledge work productivity suite including features for creating slides, videos, and applications.
- [AINews] Replit Agent 4: The Knowledge Work Agent
- Gemini 2.5 Pro Preview: even better coding performance
Cognition
Cognition is an AI software company that develops Devin, an autonomous AI software engineer agent powered by advanced language models for code generation and software development tasks.
- Gemini 2.5 Pro Preview: even better coding performance
- Introducing GPT-5.1 for developers
Gemma 3
Gemma 3 is an open-weight multimodal large language model family released by Google DeepMind that supports multiple languages, extended context windows, and features improved STEM capabilities for local deployment.
- Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM
- Introducing Gemma 3
Gemini app
The Gemini app is Google's consumer-facing application that provides access to Gemini AI models, including Gemini 3 Deep Think for complex reasoning and problem-solving tasks.
- The latest AI news we announced in February
- Gemini 3 Deep Think: Advancing science, research and engineering
UC Berkeley
UC Berkeley is a public research university in California that conducts advanced research across multiple disciplines, including artificial intelligence and enterprise systems.
- IBM and UC Berkeley Diagnose Why Enterprise Agents Fail Using IT-Bench and MAST
- Early experiments in accelerating science with GPT-5
GPT-OSS-120B
GPT-OSS-120B is a large open-source language model with 120 billion parameters that was evaluated in IBM and UC Berkeley research studying failure modes in AI agent systems.
- IBM and UC Berkeley Diagnose Why Enterprise Agents Fail Using IT-Bench and MAST
- gpt-oss-120b & gpt-oss-20b Model Card
Transformers.js
Transformers.js is a JavaScript library developed by Hugging Face that enables machine learning model inference directly in web browsers and server-side JavaScript environments, with support for WebGPU acceleration for improved performance.
- Transformers.js v4 Preview: Now Available on NPM!
- Transformers.js v3: WebGPU Support, New Models & Tasks, and More…
WebGPU
WebGPU is a modern web standard for GPU-accelerated computing in web browsers, enabling high-performance graphics and general-purpose GPU computations across different browser engines and platforms.
- Transformers.js v4 Preview: Now Available on NPM!
- Transformers.js v3: WebGPU Support, New Models & Tasks, and More…
NPM
NPM is a package manager for JavaScript that allows developers to install, share, and manage code libraries and dependencies for Node.js and frontend projects.
- Transformers.js v4 Preview: Now Available on NPM!
- Supply-chain attack using invisible code hits GitHub and other repositories
Deno
Deno is a JavaScript and TypeScript runtime built on the V8 engine that emphasizes security, modern APIs, and compatibility with web standards as an alternative to Node.js.
- JavaScript Sandboxing Research
- Transformers.js v4 Preview: Now Available on NPM!
GRPO
GRPO is a reinforcement learning algorithm that optimizes policies over multi-step trajectories in agentic systems, assigning credit across long-horizon decisions like tool selection and query reformulation, developed by DeepSeek as an alternative to offline preference methods like PPO.
- Unlocking Agentic RL Training for GPT-OSS: A Practical Retrospective
- On the Shifting Global Compute Landscape
DeepSeek R1
DeepSeek R1 is an open-source large language model released by Chinese AI company DeepSeek that achieved benchmark performance comparable to leading closed-source models and catalyzed a wave of open-source model releases from Chinese companies throughout 2025.
- 2025 Open Models Year in Review
- One Year Since the “DeepSeek Moment”
Transformer
A Transformer is a neural network architecture that uses self-attention mechanisms to process sequences of data in parallel, forming the foundation of modern large language models and other state-of-the-art AI systems.
- [AINews] Yann LeCun’s AMI Labs launches with a $1B seed @ $4.5B to build world models around JEPA
- Differential Transformer V2
Chat Completions
Chat Completions is OpenAI's API format that generates text responses to conversational prompts, serving as a foundational interface for language model applications.
- Open Responses: What you need to know
- OpenAI o3-mini
LoRA
LoRA (Low-Rank Adaptation) is a machine learning technique that enables efficient fine-tuning of large language models by training only small, low-rank matrices instead of updating all model parameters, allowing for faster and more memory-efficient adaptation of pre-trained models.
- We Got Claude to Fine-Tune an Open Source LLM
- Goodbye cold boot - how we made LoRA Inference 300% faster
vLLM
vLLM is an open-source inference framework designed to accelerate the serving of large language models through efficient batch processing and memory optimization techniques.
- NVIDIA's AI Engineers: Agent Inference at Planetary Scale and "Speed of Light" — Nader Khalil (Brev), Kyle Kranen (Dynamo)
- Transformers v5: Simple model definitions powering the AI ecosystem
SGLang
SGLang is an open-source inference framework for running large language models that provides optimized execution pipelines and is compatible with major deployment stacks like NVIDIA Dynamo, Transformers, and vLLM.
- NVIDIA's AI Engineers: Agent Inference at Planetary Scale and "Speed of Light" — Nader Khalil (Brev), Kyle Kranen (Dynamo)
- Transformers v5: Simple model definitions powering the AI ecosystem
LLM inference
LLM inference is the process of running a trained large language model to generate predictions or text output in response to user inputs, typically optimized for throughput and latency through techniques like continuous batching and key-value caching.
- Continuous batching from first principles
- Tencent says small clouds can’t get hardware, so big clouds can hike prices
LLM agents
LLM agents are AI systems built on large language models that use tools and take actions in response to user inputs, designed to generalize across different frameworks, programming environments, and tool formats for real-world task execution.
- Aligning to What? Rethinking Agent Generalization in MiniMax M2
- Introducing Agents.js: Give tools to your LLMs using JavaScript
semantic search
Semantic search is a search technique that uses vector embeddings and language understanding to find results based on meaning and context rather than just keyword matching.
- Retrieval After RAG: Hybrid Search, Agents, and Database Design — Simon Hørup Eskildsen of Turbopuffer
- Sentence Transformers is joining Hugging Face!
MCP (Model Context Protocol)
MCP (Model Context Protocol) is an open protocol developed by Anthropic that enables large language models and AI agents to connect to and interact with external tools, APIs, and data sources.
- Building the Hugging Face MCP Server
- How to Build an MCP Server with Gradio
Tiny Agents
Tiny Agents is a series of minimal code examples by Hugging Face demonstrating how to build MCP-powered AI agents in approximately 50-70 lines of Python code, showcasing lightweight agent architectures using the Model Context Protocol for tool discovery and invocation.
- Tiny Agents in Python: a MCP-powered agent in ~70 lines of code
- Tiny Agents: an MCP-powered agent in 50 lines of code
Llama 4
Llama 4 is Meta's open-weight large language model family that includes variants such as Maverick and Scout, released on Hugging Face for developers to use in local inference, fine-tuning, and AI application integration.
- Welcome Llama 4 Maverick & Scout on Hugging Face
- Import AI 435: 100k training runs; AI systems absorb human power; intelligence per watt
Mistral
Mistral is an AI company that develops and releases open-source large language models, including the Mistral family of models that compete with Google's Gemma and Meta's Llama for local and on-premise AI deployment.
- Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM
- Introducing Mistral Small 4
DeepSeek-R1
DeepSeek-R1 is a reasoning model developed by DeepSeek that demonstrates strong reasoning capabilities on complex problems, achieved through reinforcement learning techniques and internally simulating multiple distinct reasoning personas.
- Open-R1: a fully open reproduction of DeepSeek-R1
- Import AI 444: LLM societies; Huawei makes kernels with AI; ChipBench
smolagents
Smolagents is a lightweight agent framework developed by HuggingFace that enables agents to write and execute actions as code rather than JSON tool calls.
- Introducing smolagents: simple agents that write actions in code.
- Our Transformers Code Agent beats the GAIA benchmark 🏅
Fine-tuning
Fine-tuning is the process of retraining a pre-trained machine learning model on domain-specific or custom datasets to improve its performance on particular tasks beyond what the base model can achieve through prompting alone.
- Personal Copilot: Train Your Own Coding Assistant
- Fine-tuning now available for GPT-4o
JavaScript
JavaScript is a programming language designed primarily for client-side web development that enables interactive functionality in web browsers, though it has expanded to server-side environments like Node.js.
- Does Computer Science still exist?
- Introducing Agents.js: Give tools to your LLMs using JavaScript
code generation
Code generation is the automatic creation of executable source code by AI systems or tools, typically from high-level specifications like natural language descriptions, converting user intent into functioning software without manual programming.
- Training code generation models to debug their own outputs
- Making a web app generator with open ML models
MiniMax M2.7
MiniMax M2.7 is an open-source AI model developed by MiniMax that matches state-of-the-art performance at approximately one-third the cost, with notable benchmark results including 56.22% on SWE-Pro and strong agentic coding capabilities.
- Last Week in AI #339 - DLSS 5, OpenAI Superapp, MiniMax M2.7
- [AINews] MiniMax 2.7: GLM-5 at 1/3 cost SOTA Open Model
Claude Sonnet 4.6
Claude Sonnet 4.6 is an AI model developed by Anthropic that features a 1-million-token context window and achieved new records on SWE-Bench and OS World benchmarks with improvements in coding, instruction-following, and agentic task performance.
- [AINews] MiniMax 2.7: GLM-5 at 1/3 cost SOTA Open Model
- Last Week in AI #336 - Sonnet 4.6, Gemini 3.1 Pro, Anthropic vs Pentagon
Slack
Slack is a messaging and collaboration platform founded by Stewart Butterfield that enables teams to communicate through channels, direct messages, and integrations with other business tools.
- Why Anthropic Thinks AI Should Have Its Own Computer — Felix Rieseberg of Claude Cowork & Claude Code Desktop
- Codex is now generally available
NemoClaw
NemoClaw is NVIDIA's security-focused response to OpenClaw, announced at the 2026 GTC conference as part of NVIDIA's inference computing infrastructure.
- Claude Code can now take over your computer to complete tasks
- [AINews] NVIDIA GTC: Jensen goes hard on OpenClaw, Vera CPU, and announces $1T sales backlog in 2027
Computer Use
Computer Use is a capability that enables AI agents to interact with computers by viewing screens and executing actions like clicking and typing, allowing autonomous systems to control applications and perform tasks without direct human input.
- Claude Can Now Take Control of Your Mac
- Cursor's Third Era: Cloud Agents
Azure
Azure is Microsoft's cloud computing platform that provides a suite of services including virtual machines, databases, analytics, and AI tools for building, deploying, and managing applications.
- Amazon’s AI Resurgence: AWS & Anthropic’s Multi-Gigawatt Trainium Expansion
- AWS and OpenAI announce multi-year strategic partnership
Scale AI
Scale AI is a data infrastructure company that provides data labeling and generation services for AI model training, backed by Meta with a 49% stake valued at approximately $30 billion.
- Meta Superintelligence – Leadership Compute, Talent, and Data
- Import AI 447: The AGI economy; testing AIs with generated games; and agent ecologies
open models
Open models are machine learning models with publicly available weights and architecture that can be freely downloaded, modified, and deployed by anyone, typically released by research labs and companies to enable community development and reduce dependence on proprietary closed systems.
- What comes next with open models
- 2025 Open Models Year in Review
closed models
Closed models are proprietary AI systems developed and controlled by companies like OpenAI, Anthropic, and Google, whose weights and training data are not publicly released, typically offering API access while maintaining vertical integration advantages including internal tools and optimization for specific domains like healthcare and legal applications.
- What comes next with open models
- 2025 Open Models Year in Review
Kimi
Kimi is a large language model developed by Chinese AI company Moonshot AI that uses a hybrid architecture combining RNN/SSM layers with traditional attention mechanisms for efficient long-context processing.
- OpenClaw Agents Can Be Guilt-Tripped Into Self-Sabotage
- Olmo Hybrid and future LLM architectures
AI-assisted coding
AI-assisted coding refers to the use of artificial intelligence tools, such as Claude Code and similar AI assistants, to generate, complete, or optimize software code, reducing manual programming work and shifting developer focus from implementation to design and specification.
- Claude Code Hits Different
- Quoting Les Orchard
GLM 4.7
GLM 4.7 is an open-source large language model developed by Zhipu AI designed for coding tasks.
- Latest open artifacts (#17): NVIDIA, Arcee, Minimax, DeepSeek, Z.ai and others close an eventful year on a high note
- LWiAI Podcast #230 - 2025 Retrospective, Nvidia buys Groq, GLM 4.7, METR
dwarkesh
Dwarkesh is an AI researcher and podcaster known for interviewing leading figures in AI development and publishing critical analysis of AI progress claims and timelines.
- Dario Amodei — "We are near the end of the exponential"
- Thoughts on AI progress (Dec 2025)
Dwarkesh Patel
Dwarkesh Patel is an AI researcher and podcaster known for analyzing AI development approaches, reinforcement learning efficiency, and the theoretical foundations of AI progress.
- Thoughts on AI progress (Dec 2025)
- RL is even more information inefficient than you thought
Satya Nadella
Satya Nadella is the Chief Executive Officer of Microsoft who oversees the company's artificial intelligence infrastructure and strategy, including its investments in large-scale data centers and partnerships with AI companies like OpenAI.
- Thoughts on slowing the fuck down
- Satya Nadella — How Microsoft is preparing for AGI
GB300
The GB300 is a high-end GPU manufactured by NVIDIA used for large-scale artificial intelligence and machine learning workloads, deployed in hyperscale datacenters by companies including Microsoft and Poolside.
- Satya Nadella — How Microsoft is preparing for AGI
- Import AI 432: AI malware; frankencomputing; and Poolside's big cluster
Qwen3.5-397B-A17B
Qwen3.5-397B-A17B is a 397 billion parameter mixture-of-experts language model developed by Alibaba that can run on consumer hardware through efficient on-demand expert weight streaming from storage, achieving local inference at speeds of 4.4+ tokens per second on a MacBook Pro with 48GB RAM when 4-bit quantized.
- Streaming experts
- Flash-MoE: Running a 397B Parameter Model on a Laptop
Rust
Rust is a systems programming language designed for memory safety and concurrency, maintained by the Rust Foundation and widely used for building performance-critical software including WebAssembly, network protocols, and command-line tools.
- We rewrote our Rust WASM parser in TypeScript and it got faster
- Noq: n0's new QUIC implementation in Rust
New York Times
The New York Times is a major American daily newspaper based in New York City that publishes news, opinion, and analysis across print and digital platforms.
- NYT: ‘Meta Delays Rollout of New AI Model After Performance Concerns’
- “Your frustration is the product”
OpenCode
OpenCode is an open-source AI coding agent with 120,000 GitHub stars and millions of monthly active users that provides autonomous code generation and development assistance functionality.
- OpenCode – Open source AI coding agent
- Anthropic takes legal action against OpenCode
Kubernetes
Kubernetes is an open-source container orchestration platform originally developed by Google that automates the deployment, scaling, and management of containerized applications across clusters of machines.
- Widely used Trivy scanner compromised in ongoing supply-chain attack
- macOS 26 breaks custom DNS settings including .internal
TypeScript
TypeScript is a programming language built on JavaScript that adds static type checking and is developed and maintained by Microsoft.
- We rewrote our Rust WASM parser in TypeScript and it got faster
- I turned Markdown into a protocol for generative UI
iPhone
The iPhone is a smartphone designed and manufactured by Apple that runs the iOS operating system and is known for its touchscreen interface, integration with Apple's ecosystem, and status as one of the world's best-selling mobile devices.
- Sam & Jony
- Hundreds of millions of iPhones can be hacked with a new tool found in the wild
NSA
The National Security Agency is a U.S. intelligence agency that conducts signals intelligence and cybersecurity operations under the authority of the Department of Defense.
- Claude attacks were 'Rorschach test' for infosec community, scaring former NSA boss
- FBI started buying Americans' location data again, Kash Patel confirms
Android
Android is Google's open-source mobile operating system that powers the majority of smartphones and tablets worldwide, known for its flexibility, customization options, and integration with Google services.
- How We Used Codex to Ship Sora for Android in 28 Days
- Google reveals its solution for true Android sideloading: a mandatory waiting period
NHTSA
The National Highway Traffic Safety Administration (NHTSA) is a U.S. federal agency within the Department of Transportation that establishes and enforces safety standards for motor vehicles and investigates defects in vehicle safety systems.
- Waymo Safety Impact
- Tesla: Failure of the FSD's degradation detection system [pdf]
Tesla
Tesla is an electric vehicle and energy storage company founded by Elon Musk in 2003 that manufactures cars, batteries, and solar products, and develops autonomous driving technology including its Full Self-Driving system.
- iPhone Fold layout 📱, xAI + Tesla 🤖, fixing Javascript time 🧑💻
- Tesla: Failure of the FSD's degradation detection system [pdf]
comprehension debt
Comprehension debt is the knowledge gap that accumulates when teams use AI code generation at scale without maintaining human understanding of the code's design decisions and system architecture, creating risk of unexpected failures when changes are needed.
- Meta 20% layoffs 💼, Musk rebuilds xAI 🧠, Travis Kalanick's robots 🤖
- Comprehension Debt - the hidden cost of AI generated code
Avocado
Meta's Avocado is a foundation AI model that was delayed past its original launch date due to underperformance on reasoning and coding benchmarks compared to rival models.
- Meta's AI flop 🤖, Google Maps redesign 🗺️, Perplexity Agent API 🧑💻
- NYT: ‘Meta Delays Rollout of New AI Model After Performance Concerns’
Craig Mod
Craig Mod is an author and software developer known for building custom software tools using AI assistance, including a multi-currency accounting system called TaxBot2000 that he created in 5 days using Claude and Python.
- Quoting Craig Mod
- ‘Software Bonkers’
foundation models
Foundation models are large neural networks trained on broad datasets of text, images, or other data that can be adapted for a wide range of downstream tasks through fine-tuning or prompting, with examples including GPT models and other frontier language models developed by AI research organizations.
- GPT-5.2 derives a new result in theoretical physics
- UK blinks on AI copyright carve-out after star-studded revolt
QCon London
QCon London is an annual software development conference held in London that features talks and presentations from industry practitioners on topics including system architecture, AI tools for software development, and site reliability engineering.
- Fixing Claude with Claude: Anthropic reports on AI site reliability engineering
- AI for software developers is in a 'dangerous state'
Alibaba
Alibaba is a Chinese multinational technology conglomerate that operates e-commerce platforms, cloud computing services through Alibaba Cloud, and develops AI chips through its T-Head unit.
- Alibaba has made 470,000 AI chips, admits they’re inferior and may always be
- Tencent says small clouds can’t get hardware, so big clouds can hike prices
Alibaba Cloud
Alibaba Cloud is the cloud computing and AI infrastructure unit of Chinese e-commerce giant Alibaba Group, providing computing, storage, and AI services to enterprise customers primarily in China, with recent quarterly revenues of $6.2 billion and plans to develop proprietary AI chips through its T-Head subsidiary.
- Alibaba has made 470,000 AI chips, admits they’re inferior and may always be
- Alibaba Cloud hikes prices by up to 34%, blames hardware costs and AI demand
Intercom
Intercom is a customer communications platform that provides messaging, chatbots, and support tools for businesses, and has recently pivoted to AI-driven customer service with its Fin AI agent product.
- Amazon vs USPS 📪, Google's vibe designer 🧑🎨, how Codex works 🤖
- iPhone 17e 📱, SpaceX tower catch plan 🚀, how to save SaaS 💼`
Minions
Minions is Stripe's internal system that uses AI agents in isolated environments to automate pull request generation and code review, shipping approximately 1,300 PRs per week through hybrid deterministic and agentic orchestration.
- OpenAI's pivot 🤖, Nvidia space GPUs 🛰️, Stripe's internal dev agent 👨💻
- OpenAI's smart speaker 📢, Apple visual intelligence 👀, Code Mode 🧑💻
Perplexity
Perplexity is an AI research company that provides an API platform for building AI agents, offering developers access to real-time search capabilities, frontier model switching, and an execution environment for creating autonomous AI systems.
- LWiAI Podcast #235 - Sonnet 4.6, Deep-thinking tokens, Anthropic vs Pentagon
- Meta's AI flop 🤖, Google Maps redesign 🗺️, Perplexity Agent API 🧑💻
Shopify
Shopify is a Canadian e-commerce platform founded by Tobias Lütke in 2006 that enables merchants to build and operate online stores, and is publicly traded on the Toronto and New York stock exchanges.
- Shopify/liquid: Performance: 53% faster parse+render, 61% fewer allocations
- Meta's AI flop 🤖, Google Maps redesign 🗺️, Perplexity Agent API 🧑💻
Liquid
Liquid is a template engine created and maintained by Shopify that is used to generate dynamic content by embedding variables and logic into text templates.
- Shopify/liquid: Performance: 53% faster parse+render, 61% fewer allocations
- Meta's AI flop 🤖, Google Maps redesign 🗺️, Perplexity Agent API 🧑💻
GPT-5.3 Instant
GPT-5.3 Instant is an OpenAI language model featuring improved conversational flow, web search quality, and 26.8% reduced hallucinations compared to previous versions.
- LWiAI Podcast #236 - GPT 5.4, Gemini 3.1 Flash Lite, Supply Chain Risk
- M5 Max MacBooks 💻, GPT-5.3 Instant 🤖, end of code reviews 👨💻
SpaceX
SpaceX is a private aerospace company founded by Elon Musk that designs and manufactures rockets and spacecraft for commercial satellite launches, cargo delivery to the International Space Station, and crewed spaceflight missions.
- iPhone 17e 📱, SpaceX tower catch plan 🚀, how to save SaaS 💼`
- Anthropic vs Pentagon 🤖, SpaceX eyes March IPO 💰, lessons building Claude Code 🧑💻
Chat SDK
Vercel's Chat SDK is a development toolkit that enables developers to build chatbots with a single codebase that can be deployed across multiple platforms including Slack, Teams, Discord, and Linear using JSX cards.
- Build knowledge agents without embeddings
- Jane Street vs Bitcoin 🪙, AGI career decisions 💼, Vercel Chat SDK 🤖
AMD
AMD (Advanced Micro Devices) is a semiconductor manufacturer based in Santa Clara, California, that designs and produces processors and graphics processing units for data centers, personal computers, and gaming systems, competing primarily with Intel in CPUs and Nvidia in AI accelerators.
- Alibaba has made 470,000 AI chips, admits they’re inferior and may always be
- Meta's $100B deal 💰, Pentagon threatens Anthropic 🏛️, chinese vibe coders 🧑💻
Dan Woods
Dan Woods is a researcher who demonstrated running large language models on consumer hardware by implementing Apple's "LLM in a Flash" technique to execute the 397-billion parameter Qwen3.5 model on a MacBook M3 Max using AI-driven automated experimentation.
- Streaming experts
- Autoresearching Apple's "LLM in a Flash" to run Qwen 397B locally
GPT-5 mini
GPT-5 mini is a compact language model developed by OpenAI designed for latency-sensitive applications such as coding assistants and subagents.
- Introducing GPT-5.4 mini and nano
- GPT-5.4 mini and GPT-5.4 nano, which can describe 76,000 photos for $52
Django
Django is a free and open-source Python web framework maintained by the Django Software Foundation that provides tools and conventions for building web applications with database-backed architecture.
- Quoting Tim Schilling
- Quoting Jannis Leidel
Claude 3.5 Sonnet
Claude 3.5 Sonnet is an AI language model developed by Anthropic that is known for strong performance on coding tasks and AI research replication benchmarks, achieving the top score of 21.0% on OpenAI's PaperBench evaluation of autonomous research agents.
- PaperBench: Evaluating AI’s Ability to Replicate AI Research
- My fireside chat about agentic engineering at the Pragmatic Summit
Sonnet 4.6
Sonnet 4.6 is an AI language model developed by Anthropic that features a 1 million token context window and is known for strong performance on complex reasoning tasks like the ARC-AGI-2 benchmark.
- LWiAI Podcast #235 - Sonnet 4.6, Deep-thinking tokens, Anthropic vs Pentagon
- 1M context is now generally available for Opus 4.6 and Sonnet 4.6
GitHub Actions
GitHub Actions is a continuous integration and continuous deployment (CI/CD) platform run by GitHub that automates software workflows directly within repositories using event-triggered actions and customizable workflows.
- Trivy under attack again: Widespread GitHub Actions tag compromise secrets
- Perhaps not Boring Technology after all
Linux kernel
The Linux kernel is the core component of the Linux operating system that manages hardware resources and enables communication between software and physical hardware.
- Sashiko: AI code review system for the Linux kernel spots bugs humans miss
- The Sashiko patch-review system
VS Code
VS Code is a free, open-source code editor developed by Microsoft that provides built-in support for debugging, version control, and extensions for multiple programming languages.
- Shipping code faster with o3, o4-mini, and GPT-4.1
- Supply-chain attack using invisible code hits GitHub and other repositories
Uber
Uber is a ride-hailing and delivery services company founded in 2009 by Travis Kalanick and Garrett Camp that connects users with drivers for transportation and connects customers with restaurants and merchants for food and goods delivery.
- Introducing OpenAI Frontier
- Judgment and creativity are all you need
AI governance
AI governance refers to the frameworks, institutions, and policies designed to oversee and regulate the development, deployment, and autonomous improvement of artificial intelligence systems to ensure human oversight and safety.
- Import AI 441: My agents are working. Are yours?
- The Anthropic Institute
PostTrainBench
PostTrainBench is a benchmark that evaluates the capability of frontier large language models to autonomously fine-tune other models, measuring their performance on tasks like reward model training and autonomous LLM optimization.
- ImportAI 449: LLMs training other LLMs; 72B distributed training run; computer vision is harder than generative text
- Import AI 439: AI kernels; decentralized training; and universal representations
METR
METR is an AI evaluation organization known for developing benchmarks and tasks that measure the long-horizon capabilities of AI agents, including extended task horizons measured in hours.
- LWiAI Podcast #230 - 2025 Retrospective, Nvidia buys Groq, GLM 4.7, METR
- Import AI 448: AI R&D; Bytedance's CUDA-writing agent; on-device satellite AI
MIT
MIT (Massachusetts Institute of Technology) is a private research university in Cambridge, Massachusetts, known for research and education in engineering, science, computer science, and economics, and collaborates with other institutions on AI systems research including computer-use agents and economics of artificial intelligence.
- Import AI 447: The AGI economy; testing AIs with generated games; and agent ecologies
- Import AI 436: Another 2GW datacenter; why regulation is scary; how to fight a superintelligence
UCLA
UCLA is a public research university headquartered in Los Angeles, California, where faculty members conduct research across science, engineering, economics, and other disciplines, as evidenced by recent work in optimization theory and AI economics.
- GPT-5 and the future of mathematical discovery
- Import AI 447: The AGI economy; testing AIs with generated games; and agent ecologies
Oxford
Oxford is a university in the United Kingdom that collaborates on AI research, including studies on large language models' capabilities in scientific discovery and biosecurity risks.
- Early experiments in accelerating science with GPT-5
- Import AI 447: The AGI economy; testing AIs with generated games; and agent ecologies
Berkeley
UC Berkeley is a public research university in California known for research in artificial intelligence, machine learning, and computer science, including collaborative work on AI agent systems and benchmarking frameworks.
- Import AI 447: The AGI economy; testing AIs with generated games; and agent ecologies
- Import AI 436: Another 2GW datacenter; why regulation is scary; how to fight a superintelligence
Claude Sonnet 4.5
Claude Sonnet 4.5 is a large language model developed by Anthropic that demonstrates increased situational awareness and is capable of generating complex game benchmarks and assisting with code generation tasks.
- Import AI 447: The AGI economy; testing AIs with generated games; and agent ecologies
- Import AI 431: Technological Optimism and Appropriate Fear
Claude Opus 4
Claude Opus 4 is a large language model developed by Anthropic that demonstrated superior performance on technical assessments, outperforming most human candidates on Anthropic's take-home test.
- OpenAI and Anthropic share findings from a joint safety evaluation
- Import AI 443: Into the mist: Moltbook, agent ecologies, and the internet in transition
Facebook is a social media and technology company founded by Mark Zuckerberg in 2004 that operates the world's largest social network, along with related platforms including Instagram and WhatsApp, and conducts AI research through its Meta AI division.
- Import AI 439: AI kernels; decentralized training; and universal representations
- Import AI 437: Co-improving AI; RL dreams; AI labels might be annoying
Together AI
Together AI is an AI company that develops open-weight language models and inference-efficient architectures, including the Mamba series of state space models designed for production deployment.
- Mamba-3
- Import AI 435: 100k training runs; AI systems absorb human power; intelligence per watt
GPT-5.4 Pro
GPT-5.4 Pro is an AI language model developed by OpenAI that features a 1-million-token context window, native computer-use capabilities, and improved tool use functionality.
- Epoch confirms GPT5.4 Pro solved a frontier math open problem
- LWiAI Podcast #236 - GPT 5.4, Gemini 3.1 Flash Lite, Supply Chain Risk
Huawei
Huawei is a Chinese multinational technology company headquartered in Shenzhen that manufactures telecommunications equipment, consumer electronics, and semiconductors, and is one of the world's largest smartphone and networking hardware manufacturers.
- Last Week in AI #333 - ChatGPT Ads, Zhipu+Huawei, Drama at Thinking Machines
- LWiAI Podcast #229 - Gemini 3 Flash, ChatGPT Apps, Nemotron 3
RAISE Act
The RAISE Act is a proposed United States artificial intelligence safety legislation bill aimed at regulating AI development and deployment.
- LWiAI Podcast #230 - 2025 Retrospective, Nvidia buys Groq, GLM 4.7, METR
- Last Week in AI #331 - Nvidia announcements, Grok bikini prompts, RAISE Act
Sora
Sora is OpenAI's text-to-video AI model that generates video content from written descriptions and is used by companies including Disney for content creation.
- How We Used Codex to Ship Sora for Android in 28 Days
- Last Week in AI #329 - GPT 5.2, GenAI.mil, Disney in Sora
Go
Go is an open-source programming language created by Google that is known for its simplicity, fast compilation, and efficient concurrency support through goroutines.
- Announcing TypeScript 6.0
- Video Conferencing with Durable Streams
Aqua Security
Aqua Security is a cybersecurity company that develops and maintains Trivy, an open-source vulnerability scanner widely used for identifying security risks in container images and software dependencies.
- Trivy Compromised a Second Time - Malicious v0.69.4 Release
- Widely used Trivy scanner compromised in ongoing supply-chain attack
trivy-action
trivy-action is a GitHub Action maintained by Aqua Security that integrates the Trivy vulnerability scanner into CI/CD pipelines for automated container and artifact scanning.
- Trivy Compromised a Second Time - Malicious v0.69.4 Release
- Widely used Trivy scanner compromised in ongoing supply-chain attack
misalignment detection
Misalignment detection is the process of monitoring autonomous systems or agents to identify when their behavior diverges from their intended goals or safety guardrails, using methods such as behavioral telemetry and risk factor analysis.
- How we monitor internal coding agents for misalignment
- GPT-5 bio bug bounty call
SWE-Bench Pro
SWE-Bench Pro is a software engineering benchmark that evaluates AI models on multi-language coding tasks with contamination-resistant design to prevent benchmark data leakage.
- Introducing GPT-5.4 mini and nano
- Introducing GPT-5.3-Codex
Codex Security
Codex Security is an agentic application security tool developed by OpenAI that identifies vulnerabilities in code using frontier models and constraint solving techniques, then generates fixes with reduced false positives.
- Why Codex Security Doesn’t Include a SAST Report
- Codex Security: now in research preview
Instruction Hierarchy
Instruction Hierarchy is a training approach developed by OpenAI that teaches language models to prioritize instructions based on trust levels—ranking system prompts highest, followed by developer instructions, user inputs, and tool outputs lowest—to reduce vulnerability to prompt injection attacks and policy violations.
- Improving instruction hierarchy in frontier LLMs
- The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions
Aardvark
Aardvark is an agentic application security tool developed by OpenAI that uses advanced AI models to identify complex software vulnerabilities and generate fixes while reducing false positives.
- Codex Security: now in research preview
- Introducing Aardvark: OpenAI’s agentic security researcher
OpenAI Frontier
OpenAI Frontier is a platform developed by OpenAI for building and managing teams of AI agents, with exclusive third-party cloud distribution provided by Amazon through a strategic partnership announced in 2024.
- OpenAI and Amazon announce strategic partnership
- Introducing OpenAI Frontier
Stateful Runtime Environment
A Stateful Runtime Environment is an execution layer developed by OpenAI and Amazon on Amazon Bedrock that persists context, memory, and tool access across multi-step AI agent workflows, handling state management, error recovery, and long-running task resumption without requiring custom orchestration.
- OpenAI and Amazon announce strategic partnership
- Introducing the Stateful Runtime Environment for Agents in Amazon Bedrock
Figma
Figma is a web-based design and prototyping platform that enables teams to collaborate on user interface and product design in real-time, with features including design-to-code capabilities through partnerships like its integration with OpenAI's Codex.
- OpenAI Codex and Figma launch seamless code-to-design experience
- GPT-5 and the new era of work
SWE-bench Verified
SWE-bench Verified is a benchmark used to evaluate autonomous software engineering capabilities of AI coding agents by measuring their ability to resolve real GitHub issues in open-source repositories.
- Why we no longer evaluate SWE-bench Verified
- Building more with GPT-5.1-Codex-Max
agentic coding tools
Agentic coding tools are AI-powered software development systems that autonomously perform software engineering tasks, such as writing code, debugging, and completing multi-step development workflows, with products including OpenAI's IDE integrations, GitHub Copilot, Cursor, and Claude Code.
- Why we no longer evaluate SWE-bench Verified
- Embedding AI into developer software
macOS
macOS is Apple's operating system for Mac computers, providing the core software that manages hardware resources and runs applications on desktop and laptop devices.
- Anthropic’s Claude can now control your Mac, escalating the fight to build AI agents that actually do work
- Introducing the Codex app
Windows
Windows is an operating system developed by Microsoft that runs on personal computers, laptops, and other devices.
- Introducing the Codex app
- Building more with GPT-5.1-Codex-Max
Codex CLI
Codex CLI is OpenAI's command-line interface tool that orchestrates interactions between users, AI models, and development tools through an agent loop architecture, designed to enable large-scale code refactoring and multi-hour autonomous coding tasks.
- Unrolling the Codex agent loop
- Building more with GPT-5.1-Codex-Max
Apps SDK
OpenAI's Apps SDK is a software development kit that enables developers to build and submit applications directly into ChatGPT, where apps can be triggered via @mention or tools menu for distribution to ChatGPT users.
- Developers can now submit apps to ChatGPT
- Introducing apps in ChatGPT and the new Apps SDK
GPT-3.5
GPT-3.5 is a large language model developed by OpenAI that performs text generation and conversational tasks, known for general-purpose AI assistance and earlier-generation mathematical reasoning capabilities compared to newer models like GPT-5.
- GPT-5 and the future of mathematical discovery
- The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions
GPT-5.1-Codex-Max
GPT-5.1-Codex-Max is an agentic coding model developed by OpenAI that is trained to handle multi-file projects and extended agent loops through a compaction process, enabling operation across millions of tokens and significant code refactoring tasks.
- Building more with GPT-5.1-Codex-Max
- GPT-5.1-Codex-Max System Card
GPT-5.1
GPT-5.1 is a large language model developed by OpenAI that features dynamic reasoning capabilities, extended prompt caching up to 24 hours, and developer tools like code patching and shell command execution, designed for both general use and specialized coding tasks.
- GPT-5.1-Codex-Max System Card
- Introducing GPT-5.1 for developers
mechanistic interpretability
Mechanistic interpretability is a research field that aims to reverse-engineer and understand the internal computational mechanisms of neural networks by identifying sparse circuits and interpretable concepts responsible for specific behaviors and outputs.
- Understanding neural networks through sparse circuits
- Extracting Concepts from GPT-4
agentic AI system
An agentic AI system is an autonomous artificial intelligence that can independently plan, execute, and iterate through multi-step tasks without human intervention between steps, typically using tools and reasoning to accomplish specialized objectives.
- Introducing Aardvark: OpenAI’s agentic security researcher
- ChatGPT agent System Card
Public Benefit Corporation
A Public Benefit Corporation (PBC) is a legal business structure that requires a company to consider the interests of stakeholders beyond shareholders, such as employees, customers, and society, while pursuing a stated public benefit alongside profit.
- Statement on OpenAI’s Nonprofit and PBC
- Evolving OpenAI’s structure
bug bounty program
A bug bounty program is a security initiative where organizations invite external researchers to identify and report vulnerabilities in their systems or products, typically offering financial rewards for valid discoveries.
- GPT-5 bio bug bounty call
- Agent bio bug bounty call
ChatGPT agent
ChatGPT agent is an autonomous AI system developed by OpenAI that can take actions on behalf of users to accomplish goal-directed tasks beyond conversational question-answering.
- Introducing ChatGPT agent
- ChatGPT agent System Card
AgentKit
AgentKit is OpenAI's software development kit for building multi-step AI agents that can integrate with APIs, perform complex workflows, and automate tasks like customer support, research, and data extraction.
- Introducing AgentKit, new Evals, and RFT for agents
- New tools for building agents
o3-mini
OpenAI's o3-mini is a cost-efficient reasoning model with adjustable reasoning effort levels that excels in STEM and coding tasks, supporting function calling and Structured Outputs for production API integration.
- Introducing OpenAI o3 and o4-mini
- OpenAI o3-mini
Structured Outputs
Structured Outputs is an OpenAI feature that enables API models to return responses in a specified JSON schema format, ensuring consistent and machine-readable output for programmatic processing.
- OpenAI o3-mini
- OpenAI o1 and new tools for developers
Oracle
Oracle is a multinational technology company that develops database software and cloud computing services, and is serving as a key technology partner in the Stargate Project, a $500 billion joint venture with OpenAI, SoftBank, and MGX to build AI infrastructure in the United States.
- Introducing OpenAI Frontier
- Announcing The Stargate Project
LLMs
Large language models (LLMs) are artificial neural networks trained on vast amounts of text data to predict and generate human language, with contemporary examples like OpenAI's GPT series and other frontier models capable of reasoning, answering questions, and performing complex language tasks.
- "Disregard that!" attacks
- Learning to reason with LLMs
Cq
- Mozilla dev's "Stack Overflow for agents" targets a key weakness in coding AI
- Show HN: Cq – Stack Overflow for AI coding agents
RSAC 2026
- Claude attacks were 'Rorschach test' for infosec community, scaring former NSA boss
- AI agents are 'gullible' and easy to turn into your minions
PyPI
- Tell HN: Litellm 1.82.7 and 1.82.8 on PyPI are compromised
- LiteLLM Compromised by Credential Stealer
FutureSearch
- Delve did the security compliance on LiteLLM, an AI project hit by malware
- LiteLLM Compromised by Credential Stealer
Showing 262 of 262 entities