sparse autoencoders
3 mentions across all digests
Sparse autoencoders are neural network models used in mechanistic interpretability research to extract atomic, interpretable features from large language model internals, with applications ranging from understanding GPT-4's learned concepts to steering computational fluid dynamics surrogates.
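The recipe behind all three entries is the same: train an autoencoder on a model's internal activations with a sparsity penalty, so that each latent ideally captures a single human-interpretable feature. Below is a minimal sketch of that standard setup, assuming PyTorch; the dimensions and L1 coefficient are illustrative placeholders, not values from any paper listed here.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Minimal sparse autoencoder over model activations."""

    def __init__(self, d_model: int, d_latent: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_latent)  # activations -> latents
        self.decoder = nn.Linear(d_latent, d_model)  # latents -> reconstruction

    def forward(self, x: torch.Tensor):
        z = torch.relu(self.encoder(x))  # non-negative latents, pushed toward sparsity
        x_hat = self.decoder(z)
        return x_hat, z

# Placeholder sizes: a 768-dim residual stream expanded into 16k latents.
sae = SparseAutoencoder(d_model=768, d_latent=16384)
opt = torch.optim.Adam(sae.parameters(), lr=1e-4)
l1_coeff = 1e-3  # trades reconstruction fidelity for sparsity

def training_step(acts: torch.Tensor) -> torch.Tensor:
    """One step on a batch of captured activations, shape [batch, d_model]."""
    x_hat, z = sae(acts)
    recon = ((x_hat - acts) ** 2).mean()       # reconstruction error
    sparsity = z.abs().mean()                  # L1 penalty on latent activity
    loss = recon + l1_coeff * sparsity
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss
```

Large-scale variants often swap the L1 penalty for a hard top-k constraint on the latents, which is the route taken in work at GPT-4 scale.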
Sparse Autoencoders as a Steering Basis for Phase Synchronization in Graph-Based CFD Surrogates
Sparse autoencoders enable interpretable, fine-grained steering of graph-based CFD surrogates—offering a mechanistic interpretability approach to controlling neural physics simulations (a generic sketch of the steering pattern follows the list below).
MetaSAEs: Joint Training with a Decomposability Penalty Produces More Atomic Sparse Autoencoder Latents
A decomposability penalty during sparse autoencoder training produces more isolated, interpretable features—advancing mechanistic interpretability by reducing representation entanglement.
Extracting Concepts from GPT-4
Scaling sparse autoencoders to millions of latents surfaces interpretable features in GPT-4, showing that feature extraction can reach frontier-scale models.
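The steering idea in the first entry follows a pattern that recurs across SAE work: encode an activation into the sparse latent basis, nudge a chosen latent, decode, and hand the edited activation back to the model. Here is a toy sketch of that generic pattern, reusing the `sae` defined above; it is not the paper's actual mechanism for graph-based CFD surrogates, and `feature_idx` and `strength` are hypothetical knobs.

```python
@torch.no_grad()
def steer(acts: torch.Tensor, feature_idx: int, strength: float) -> torch.Tensor:
    """Amplify one SAE latent and return the edited activation.

    Generic SAE-steering sketch; feature_idx and strength are
    hypothetical, not values from any paper on this page.
    """
    x_hat, z = sae(acts)
    error = acts - x_hat            # part of the activation the SAE misses
    z[:, feature_idx] += strength   # nudge the chosen latent
    return sae.decoder(z) + error   # decode, then restore the residual
```

Adding the reconstruction error back means the edit moves the activation only along the chosen feature's decoder direction, rather than also imposing the SAE's reconstruction loss on everything else.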