MXFP4
3 mentions across all digests
MXFP4 is a microscaling 4-bit floating-point quantization format. It is used to enable single-GPU deployment of large models such as OpenAI's gpt-oss-120b, and the Hugging Face Transformers library supports it through pre-compiled, downloadable kernels.
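To make "microscaling" concrete, here is a minimal pure-Python sketch of quantizing one block of values. It assumes the layout described in the OCP Microscaling (MX) specification as commonly summarized: blocks of 32 elements, each stored as a 4-bit E2M1 float, sharing one power-of-two scale. The function names are hypothetical, rounding ties are resolved naively, and the 8-bit range limit on the shared exponent is ignored, so treat this as an illustration of the idea rather than the kernels used by Transformers or gpt-oss.

```python
import math

# Magnitudes representable by a 4-bit E2M1 element (1 sign, 2 exponent, 1 mantissa bit).
FP4_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK_SIZE = 32  # assumed number of elements sharing one scale
ELEM_EMAX = 2    # exponent of the largest FP4 magnitude: 6 = 1.5 * 2**2


def quantize_block(block):
    """Return (shared_exponent, FP4 element values) for one block (hypothetical helper)."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return 0, [0.0] * len(block)
    # frexp gives amax = m * 2**e with 0.5 <= m < 1, so floor(log2(amax)) = e - 1.
    shared_exp = (math.frexp(amax)[1] - 1) - ELEM_EMAX
    scale = 2.0 ** shared_exp
    elems = []
    for x in block:
        # Scale into FP4 range and clamp to the largest representable magnitude.
        mag = min(abs(x) / scale, FP4_VALUES[-1])
        # Simplification: pick the closest FP4 magnitude; ties go to the smaller value.
        nearest = min(FP4_VALUES, key=lambda v: abs(v - mag))
        elems.append(math.copysign(nearest, x))
    return shared_exp, elems


def dequantize_block(shared_exp, elems):
    """Reconstruct approximate values from the shared exponent and FP4 elements."""
    return [e * 2.0 ** shared_exp for e in elems]


if __name__ == "__main__":
    block = [0.3] * BLOCK_SIZE
    exp, elems = quantize_block(block)
    print(exp, dequantize_block(exp, elems)[:4])
```

Under these assumptions, each block costs 32 × 4 bits for the elements plus 8 bits for the shared scale, or 4.25 bits per value on average.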
Import AI 454: Automating alignment research; safety study of a Chinese model; HiFloat4
Huawei's HiFloat4 quantization format achieves 1.0% relative loss versus MXFP4's 1.5% on Ascend NPUs, signaling Chinese hardware-software co-optimization under US export constraints.
From GPT-2 to gpt-oss: Analyzing the Architectural Advances
OpenAI releases gpt-oss-120b and gpt-oss-20b with MXFP4 quantization, enabling single-GPU deployment and marking a strategic openness shift after five years of closed models.
Tricks from OpenAI gpt-oss YOU 🫵 can use with transformers