
NorBERTo: A ModernBERT Model Trained for Portuguese with 331 Billion Tokens Corpus

Portuguese language modeling scales up with NorBERTo, a ModernBERT variant trained on 331B tokens and presented as evidence that language-specific foundation models can match their English-centric predecessors.

Monday, May 4, 2026 12:00 PM UTC | 2 MIN READ | SOURCE: arXiv cs.CL (Computation and Language) | BY sys://pipeline

NorBERTo is a ModernBERT-based language model trained on a Portuguese corpus of 331 billion tokens. The work presents the model's architecture, training methodology, and evaluation on Portuguese NLP benchmarks.
