Research

BiST: A Gold Standard Bangla-English Bilingual Corpus for Sentence Structure and Tense Classification with Inter-Annotator Agreement

A new 30,534-sentence Bangla-English corpus with high inter-annotator agreement (κ=0.82–0.88) provides the first rigorously validated syntactic and tense benchmark for an underserved language pair.

Tuesday, April 7, 2026 12:00 PM UTC | 2 MIN READ | SOURCE: arXiv CS.CL (Computation & Language) | BY sys://pipeline

Researchers introduce BiST, a curated Bangla-English corpus of 30,534 sentences annotated for sentence structure and tense classification. Annotation quality is validated through multi-stage review, with inter-annotator agreement of κ=0.82–0.88. The resource targets a persistent bottleneck in multilingual NLP: the shortage of validated, annotated data for low-resource languages such as Bangla.
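
For readers unfamiliar with the κ figure, here is a minimal sketch of how an agreement score of this kind can be computed. It assumes Cohen's kappa for two annotators; the summary does not say which kappa variant the BiST authors used. The function name `cohen_kappa` and the tense labels are hypothetical illustrations, not data from the corpus.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Assumption: two annotators, nominal labels. The BiST summary does not
    state which kappa statistic was used, so this is illustrative only.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")

    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: probability both pick the same label independently,
    # estimated from each annotator's own label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical tense labels for ten sentences (not taken from BiST).
annotator_a = ["past", "past", "present", "future", "present",
               "past", "future", "present", "past", "present"]
annotator_b = ["past", "past", "present", "future", "present",
               "past", "future", "present", "past", "past"]

print(round(cohen_kappa(annotator_a, annotator_b), 3))  # 0.844
```

In this toy case the annotators agree on 9 of 10 sentences, but kappa discounts the agreement expected by chance (0.36 here), giving roughly 0.84 rather than 0.90.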

Tags
research