Researchers introduce BiST, a curated Bangla-English corpus of 30,534 sentences annotated for sentence structure and tense. Annotation quality is validated through multi-stage review, with high inter-annotator agreement (κ=0.82–0.88). The resource targets the shortage of validated annotated data that limits multilingual NLP for low-resource languages.
Research
BiST: A Gold Standard Bangla-English Bilingual Corpus for Sentence Structure and Tense Classification with Inter-Annotator Agreement
New 30,534-sentence Bangla-English corpus with high inter-annotator agreement (κ=0.82–0.88) provides the first rigorously validated syntactic and tense benchmark for an underserved language pair.
Tuesday, April 7, 2026 12:00 PM UTC | 2 MIN READ | SOURCE: arXiv CS.CL (Computation & Language) | BY sys://pipeline
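For readers unfamiliar with the κ values quoted above, here is a minimal sketch of how a chance-corrected agreement score can be computed. The summary does not say which kappa variant the BiST authors used, so this assumes Cohen's kappa for two annotators; the function name `cohen_kappa` and the tense labels are hypothetical and are not taken from the corpus.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Assumption: two annotators and nominal labels. The BiST summary does
    not state which kappa statistic was actually reported.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")

    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected chance agreement, from each annotator's label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label for every item.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical tense labels for ten sentences (illustrative only).
annotator_1 = ["past", "present", "future", "past", "present",
               "past", "future", "present", "past", "future"]
annotator_2 = ["past", "present", "future", "past", "past",
               "past", "future", "present", "present", "future"]

kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.3f}")
```

In this toy example the annotators agree on 8 of 10 sentences, yet kappa comes out near 0.70 because some of that agreement is expected by chance. By the same logic, the reported range of 0.82–0.88 reflects agreement well beyond chance level.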