Bambara ASR Leaderboard
Powered by MALIBA-AI • "No Malian Language Left Behind"
🏆 Current Champion: MALIBA-AI/bambara-whisper-small
Main Leaderboard
Choose how to rank the models
🏆 Leaderboard Rankings - Lower scores indicate better performance

| Rank | Model | WER (%) | CER (%) | Performance | Model Type | Date |
|---|---|---|---|---|---|---|
| 1 | bambara-whisper-small | 22.64 | 10.94 | 🥈 Good | Whisper-based | 2025-03-15 |
📈 Visual Performance Comparison
🎯 Automatic Speech Recognition Evaluation Metrics
Word Error Rate (WER)
WER measures transcription accuracy at the word level:
- Formula: WER = (Substitutions + Insertions + Deletions) / Total Reference Words
- Range: 0% (perfect) to 100%+ (very poor)
- Interpretation:
- 0-5%: 🏆 Excellent performance
- 5-15%: 🥈 Good performance
- 15-30%: 👍 Fair performance
- 30%+: Poor performance
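To make the formula concrete, here is a minimal sketch using the open-source jiwer package; the Bambara sentence pair is invented for illustration:

```python
import jiwer  # pip install jiwer

reference = "ne bɛ taa sugu la"   # hypothetical ground-truth transcription
hypothesis = "ne bɛ taa sugula"   # hypothetical model output

# jiwer.wer computes (Substitutions + Insertions + Deletions) / Total Reference Words
wer = jiwer.wer(reference, hypothesis)
print(f"WER: {wer:.2%}")  # 1 substitution + 1 deletion over 5 words -> 40.00%
```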
Character Error Rate (CER)
CER measures transcription accuracy at the character level:
- Advantage: More granular than WER, captures partial matches
- Benefit for Bambara: Particularly valuable for agglutinative languages
- Typical Range: Usually lower than WER values
Combined Score (Primary Ranking Metric)
Formula: Combined Score = 0.7 × WER + 0.3 × CER
- Rationale: Balanced evaluation emphasizing word-level accuracy
- Usage: Primary metric for model ranking
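Continuing the jiwer sketch from the WER section, the combined score can be reproduced in a few lines:

```python
import jiwer  # pip install jiwer

reference = "ne bɛ taa sugu la"   # same hypothetical pair as above
hypothesis = "ne bɛ taa sugula"

wer = jiwer.wer(reference, hypothesis)   # word-level error rate
cer = jiwer.cer(reference, hypothesis)   # character-level error rate

combined = 0.7 * wer + 0.3 * cer         # the primary ranking metric
print(f"WER={wer:.2%}  CER={cer:.2%}  Combined={combined:.2%}")
```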
🎯 Performance Categories
- 🏆 Excellent: < 15% Combined Score
- 🥈 Good: 15-30% Combined Score
- 👍 Fair: > 30% Combined Score
Submit Your Bambara ASR Model
🚀 Ready to benchmark your model? Submit your results and join the leaderboard!
Follow these steps to submit your Bambara ASR model for evaluation:
- Model name: use a descriptive name (organization/model format preferred)
- Model type: select the type/architecture of your model
- Region: country or region of the developing institution
📋 Submission Requirements
CSV Format:
- Columns: `id`, `text`
- Match all reference dataset IDs
- No duplicate IDs
- Text transcriptions in Bambara
Data Quality:
- Clean, normalized text
- Consistent formatting
- Complete coverage of test set
Upload your model's transcriptions in the required CSV format
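Before uploading, a quick pandas check along these lines can catch most format problems; the file name and the `reference_ids` set are placeholders:

```python
import pandas as pd

df = pd.read_csv("submission.csv")  # your transcriptions file

# Placeholder: in practice, collect these IDs from the benchmark dataset
# (loading the dataset is shown under "How to Participate" below).
reference_ids = {"sample_0001", "sample_0002"}

assert list(df.columns) == ["id", "text"], "columns must be exactly id, text"
assert df["id"].is_unique, "duplicate IDs found"
assert set(df["id"]) == reference_ids, "IDs must match the reference set exactly"
assert df["text"].notna().all(), "every row needs a transcription"
```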
📊 Updated Leaderboard

| Rank | Model | WER (%) | CER (%) | Performance | Model Type | Date |
|---|---|---|---|---|---|---|
| 1 | bambara-whisper-small | 22.64 | 10.94 | 🥈 Good | Whisper-based | 2025-03-15 |
| 2 | whisper-base | 32.64 | 10.94 | 🥈 Good | Foundation | 2025-03-15 |
Compare Two Models
Select two models to compare their performance side-by-side
Select the first model for comparison
Select the second model for comparison
Note on Comparison Results:
- Positive difference values (🟢) indicate Model 1 performed better
- Negative difference values (🔴) indicate Model 2 performed better
- Lower error rates indicate better performance
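Read literally, this convention implies the displayed difference is Model 2's error minus Model 1's. A toy example with made-up numbers:

```python
# Hypothetical WERs for the two selected models (lower is better)
model1_wer, model2_wer = 22.64, 32.64

diff = model2_wer - model1_wer  # +10.00 -> positive (green), Model 1 is better
print(f"WER difference: {diff:+.2f}")
```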
📊 Model Comparison Results
Select two models and click Compare to see the results.
Dataset & Methodology
🎯 About the Bambara Speech Recognition Benchmark
📊 Dataset Overview
Our benchmark is built on the sudoping01/bambara-speech-recognition-benchmark dataset, featuring:
- 🎙️ Diverse Audio Samples: Various speakers, dialects, and recording conditions
- 🗣️ Speaker Variety: Multiple native Bambara speakers from different regions
- 🎵 Acoustic Diversity: Different recording environments and quality levels
- ✅ Quality Assurance: Manually validated transcriptions
- 📚 Content Variety: Multiple domains and speaking styles
🔬 Evaluation Methodology
Text Normalization Process
- Lowercase conversion for consistency
- Punctuation removal to focus on linguistic content
- Whitespace normalization for standardized formatting
- Unicode normalization for proper character handling
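A minimal Python sketch of these four steps (the leaderboard's exact pipeline may differ in detail):

```python
import re
import unicodedata

def normalize_text(text: str) -> str:
    text = unicodedata.normalize("NFC", text)   # Unicode normalization
    text = text.lower()                         # lowercase conversion
    text = re.sub(r"[^\w\s]", "", text)         # punctuation removal
    text = re.sub(r"\s+", " ", text).strip()    # whitespace normalization
    return text

print(normalize_text("  Ne bɛ taa  sugu la! "))  # -> "ne bɛ taa sugu la"
```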
Quality Controls
- Outlier Detection: Extreme error rates are capped to prevent skewing
- Data Validation: Comprehensive format and completeness checks
- Duplicate Prevention: Automatic detection of duplicate submissions
- Missing Data Handling: Identification of incomplete submissions
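To illustrate the outlier-capping idea only (the actual threshold is not documented here; a 100% cap is an assumption):

```python
# Cap per-sample error rates so a few catastrophic outputs
# cannot dominate the corpus-level average.
per_sample_wers = [0.12, 0.25, 3.40]             # hypothetical values; 3.40 is an outlier
capped = [min(w, 1.0) for w in per_sample_wers]  # assumed cap at 100%
print(f"Mean WER after capping: {sum(capped) / len(capped):.2%}")
```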
🚀 How to Participate
Step 1: Access the Dataset
```python
from datasets import load_dataset

# Requires the Hugging Face datasets library (pip install datasets)
dataset = load_dataset("sudoping01/bambara-speech-recognition-benchmark")
```
Step 2: Generate Predictions
- Process the audio files with your ASR model
- Generate transcriptions for each audio sample
- Ensure your model outputs text in Bambara
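One possible approach is to run a Whisper-style checkpoint through the transformers ASR pipeline; the checkpoint choice, split name, and column names below are assumptions based on common Hugging Face conventions:

```python
from transformers import pipeline

# Any ASR checkpoint works here; the current leaderboard entry
# is used purely as an illustration.
asr = pipeline("automatic-speech-recognition", model="MALIBA-AI/bambara-whisper-small")

predictions = {}
for sample in dataset["test"]:             # split name is an assumption
    result = asr(sample["audio"])          # datasets audio dict: array + sampling_rate
    predictions[sample["id"]] = result["text"]
```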
Step 3: Format Results
Create a CSV file with exactly these columns:
- `id`: Sample identifier (must match dataset IDs)
- `text`: Your model's transcription
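Continuing the Step 2 sketch, the `predictions` dict can be written out with the standard library:

```python
import csv

with open("submission.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "text"])             # required header
    for sample_id, text in predictions.items():
        writer.writerow([sample_id, text])
```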
Step 4: Submit & Evaluate
- Upload your CSV using the submission form
- Your model will be automatically evaluated
- Results appear on the leaderboard immediately
🏆 Recognition & Impact
Top-performing models will be:
- Featured prominently on our leaderboard
- Highlighted in MALIBA-AI communications
- Considered for inclusion in production systems
- Invited to present at community events
🤝 Community Guidelines
- Reproducibility: Please provide model details and methodology
- Fair Play: No data leakage or unfair advantages
- Collaboration: Share insights and learnings with the community
- Attribution: Properly cite the benchmark in publications
📋 Technical Specifications

| Aspect | Details |
|---|---|
| Audio Format | WAV, various sample rates |
| Language | Bambara (bam) |
| Evaluation Metrics | WER, CER, Combined Score |
| Text Encoding | UTF-8 |
| Submission Format | CSV with `id`, `text` columns |
📚 Citation
If you use the Bambara ASR Leaderboard in a scientific publication, or if you find these resources useful, please cite our work:
```bibtex
@misc{bambara_asr_leaderboard_2025,
  title  = {Bambara Speech Recognition Leaderboard},
  author = {{MALIBA-AI Team}},
  year   = {2025},
  url    = {https://huggingface.co/spaces/MALIBA-AI/bambara-asr-leaderboard},
  note   = {A community initiative for advancing Bambara speech recognition technology}
}
```
About MALIBA-AI
MALIBA-AI: Empowering Mali's Future Through Community-Driven AI Innovation
"No Malian Language Left Behind"
This leaderboard is maintained by the MALIBA-AI initiative to track progress in Bambara speech recognition technology. For more information, visit MALIBA-AI or our Hugging Face page.