Tigrinya NLP and AI in Asylum Research

Why Tigrinya

High asylum relevance. Low institutional capacity.

In the year ending March 2026, Eritreans made up 9% of UK asylum claims, the second-largest nationality group. The Independent Chief Inspector of Borders and Immigration (ICIBI) also identifies Tigrinya as a language of lesser diffusion, where interpreter supply failures can interrupt asylum casework.

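Tigrinya is written in Ge'ez script. General-purpose AI systems tend to break Ge'ez text into many more tokens than comparable English text, which makes it more expensive to process. That cost lands on people who already have weaker evidence infrastructure.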
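One rough way to see where the cost comes from, offered as a sketch and not as a measurement of any particular model: Ge'ez characters sit in the Unicode Ethiopic block, and each one takes three bytes in UTF-8, where an English letter takes one. Systems that fall back to byte-level units therefore see roughly three units per Ge'ez character. The sample strings and the function name below are illustrative assumptions and are not drawn from the corpus.

```python
def utf8_bytes_per_char(text: str) -> float:
    """Average number of UTF-8 bytes per non-whitespace character."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(len(c.encode("utf-8")) for c in chars) / len(chars)


# Illustrative samples only (assumptions, not corpus data).
english_sample = "Hello world"
geez_sample = "ሰላም ዓለም"  # Ge'ez-script characters from the Ethiopic block

print(f"English: {utf8_bytes_per_char(english_sample):.1f} bytes per character")
print(f"Ge'ez:   {utf8_bytes_per_char(geez_sample):.1f} bytes per character")
```

Byte width is only a proxy. Actual token counts depend on each model's tokenizer vocabulary, so a real cost comparison would need to run the tokenizer in question over matched Tigrinya and English text.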

What we built

A documented Tigrinya corpus and summarisation model.

We built a general Tigrinya corpus of 6,813 BBC articles and 2.8 million words.

Then we fine-tuned a summarisation model on it. This is evidence for a technical claim: low-resource asylum languages can be processed directly, without passing through English first.

What this proves

Original-language AI can work.

Low-resource asylum languages can be processed when the work is scoped, documented, and language-specific. The technical excuse is weaker than it looks.

If the state uses AI on asylum evidence, it should work from the applicant's testimony. A summary of an interpretation is the wrong document.

What comes next

Next: asylum-context testing.

The next stage is asylum-context annotation with native speakers, domain adaptation, and comparison against models such as GPT-4. We need to know where general models fail and where specialist models help.

The first technical materials are available as an open research package. The full XL-Sum Tigrinya corpus is a separate download because of file size.

Research release

Original-language evidence can be processed.

The question is whether institutions will preserve testimony before they summarise it.

ROUGE benchmark

GPT-3.5, GPT-4.1, and the fine-tuned model.

Metric     Fine-tuned mT5-small   GPT-3.5   GPT-4.1
ROUGE-1    55.10                  52.60     57.98
ROUGE-2    28.92                  26.27     33.46
ROUGE-L    34.07                  32.56     36.74

The fine-tuned mT5-small model stays ahead of GPT-3.5. GPT-4.1 leads the character-level Ge'ez comparison across ROUGE-1, ROUGE-2, and ROUGE-L. The policy point stays the same: preserve original-language testimony before any model summarises it.
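For readers unfamiliar with character-level ROUGE, the following is a minimal sketch of how such scores can be computed. It is not taken from compare_models.py: the function names, the choice to drop whitespace, and the use of F1 (instead of recall alone) are all assumptions made for illustration. Scores here fall between 0 and 1; the figures reported above appear to be on a 0 to 100 scale.

```python
from collections import Counter


def _chars(text):
    """Split text into characters, dropping whitespace (an assumption)."""
    return [c for c in text if not c.isspace()]


def _f1(matches, cand_total, ref_total):
    """Harmonic mean of precision and recall; 0.0 when nothing matches."""
    if matches == 0 or cand_total == 0 or ref_total == 0:
        return 0.0
    precision = matches / cand_total
    recall = matches / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n_f1(reference, candidate, n):
    """Character-level ROUGE-N: clipped overlap of character n-grams."""
    def ngrams(chars):
        return Counter(tuple(chars[i:i + n]) for i in range(len(chars) - n + 1))

    ref, cand = ngrams(_chars(reference)), ngrams(_chars(candidate))
    overlap = sum((ref & cand).values())  # multiset intersection clips counts
    return _f1(overlap, sum(cand.values()), sum(ref.values()))


def rouge_l_f1(reference, candidate):
    """Character-level ROUGE-L: based on the longest common subsequence."""
    ref, cand = _chars(reference), _chars(candidate)
    prev = [0] * (len(cand) + 1)
    for r in ref:  # standard dynamic-programming LCS, one row at a time
        curr = [0]
        for j, c in enumerate(cand, start=1):
            curr.append(prev[j - 1] + 1 if r == c else max(prev[j], curr[j - 1]))
        prev = curr
    return _f1(prev[-1], len(cand), len(ref))


# Toy strings for illustration only.
reference, candidate = "abcd", "abxd"
print(rouge_n_f1(reference, candidate, 1))
print(rouge_n_f1(reference, candidate, 2))
print(rouge_l_f1(reference, candidate))
```

Scoring on characters instead of whitespace-separated words avoids depending on a Tigrinya word tokenizer, at the price of rewarding partial word matches, which is worth keeping in mind when comparing these numbers with word-level ROUGE reported elsewhere.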

Open source download

Research package

README.md
tigrinya_summariser.py
compare_models.py
comparison_results.csv (100 articles)
comparison_summary.csv


The ready-to-use model is on Hugging Face. The XL-Sum Tigrinya corpus (2.8M words, 6,813 articles) is a separate download because of file size. See README.

Policy connection

This supports Reform 2b: if AI is used in asylum decision-making, it must work from original-language testimony.
