Why Tigrinya
High asylum relevance. Low institutional capacity.
Eritreans make up 9% of UK asylum claims, the second-largest nationality group for the year ending March 2026. The Independent Chief Inspector of Borders and Immigration (ICIBI) also identifies Tigrinya as a language of lesser diffusion, where interpreter supply failures can interrupt asylum casework.
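Tigrinya is written in Ge'ez script. General AI systems, built mostly around Latin-script text, can split Ge'ez characters into many small pieces, which makes Tigrinya expensive to process. That cost lands on people who already have weaker evidence infrastructure.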
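A rough way to see the script cost is to count UTF-8 bytes per character. This is a minimal sketch, not a measurement of any particular model: the function name `utf8_bytes_per_char` and the sample phrases are illustrative assumptions, and real token counts depend on each model's tokenizer. It only shows the floor that byte-level systems start from.

```python
def utf8_bytes_per_char(text: str) -> float:
    """Average number of UTF-8 bytes per non-whitespace character."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        raise ValueError("text contains no non-whitespace characters")
    return sum(len(c.encode("utf-8")) for c in chars) / len(chars)


# Illustrative samples (assumed for this sketch, not taken from the corpus).
latin_sample = "hello world"
geez_sample = "ሰላም ዓለም"  # short Ge'ez-script phrase

print(utf8_bytes_per_char(latin_sample))  # 1.0
print(utf8_bytes_per_char(geez_sample))   # 3.0
```

Every character in the Ethiopic Unicode block takes three bytes in UTF-8, against one for basic Latin letters. A tokenizer that falls back to bytes for unfamiliar scripts can therefore spend several tokens on text that looks short on the page.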
What we built
A documented Tigrinya corpus and summarisation model.
We built a general Tigrinya corpus of 6,813 BBC articles, totalling 2.8 million words.
Then we fine-tuned a summarisation model on it. This is evidence for a technical claim: low-resource asylum languages can be processed directly, without passing through English first.
What this proves
Original-language AI can work.
Low-resource asylum languages can be processed when the work is scoped, documented, and language-specific. The technical excuse is weaker than it looks.
If the state uses AI on asylum evidence, it should work from the applicant's testimony. A summary of an interpretation is the wrong document.
What comes next
Next: asylum-context testing.
The next stage is asylum-context annotation with native speakers, domain adaptation, and comparison against models such as GPT-4. We need to know where general models fail and where specialist models help.
The first technical materials are available as an open research package. The full XL-Sum Tigrinya corpus is a separate download because of file size.