← All writing
Essay / AI Infrastructure

The Token Tax

English is cheaper to tokenise. Pricing by token turns that design choice into a language tax.

Mohammad Shehadeh 4 min read Essay

Large language models process text in units called tokens. English maps efficiently onto tokens. Arabic does not.

Tokenisers are trained on available text. The internet is heavily English. A phrase that costs one or two tokens in English can cost four to six times as many in Arabic. Same meaning. Different price.

Most commercial AI systems charge per token. The pricing model treats Arabic as expensive. It does not say that out loud. The arithmetic does.

Arabic has over four hundred million native speakers. It is an official language of twenty-two countries and one of the six official languages of the United Nations. The AI pricing structure still treats it like an edge case.

The consequence runs through the chain. A developer building a legal aid tool for Arabic-speaking users pays more than the English equivalent. A company sees unit economics that favour English. A person using AI in Arabic gets less for the same price. No one needs to intend it. The structure does it.

The exposed group is obvious. Asylum seekers, refugees, and migrants in legal systems where they do not speak the dominant language. AI could help them understand the institution. The token model often charges more for their languages.

The standard answer is technical: tokenisers reflect training data, and training data reflects the history of digital text. Fine. The consequence remains. The tokeniser was optimised for English. The pricing model charges by the unit.

Structural disadvantage does not require an architect.

There is a language pressure here too. If the infrastructure charges a premium for Arabic, people move toward English. Individuals, companies, developers. Across millions of interactions, that pressure accumulates.

Arabic costs more because the systems were built around English. The pricing model turns that into an ongoing charge. The people paying it did not design it.

Mohammad Shehadeh

Translational Justice essay

Read The Beast →