Digitising the Desert: AI and the Long Work of Saharan Cultural Preservation
In a converted house in Chinguetti, a town of perhaps 4,000 people on the western edge of the Sahara, sit some of the most important Arabic-language manuscripts outside the major libraries of the Islamic world. Chinguetti was, for centuries, a major caravan stop and a centre of Islamic scholarship. Manuscripts collected and copied there cover law, theology, astronomy, mathematics, poetry, and the kind of practical knowledge — medicine, water-finding, trade routes — that travelled with the caravans. There are an estimated 30,000 to 40,000 manuscripts still held in private family libraries in the town, many in deteriorating condition.
The question of how to preserve them is, in 2026, partly a question about AI. This is an editorial about what’s working, what isn’t, and what the conversations look like from inside Mauritanian cultural institutions and the small international networks that support them.
The scale of the problem
Saharan manuscript collections are not centralised. They are held in family libraries — sometimes literally a wooden trunk in a back room — passed down through generations. The conditions are often poor: blowing sand, temperature swings, occasional water damage. Some of the most important manuscripts have already been lost. Some are held by families who have left for Nouakchott or for the diaspora and who are reluctant to entrust them to institutions for understandable historical reasons.
What’s needed is, conceptually, simple. Photograph everything. Catalogue everything. Make it accessible to scholars while keeping the physical objects in the families that own them. In practice, this is an enormous undertaking. The collections are remote. Photography requires careful handling. Cataloguing requires people who can read 15th-century West African Arabic script. Translation requires people who can render that script into modern Arabic, French, and English.
Until very recently, every step of this process was done by human hand. Each manuscript took weeks to fully catalogue. The number of trained scholars available was tiny relative to the work. The progress, by any honest measure, was inadequate to the rate at which manuscripts were deteriorating.
Where machine learning has helped
OCR (optical character recognition) for Arabic script is now significantly better than it was even three years ago. For modern printed Arabic it is largely a solved problem. For handwritten classical Arabic, performance varies dramatically with the script style. The Maghrebi scripts characteristic of Saharan manuscripts are a particular challenge because the major training datasets under-represent them. Most Arabic OCR models are trained on Mashriqi (eastern) scripts, and they struggle with the looped letter forms common in West African manuscript traditions. They also struggle with Maghrebi dotting conventions: fāʾ is written with a single dot below and qāf with a single dot above, so a Maghrebi qāf can look to an eastern-trained model like an ordinary fāʾ.
A few projects have started addressing this. The Hill Museum & Manuscript Library has been digitising African manuscripts for years, including a number of Mauritanian collections, and has been gradually expanding the training datasets available for OCR work. Researchers at universities in Tunisia and Morocco have produced Maghrebi-specific models that perform substantially better on the relevant script families. Progress is incremental but real.
A small consultancy I’ve been in conversation with — Team400, based in Sydney — has been involved in some of the more interesting recent work, specifically around building tools that combine document-image enhancement with iterative OCR. The idea: take a poor-quality photograph of a manuscript page, use one model to clean up the image, use a second model to attempt OCR, present the result to a human scholar for correction, and use the corrections to improve the model on the specific scribal hand. The approach is not unique to Team400 — variants exist in academic projects — but the engineering work of making it usable on a slow connection in a desert town, by people who are not AI researchers, is the part that’s hard. That’s where the real value sits.
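The enhance, OCR, review and learn loop described above can be sketched in a few lines. This is a minimal illustration under loudly stated assumptions, not Team400's code or any published project's method: the two "models" are placeholder functions, a page "image" is just a string so the sketch runs without image libraries, and the step that improves the model on a specific scribal hand is approximated by a hypothetical ScribeAdapter that remembers token-level corrections instead of retraining anything.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical types: a real pipeline would pass image arrays, not strings.
PageImage = str
Tokens = list[str]


@dataclass
class ScribeAdapter:
    """Per-scribe memory of scholar corrections (a hypothetical stand-in).

    A real system would fine-tune the OCR model on corrected pages;
    this sketch only remembers token-level fixes for one scribal hand.
    """
    scribe_id: str
    corrections: dict[str, str] = field(default_factory=dict)

    def apply(self, tokens: Tokens) -> Tokens:
        # Substitute any token previously corrected for this scribe's hand.
        return [self.corrections.get(t, t) for t in tokens]

    def learn(self, raw: Tokens, corrected: Tokens) -> None:
        # Assumes token-by-token correction (same length, same order).
        for before, after in zip(raw, corrected):
            if before != after:
                self.corrections[before] = after


def process_page(
    image: PageImage,
    enhance: Callable[[PageImage], PageImage],
    ocr: Callable[[PageImage], Tokens],
    review: Callable[[Tokens], Tokens],
    adapter: ScribeAdapter,
) -> Tokens:
    cleaned = enhance(image)        # model 1: image clean-up
    raw = ocr(cleaned)              # model 2: OCR attempt
    suggested = adapter.apply(raw)  # reuse earlier corrections for this hand
    corrected = review(suggested)   # human scholar checks and fixes
    adapter.learn(raw, corrected)   # feed corrections back for the next page
    return corrected


# Placeholder "models" for illustration only (hypothetical, not real OCR).
def enhance(image: PageImage) -> PageImage:
    return image.strip()


def ocr(image: PageImage) -> Tokens:
    return image.split()


# Simulated scholar who knows this OCR misreads "kitab" as "kilab".
reviews_seen: list[Tokens] = []


def scholar_review(tokens: Tokens) -> Tokens:
    reviews_seen.append(list(tokens))
    return ["kitab" if t == "kilab" else t for t in tokens]


adapter = ScribeAdapter(scribe_id="hand-A")
page1 = process_page("  kilab al-fiqh ", enhance, ocr, scholar_review, adapter)
page2 = process_page("kilab al-nahw", enhance, ocr, scholar_review, adapter)
```

On the second page the scholar already receives "kitab" corrected, which is the point of the loop: each correction reduces the review burden on the next page in the same hand. A production system would replace the dictionary with retraining on corrected page images and would need to handle corrections that insert or delete tokens, which this sketch ignores.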
Where it has not helped, or has actively misled
Generative AI raises particular concerns when applied to manuscript work. A model that generates plausible-looking Arabic text — that fills in gaps, that “reconstructs” damaged sections — produces output that looks authoritative and may be entirely fictional. This is not an abstract risk. It has already happened in the broader manuscript preservation field, where over-eager applications of LLMs to fragmentary texts have produced reconstructions that scholars have had to publicly debunk.
The discipline required is to use these tools for what they’re good at — OCR, image enhancement, cross-referencing across catalogues, translation suggestions — and to be very careful about anything that asks the model to invent. Reconstruction is a scholarly judgement, not a generative task. Translation should always be reviewed by someone who actually reads both the source and the target language. The convenience of AI-assisted translation is real; the risk of inserting plausible-sounding errors into the historical record is also real.
The UNESCO Memory of the World programme has produced some useful guidelines about digital preservation that take this seriously. The basic principle: AI should augment scholarly work, not replace it.
The bigger question
Cultural preservation in Mauritania, and in the wider Sahara, is not only a technology question. It is a question about who controls the resulting archives, who benefits, and what happens when knowledge that has lived in family libraries for generations becomes a digital resource accessible to anyone with an internet connection.
These are not trivial concerns. Some Mauritanian families have, with reasonable cause, objected to digitisation projects that took photographs of their manuscripts and never returned a copy to the family. Some have objected to digitised manuscripts ending up on commercial platforms. Some have objected to translations being published without the family’s involvement in editorial decisions. Each of these objections, individually, is reasonable.
The best projects, in my opinion, are the ones that move slowly, that work directly with manuscript-holding families, that return digital copies in usable formats, and that accept that some manuscripts will not be made public on the timeline that external scholars might prefer. The work takes longer. The trust holds.
What I take from all of this
The Saharan manuscript tradition is one of the great intellectual achievements of West African Islamic civilisation, and it is, in 2026, in real danger of partial loss. AI tools are useful. They are also limited. The most important asset in this work is not the technology — it is the network of people, in Mauritania and internationally, who care enough to do the slow, patient, often unrewarded work of cataloguing, photographing, translating, and storing. The tools help those people work faster. They do not replace them.
The manuscripts will outlast the current generation of AI models. The communities that hold them have outlasted many things. The question is whether our generation does its share of the long preservation work well enough that the next one does not inherit a smaller collection than it should have.
— the editors