News

We unlocked the Saudi Genome

Published online 18 March 2025

We completed the first publicly available full genome from a Saudi individual.

Malak Abedalthagafi

DNA double helix structure composed of human figures on a blue background with copy space. Public health and population genetics concept - stock photo 

TanyaJoy/ iStock / Getty Images Plus

For years, human genome reference projects have relied on individuals of European ancestry. Arab populations have been underrepresented in biomedical research, which limits the capacity for accurate disease diagnosis and drug response prediction for Saudi and Middle Eastern patients.

The completion in 2022 of KSA001, the first Telomere-to-Telomere (T2T) genome from a Saudi individual, marks a significant milestone in closing the genomic equity gap, with the aim of ensuring that Saudi Arabian and Arab populations are no longer marginalized in global genetic studies. As the first fully resolved, population-specific genomic reference from Saudi Arabia, KSA001 lays the foundation for precision medicine, disease research, and genetic diversity studies in the region.

Malak Abedalthagafi, Professor, Department of Pathology and Laboratory Medicine, Emory University School of Medicine

A Saudi female volunteer and her parents provided blood samples and consented to make the genomic data public and freely available. The parents’ contributions allowed us to sort genetic variants based on parental origin, improving the accuracy of variant identification and allowing the detection of de novo mutations (new genetic changes not inherited from either parent). This approach was essential for building a highly accurate diploid genome, which represents both maternal and paternal genetic contributions. The final assembly was completed using advanced sequencing and computational techniques, which achieved near-perfect genome assembly, resolved previously hidden complex genomic regions, and improved gene annotation accuracy.
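The idea of sorting variants by parental origin can be illustrated with a toy example. The sketch below is not the pipeline used for KSA001; it is a minimal illustration that assumes simple two-allele genotype calls for a child and both parents at a single site, and the function name `classify_site` and its return format are hypothetical. A site flagged here is only a candidate de novo change, since a genotyping error would look the same.

```python
def classify_site(child, mother, father):
    """Classify one site in a parent-child trio (illustrative only).

    Each argument is a tuple of two alleles, e.g. ("A", "G").
    Returns a dict with a status and, where it can be determined,
    which child allele came from the mother and which from the father.
    """
    a, b = child
    # Collect every (maternal, paternal) assignment consistent with
    # one allele being inherited from each parent.
    options = set()
    if a in mother and b in father:
        options.add((a, b))
    if b in mother and a in father:
        options.add((b, a))

    if not options:
        # No Mendelian-consistent transmission: candidate de novo change
        # (or a genotyping error; this sketch cannot tell them apart).
        return {"status": "de_novo_candidate", "maternal": None, "paternal": None}
    if len(options) == 1:
        maternal, paternal = next(iter(options))
        return {"status": "phased", "maternal": maternal, "paternal": paternal}
    # Both assignments are possible, e.g. all three are heterozygous.
    return {"status": "ambiguous", "maternal": None, "paternal": None}


if __name__ == "__main__":
    print(classify_site(("A", "G"), ("A", "A"), ("G", "G")))
    print(classify_site(("A", "T"), ("A", "A"), ("A", "A")))
    print(classify_site(("A", "G"), ("A", "G"), ("A", "G")))
```

In this toy model, a site where all three individuals are heterozygous stays ambiguous, which is one reason genotype calls alone are not always enough to assign parental origin.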

KSA001 contains genetic sequences missed in the standard human reference genome, a global baseline compiled from multiple individuals’ DNA to represent the average human genome. Highlighting the variations that are unique to Saudi ancestry can improve population-specific disease research and diagnostics. For example, Saudi Arabia has one of the highest rates of recessive genetic disorders, due to consanguinity. With the new Saudi genomic reference, we can improve variant interpretation in genetic screening and diagnostic programs for multiple disorders, including neuromuscular disease, metabolic syndromes, and hereditary cancers.

Another benefit of KSA001 is a better understanding of how Saudi-specific mutations affect cancer progression, which enables the development of targeted therapies and improves drug response predictions, particularly in cardiology, oncology, and metabolic disorders. AI-driven diagnostics trained on the new genomes can improve the accuracy of genetic variant classification and accelerate discoveries in neuroscience, immunology, and personalized medicine.

Building on the development of KSA001, we expanded to the Saudi Pangenome, which captures genetic diversity across nine Saudi individuals from the five major geographical regions of Saudi Arabia, allowing for a more representative genome. Unlike KSA001, which serves as the initial high-quality reference genome, the Pangenome integrates multiple sequences, detects large-scale structural variations often missed in single-genome references, and provides an improved framework for understanding ancestry, evolutionary history, and disease risk.

The Saudi Pangenome expands our insights by incorporating genetic variation from multiple individuals across the Arabian Peninsula, allowing for a more comprehensive understanding of structural variations, ancestry, and health implications. 

While there are many genetic studies of ancient and current populations of the Middle East, little genomic data from the region is publicly available. Unlike previous genomic studies in the region, KSA001, the Saudi Pangenome, and all associated data are made freely and publicly available following FAIR (Findable, Accessible, Interoperable, and Reusable) data principles.

FAIR principles support global accessibility for research collaborations in genomic medicine, improve representation of Arab populations in large-scale genetic studies, and provide a foundation for integrating Saudi genomic data into AI-driven diagnostics. Publishing KSA001 under a public, unrestricted license allows scientists worldwide to explore new approaches in population genetics, evolutionary biology, and clinical genomics.

These genomic efforts mark a significant milestone for Saudi genomics. However, they are still a starting point: we need further use of long-read sequencing and broader representation of diverse individuals to capture regional variation. Translating these findings into routine clinical practice remains a challenge, one that requires stronger integration between hospitals and genomics.

Moving forward, our goal is to establish a comprehensive Saudi Pangenome database, integrating thousands of high-resolution genomes to power clinical applications, population-wide studies, and translational medicine. The next decade will witness an explosion of discoveries powered by Arab genomic research, bringing us closer to a future where genetics is fully integrated into healthcare—ensuring better, more precise treatments for all.

Note: The Saudi Pangenome is currently awaiting publication in Nature Portfolio. A preprint version can be found here.

doi:10.1038/nmiddleeast.2025.32