A DNA strand with a highlighted area indicating a mutation. Image Credit: Scientific Frontline
Every human has tens of thousands of tiny genetic alterations in their DNA, also known as variants, that can affect how cells build proteins.
Yet in a given human genome, only a few of these changes are likely to modify proteins in ways that cause disease, which raises a key question: How can scientists find the disease-causing needles in the vast haystack of genetic variants?
For years, scientists have been working on genome-wide association studies and artificial intelligence tools to tackle this question. Now, a new AI model developed by Harvard Medical School researchers and colleagues has pushed forward these efforts. The model, called popEVE, produces a score for each variant in a patient’s genome indicating its likelihood of causing disease and places variants on a continuous spectrum.
In a paper published Nov. 24 in Nature Genetics, the scientists show that popEVE can predict whether variants are benign or pathogenic (disease-causing) and which variants lead to death in childhood versus adulthood.
The model was able to identify more than 100 novel alterations responsible for undiagnosed, rare genetic diseases.
“Our goal was to develop a model that ranks variants by disease severity — providing a prioritized, clinically meaningful view of a person’s genome,” said co-senior author Debora Marks, professor of systems biology in the Blavatnik Institute at HMS.
The team hopes that popEVE can help clinicians diagnose single-variant genetic diseases — especially rare diseases — more quickly and accurately. The model could also be used to identify new drug targets for genetic conditions.
The tool complements efforts across the HMS community to conduct research, build AI tools, and engage in nationwide collaborations to improve the diagnosis and treatment of rare diseases.
Turning EVE into popEVE
As genomic sequencing has become more accessible, physicians have had access to an increasing amount of information about their patients’ genetic variants.
However, for variants whose link to disease remains poorly understood, identifying which of those variants are responsible for a patient’s condition tends to be time-consuming, inefficient, and sometimes fruitless. As a result, many patients with rare or unique genetic diseases remain undiagnosed for years.
Several years ago, the Marks Lab developed a generative AI model called EVE that uses deep evolutionary information from different species to learn patterns of mutations that are highly conserved in biology. EVE can then make predictions about how variants in human genes affect protein function.
But EVE couldn’t easily compare variants in different human genes to determine which might be the most problematic for health. The same is true of other variant prediction models that have emerged in recent years, the researchers said.
The team believed that finding a better way to compare variants across genes might help clinicians choose which variants to prioritize in their research when trying to diagnose and care for patients, said Rose Orenbuch, a research fellow in the Marks Lab and lead author on the new paper.
To create popEVE, the researchers added two components to EVE: a protein language model, which learns from the amino acid sequences that make up proteins, and human population data that captures natural genetic variation. In doing so, they were able to calibrate the model so that the score it produces for each variant can be compared across genes.
Because popEVE combines cross-species and within-species information, it reveals how much a variant affects protein function as well as the importance of that variant for human physiology, Marks explained.
Putting popEVE through its paces
When the researchers tested popEVE on documented variants and case studies, they found that it successfully:
- Distinguished between pathogenic and benign variants.
- Discerned healthy controls from patients with severe developmental disorders.
- Determined whether a variant was likely to cause death in childhood or adulthood.
- Assessed whether an alteration was inherited or arose spontaneously, even without parental genetic information.
Importantly, the model did not show ancestry bias: it did not perform worse in people from underrepresented genetic backgrounds, and it did not overpredict the prevalence of pathogenic variants.
The researchers then applied popEVE to a cohort of around 30,000 patients with severe developmental disorders who had not yet received a diagnosis.
“These are diseases that we assumed were genetic and caused by a single variant based on their severity, but the variant hadn’t been found,” said Orenbuch.
The analysis led to a diagnosis in about one-third of cases.
Perhaps most notably, the model identified variants in 123 genes that had not previously been linked to developmental disorders, essentially pinpointing likely genetic causes of those disorders. Since then, 25 of these genes have been independently confirmed by other labs to cause the disorders.
Moving popEVE into the clinic
Marks and colleagues are now working on making popEVE available to clinicians and researchers to use and validate in the real world.
The team is also collaborating with organizations including the Children’s Rare Disease Collaborative at Boston Children’s Hospital, the Division of Human Genetics at the Children’s Hospital of Philadelphia, and Genomics England in partnership with the Wellcome Sanger Institute.
Marks reports that a clinician-researcher at Centro Nacional de Análisis Genómico in Barcelona, Spain, has been using popEVE to interpret variants in his patients — information that has helped him make several rare-disease diagnoses.
“I feel like we are a step closer to popEVE being useful in the day-to-day pipeline of trying to diagnose genetic diseases faster,” Orenbuch said.
She added that she is especially excited about the model’s potential for patients who have been unable to receive a diagnosis through standard methods.
“These are the cases where we have to look outside of the known disease genes, and popEVE has already found a lot of gene candidates,” she said.
The team noted that popEVE will need further verification to ensure its safety and accuracy before it is widely adopted in the clinic, but they hope it can eventually increase clinicians’ confidence in using computational models for genetic diagnoses.
The researchers are also integrating popEVE scores into existing variant and protein databases such as ProtVar and UniProt, which will allow scientists worldwide to use the model to compare variants across genes.
By pinpointing the genetic origins of rare or complex diseases, the researchers noted, popEVE may also identify new targets and avenues for drug development.
“We think prioritizing variants based on predicted disease severity will improve the odds of diagnosis and ultimately pave the way for better treatment and drug discovery,” Marks said.
Resource Material: popEVE
Funding: Funding for the work was provided by a Chan Zuckerberg Initiative Award (Neurodegeneration Challenge Network, CZI2018-191853), a National Institutes of Health Transformational Research Award (TR01CA260415), a National Science Foundation Graduate Research Fellowship, the Spanish Ministry of Science and Innovation (PID2022-140793NA-I00; CEX2020-001049-S; MCIN/AEI/10.13039/501100011033, MCIN/AEI/10.13039/501100011033/FEDER, UE), and the Generalitat de Catalunya (Government of Catalonia) through the CERCA program.
Published in journal: Nature Genetics
Title: Proteome-wide model for human disease genetics
Authors: Rose Orenbuch, Courtney A. Shearer, Aaron W. Kollasch, Aviv D. Spinner, Thomas Hopf, Lood van Niekerk, Dinko Franceschi, Mafalda Dias, Jonathan Frazer, and Debora S. Marks
Source/Credit: Harvard Medical School | Catherine Caruso
Reference Number: gen112425_01

