New Genomics Databases Could Drive Major Breakthroughs

Posted on Categories Discover Magazine

The year 2023 brought a flurry of pivotal advancements to the field of genomics. Three large databases (one of humans, one more broadly of mammals, and one of primates; see sidebar on page 54) are promising brand-new revelations about the source code of life, especially as it pertains to our own species. A notable milestone for a one-of-a-kind repository of ancient DNA could yield similar insights from our ancestors.

What Is a Genome Database?

These massive genomic repositories will grant scientists novel tools for comparing DNA between humans, between humans and animals, and between us and our ancestral relatives. While genome sequencing has been a reality for decades, much of the value of genomics lies in comparing different genomes to one another to understand how they differ, and why those differences matter. Past work in comparative genomics has revealed the underlying genetics behind conditions like autism, and has even uncovered entirely new lineages of humans. Those kinds of discoveries could be just the beginning.

By putting dozens to thousands of genomes in one place, the new databases are opening the door to key questions that were not possible to ask before, from “What does human genetic diversity truly look like?” to “How different are humans and Siberian huskies?”

Taken together, the projects show that even though it’s been around for decades, the field of genomics is still young indeed. “There is a rainforest of hidden stuff out there that we haven’t looked at yet,” says Benedict Paten, a geneticist and associate director of the genomics institute at the University of California, Santa Cruz.

Balto the sled dog, seen here with musher Gunnar Kaasen in 1925, was among the animals from 240 mammalian species whose DNA was sequenced and compared as part of the Zoonomia Project. (Credit: Bettmann collection via Getty Images)

Building the Database

The first of these projects, published in the journal Nature in May, presented the human pangenome. It’s made of dozens of human genomes, sequenced with high accuracy and compiled in a single database, that will inform all human genomics work moving forward. Notably, it updates the reference genome scientists still use as a guideline, which is based on the first fully sequenced human genome, completed in 2003.

That prior reference genome had one big flaw: It came largely from one individual, meaning it couldn’t capture the full spectrum of human genetic diversity. “The variations are the things that uniquely define us genetically,” says Paten, a member of the Human Pangenome Reference Consortium (HPRC) that assembled the new database.

While millions of people have had their genomes read, few have been sequenced in sufficient detail to serve as a comprehensive scientific reference. Researchers with the HPRC changed that by taking fully sequenced genomes from 47 people from five continents, and then lining them up side by side to review and compare each section. The final result, Paten says, is like a map, adding coordinates to key variations that long went unseen.

Paten says the new pangenome will allow researchers to find and study genetic variations in humans everywhere, potentially shedding light on the roots of diseases like Type 1 diabetes and multiple sclerosis. And that’s just the starting point: The HPRC aims to eventually sequence and add hundreds, and perhaps thousands, of human genomes to expand the range of genetic variation included. That work could also help answer many basic questions about our genes that still stump researchers.

“We’ve gotten really good at sequencing DNA and we’re still not very good at actually telling you which changes are actually important … and which ones are just random noise,” says Elinor Karlsson, a geneticist at the University of Massachusetts Chan Medical School and the Broad Institute of MIT and Harvard.

Scientists from across the globe have created a new “pangenome” that fills in missing sequencing gaps from the prior human reference genome, capturing vastly more genetic diversity than was possible before. (Credit: Darryl Leja/NHGRI)

How Can We Utilize DNA from Mammals?

To figure that out, scientists need more genomes — and not just from humans, either. That’s where the Zoonomia Project, which Karlsson helped lead, comes in. Scientists with the project compared DNA sequences from 240 species of mammals that exist today, including horses, humans, ground squirrels, and even Balto the sled dog, the famed husky who helped deliver lifesaving medications in Alaska in 1925. Then, using the alignment method Paten and his team helped pioneer, they compared certain regions to each other.

Much as with the human pangenome, seeing all those genomes side by side can be enlightening. Karlsson says one priority is ascertaining which regions of the genome are open to evolutionary change and which are so critical for survival that changes rarely occur.

To that end, researchers with the Zoonomia Project, which published its most comprehensive batch of research papers in Science in April, are now targeting genomic regions that look the same in many or all mammals. That similarity is a strong sign that those areas of the genome code for traits that can’t be changed without causing dire side effects.

Those evolutionarily conserved regions could help direct scientists studying genetic diseases to crucial mutations. That’s because an alteration in a region of the genome that rarely changes is a strong sign that it may be tied to disease.
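
For readers who want to see the idea in miniature, here is a rough sketch in Python. It is not the Zoonomia Project's actual method: the four short sequences are made up, the species labels are only illustrative, and the function names (`conservation_scores`, `conserved_positions`, `variants_in_conserved_regions`) are hypothetical. It assumes the sequences are already aligned, scores each position by how many species share the most common letter, and then flags differences that fall at positions where every species agrees.

```python
from collections import Counter

# Toy, invented aligned sequences of equal length (not real genome data).
ALIGNMENT = {
    "human": "ACGTACGTAC",
    "dog":   "ACGTTCGTAC",
    "horse": "ACGTGCGTAC",
    "mouse": "ACGTCCGAAC",
}


def conservation_scores(alignment):
    """Return, per position, the fraction of species sharing the most common base."""
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must all have the same length")
    scores = []
    for column in zip(*seqs):  # one column = the same position in every species
        top_count = Counter(column).most_common(1)[0][1]
        scores.append(top_count / len(seqs))
    return scores


def conserved_positions(alignment, threshold=1.0):
    """Return positions whose conservation score meets the threshold."""
    scores = conservation_scores(alignment)
    return [i for i, score in enumerate(scores) if score >= threshold]


def variants_in_conserved_regions(sample, reference, alignment, threshold=1.0):
    """Return positions where sample differs from reference at a conserved site."""
    conserved = set(conserved_positions(alignment, threshold))
    return [
        i
        for i, (s, r) in enumerate(zip(sample, reference))
        if s != r and i in conserved
    ]


if __name__ == "__main__":
    # A hypothetical patient sequence with two changes relative to "human".
    patient = "ACTTCCGTAC"
    print(conserved_positions(ALIGNMENT))
    print(variants_in_conserved_regions(patient, ALIGNMENT["human"], ALIGNMENT))
```

In this toy case the patient sequence differs from the reference at two positions, but only one of them sits where all four species agree, so only that one is flagged. Real analyses work with hundreds of whole genomes and with statistical models of evolution rather than a simple majority count.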

Highlighting Differences in Species

Scientists involved with the project are also exploring other fundamental questions concerning the differences between species. For instance, the project highlights where the human genome contains unique variations of genes that control how DNA gets folded in cells, which affects gene expression. Those changes could reveal why human brains are so much bigger than those of our close chimpanzee relatives, Karlsson says.

Our ancestral relatives aren’t excluded from the 2023 genomics boom, either. This year, the tally of genomes in the world’s largest database of curated ancient human DNA, the Allen Ancient DNA Resource (AADR), crossed the 10,000 mark. The database includes genomes not only from modern humans but also from our evolutionary cousins, such as the Neanderthals and Denisovans.

The AADR is the brainchild of David Reich, who researches ancient DNA at Harvard, and his lab. It includes genomes ranging from just a few hundred years old to more than a hundred thousand years old. As the resource crosses the five-figure mark, Reich says, it’s opening up new possibilities for archaeology and anthropology.

With thousands of ancient genomes, scientists can move beyond asking questions about just one or several individuals to studying entire cemeteries — and even whole populations — to understand how those early humans differed from each other. For example, recent work in ancient genomics has revealed sweeping population changes across Europe and Asia thousands of years ago as new groups of ancient humans moved in, painting a far more complex portrait of our evolutionary history than scientists previously imagined.

Reich estimates there are tens of thousands of ancient genomes still waiting to be formally published, meaning the AADR will only continue to grow in coming years, just like the human pangenome and the Zoonomia Project. Large-scale genomics is just getting started.


This story was originally published in our January/February 2024 issue.
