Scientific Frontline: Codetta Program Deciphers Genetic Code in 250,000 Genomes

Tuesday, November 9, 2021


Within DNA, four chemical bases (shown in green, red, blue, and orange) strung together in long strands contain the instructions for building proteins.
Credit: Irving Geis/HHMI
Codetta, a new computational method for predicting genetic codes, could reveal insights into how some organisms have modified a code once thought to be universal.

In the 1800s, the Rosetta Stone – an ancient rock slab inscribed with the same text in three scripts – helped scholars decode Egyptian hieroglyphics. Now, a computer program is doing something similar for the genetic code.

The program, named Codetta, can read the genome sequence of any organism and then spit out its genetic code: the biological key that translates genetic information into instructions for building proteins. Across most of the tree of life, this code is universal. But scientists have found a handful of exceptions: in some organisms, a given stretch of genetic information is translated into a different protein building block than it would be in other life-forms.

In the largest screen to date for such alternative genetic codes, the program scanned more than 250,000 genome sequences from bacteria and archaea and identified five never-before-seen codes, Harvard University’s Kate Shulgina and Howard Hughes Medical Institute Investigator Sean Eddy report November 9, 2021, in the journal eLife. “I told Kate that her new codes are going straight into the textbooks,” Eddy says.

The team’s method is faster, more rigorous, and more comprehensive than previous efforts, says Ken Wolfe, an evolutionary geneticist at University College Dublin who was not involved with the research. “They looked at every genome that’s available for bacteria and archaea – essentially, all the data that exists.”

The work’s practical implications are immediate: scientists using Codetta, which is freely available, will be able to correctly predict which proteins an organism is making. But the program might unlock more sweeping biological insights too.

Unearthing the full set of genetic codes used across life’s kingdoms could crack open a long-standing biological enigma: how an organism can change its genetic code at all. “There are all kinds of theories out there, but it’s still a real mystery,” Eddy says. “How does this possibly happen?”

Exceptions to the rule

Shulgina first learned about the existence of alternative genetic codes in 2016. She was a first-year graduate student at Harvard, and the idea intrigued her.

Students learn one core tenet that undergirds much of molecular biology: DNA encodes instructions for building proteins. The cell converts DNA into RNA messages, and then translates three-letter sets of RNA into protein building blocks called amino acids. The genetic code is the “lookup table” that tells cells which three letters encode which amino acid. In organisms as diverse as hummingbirds, E. coli, and bread mold, for example, the letters GGC code for the amino acid glycine.
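To make the “lookup table” idea concrete, here is a minimal Python sketch of the standard genetic code as a dictionary from RNA triplets to one-letter amino acid symbols, with `*` standing for a stop signal. The table string follows the standard code's conventional ordering (first letter outermost, third letter innermost); the names `STANDARD_CODE` and `translate` are illustrative only and are not part of Codetta.

```python
# The 64 triplets in the conventional U, C, A, G order (first letter varies slowest).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]

# The lookup table: triplet -> one-letter amino acid ("*" marks a stop signal).
STANDARD_CODE = dict(zip(CODONS, AMINO_ACIDS))


def translate(rna: str, code: dict) -> str:
    """Read an RNA message three letters at a time until a stop signal."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = code[rna[i:i + 3]]
        if amino_acid == "*":  # stop: the protein ends here
            break
        protein.append(amino_acid)
    return "".join(protein)


print(STANDARD_CODE["GGC"])                   # G (glycine)
print(translate("AUGGGCUGA", STANDARD_CODE))  # MG (methionine, glycine, then stop)
```

Because the table is plain data, swapping in a different genetic code means swapping in a different dictionary, while the translation loop stays the same.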

Until 1979, most scientists thought this was universally true. That year, molecular biologist Bart Barrell and colleagues discovered an outlier. Human mitochondria, the cell’s energy factories, had tweaked the code. The letters UGA code for the amino acid tryptophan rather than a stop sign indicating the end of the protein, the researchers reported, and AUA codes for methionine rather than isoleucine. It was the first inkling that the genetic code was not actually set in stone. It could evolve.
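In lookup-table terms, the change Barrell's team found amounts to editing two entries. The Python sketch below copies the standard table and applies only the two reassignments described above (UGA to tryptophan, AUA to methionine). The name `MITO_CODE` is hypothetical, and the real human mitochondrial code has further differences that this sketch does not model.

```python
# Standard code, built the same way as a conventional codon table.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = dict(zip(
    (a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS))

# Simplified mitochondrial variant: only the two changes described in the text.
MITO_CODE = dict(STANDARD_CODE)
MITO_CODE["UGA"] = "W"  # tryptophan instead of a stop signal
MITO_CODE["AUA"] = "M"  # methionine instead of isoleucine


def translate(rna: str, code: dict) -> str:
    """Read an RNA message three letters at a time until a stop signal."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = code[rna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


message = "AUGUGAAUAGGCUAA"
print(translate(message, STANDARD_CODE))  # M (stops early at UGA)
print(translate(message, MITO_CODE))      # MWMG
```

The same message yields a one-letter protein under the standard code and a four-letter one under the variant. This is why using the wrong code leads to wrong predictions about which proteins an organism makes.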

In the decades that followed, more examples of alternative genetic codes trickled in, from organisms including yeast, bacteria, and protozoa. Scientists have now identified roughly 30 alternative codes, and Shulgina wondered if even more were out there. No one had done a systematic survey. Without knowing all the genetic codes in use, she says, it’s difficult to answer broad questions about how such alternates evolved. “I started working on a method to find new genetic codes in order to understand their evolution,” she says.

She began designing an algorithm that could decipher any organism’s genetic code and came to Eddy for advice. Eddy, a biologist whose lab specializes in comparing genomes, had also been mulling over the problem. “I have a little notebook of ideas that I want to work on, and this was in it,” he says.

Hunting for new genetic codes

Shulgina joined Eddy’s lab, and over the next five years, their idea took shape as a computer program they called Codetta. The principle behind the program is simple, in theory, Shulgina says.

Codetta reads a genome, then taps into a database of known proteins to compute a likely genetic code. “My method takes advantage of the fact that a lot is known about what proteins are expected to look like,” she says. The program can use that information to figure out which three-letter sets in a particular genome sequence correspond to which amino acids.
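The core intuition can be illustrated with a deliberately simplified Python sketch. It assumes a hypothetical input: a list of (triplet, amino acid) pairs recording which amino acid the known proteins expect wherever a triplet in the genome lines up with them. It then takes a majority vote per triplet. This is not Codetta's actual statistical method, which the article does not describe in detail; the names `infer_code`, `flag_reassignments`, and `min_support` are all illustrative.

```python
from collections import Counter, defaultdict


def infer_code(aligned_pairs, min_support=3):
    """Assign each triplet the amino acid it most often lines up with.

    Hypothetical simplification: triplets with fewer than `min_support`
    supporting observations are left undecided ("?").
    """
    votes = defaultdict(Counter)
    for codon, amino_acid in aligned_pairs:
        votes[codon][amino_acid] += 1
    inferred = {}
    for codon, counter in votes.items():
        amino_acid, count = counter.most_common(1)[0]
        inferred[codon] = amino_acid if count >= min_support else "?"
    return inferred


def flag_reassignments(inferred, reference):
    """Return triplets whose inferred meaning differs from a reference code."""
    return {
        codon: (reference[codon], aa)
        for codon, aa in inferred.items()
        if aa != "?" and codon in reference and reference[codon] != aa
    }


# Made-up toy observations, for illustration only.
pairs = ([("GGC", "G")] * 5 + [("UGA", "W")] * 4 + [("UGA", "C")]
         + [("CGA", "R")] * 2)
reference = {"GGC": "G", "UGA": "*", "CGA": "R"}  # excerpt of the standard code

inferred = infer_code(pairs)
print(inferred)                                 # {'GGC': 'G', 'UGA': 'W', 'CGA': '?'}
print(flag_reassignments(inferred, reference))  # {'UGA': ('*', 'W')}
```

The `min_support` threshold reflects a real concern in any such screen: a triplet seen only a few times gives too little evidence to claim a new code, so it is better reported as undecided than as a reassignment.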

Until now, scientists using similar programs have been able to analyze hundreds of genome sequences. Codetta scales up scientists’ code-cracking ability substantially, letting the team systematically screen nearly all known bacteria and archaea – more than 250,000 genomes – for new genetic codes.

Their analysis uncovered some surprises. The team discovered five instances where three-letter sets coding for the amino acid arginine were reassigned to a different amino acid. The results mark the first time scientists have seen such a swap in bacteria. The big question, Shulgina says, is why arginine’s three-letter sets are changed so often. The answer could hint at the evolutionary forces responsible for forging new codes.

Shulgina and Eddy are now on the prowl for even more new codes. Because alternative codes tend to crop up in small genomes, the team plans to turn Codetta loose on viruses and cellular compartments like mitochondria and chloroplasts. “This is going to be rich hunting ground,” Eddy says.

Source/Credit: Howard Hughes Medical Institute 

