Part 3: Sequence Optimization

3.1. Why Optimization Is Possible

In Part 1, we talked about how mRNA molecules carry information (“instructions”) from the nucleus that ribosomes use to build proteins. But how does that happen, exactly? Long story short, ribosomes read mRNA bases in groups of 3 called codons. Ribosomes begin translation at the start codon and proceed to ‘print’ the protein until they encounter a stop codon. Apart from the three stop codons, every codon encodes an amino acid (the start codon, AUG, doubles as the codon for methionine). However, you’ll notice that there is a lot of redundancy: there are 4 * 4 * 4 = 64 codons, but only 20 amino acids! That leads to the key property behind optimization: multiple codons can code for the same amino acid. The ‘codon table’ has long been established, and can be found in the image below.

Compact Codon Table. Credits: Wikipedia
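
For readers who prefer code to tables, here is a minimal Python sketch of the same idea. It builds the standard codon table from its usual compact string form and groups codons by the amino acid they encode. The names `CODON_TABLE`, `SYNONYMS`, and `translate` are made up for this illustration, and the translation routine is deliberately simplified (it starts at the first AUG it finds and ignores everything real ribosomes care about beyond that).

```python
from collections import defaultdict
from itertools import product

# Standard genetic code, written compactly: codons are enumerated with the
# bases in the order U, C, A, G, and '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each of the 64 codons to its one-letter amino acid (or '*').
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Group codons by what they encode, to make the redundancy visible.
SYNONYMS = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    SYNONYMS[aa].append(codon)


def translate(mrna):
    """Translate from the first AUG until a stop codon (simplified sketch)."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":  # stop codon: the ribosome releases the protein
            break
        protein.append(aa)
    return "".join(protein)
```

With this table, `translate("GGAUGGCUUAAGG")` returns `"MA"`: the ribosome starts at AUG (methionine), reads GCU (alanine), and stops at UAA. `SYNONYMS["L"]` holds six codons for leucine, while `SYNONYMS["M"]` holds only AUG.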

At this point, the computer scientists reading this will correctly think that the choice of which codon encodes each amino acid can be seen as a decision variable in the problem of optimizing mRNA sequences. But what’s the outcome variable? In other words, what changes if a different codon is used to encode the same amino acid? The decoded protein is the same, but the mRNA will have different bases, resulting in a different molecule that may be, for example, more or less stable. Computationally predicting the properties we wish to optimize will be the focus of the last part of this guide.
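
To make the “decision variable” framing concrete, the sketch below counts how many distinct mRNA sequences encode a given protein and, for a tiny example, enumerates them all and picks the best one under a toy objective. The objective used here, GC content, is only a stand-in chosen for this illustration and is not the property or method this guide builds toward. The function names are likewise hypothetical.

```python
from itertools import product
from math import prod

# Standard genetic code in compact form (bases ordered U, C, A, G; '*' = stop).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# For each amino acid, the list of codons that encode it.
SYNONYMS = {}
for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS):
    SYNONYMS.setdefault(aa, []).append("".join(codon))


def count_encodings(protein):
    """Size of the search space: one independent codon choice per residue."""
    return prod(len(SYNONYMS[aa]) for aa in protein)


def all_encodings(protein):
    """Yield every mRNA coding sequence that translates to `protein`."""
    for choice in product(*(SYNONYMS[aa] for aa in protein)):
        yield "".join(choice)


def gc_content(mrna):
    """Toy objective (an assumption for illustration): fraction of G/C bases."""
    return sum(base in "GC" for base in mrna) / len(mrna)


# Brute force is only feasible for very short proteins.
best = max(all_encodings("MLS"), key=gc_content)
```

Even the three-residue peptide `"MLS"` has `count_encodings("MLS") == 36` possible encodings, and the count multiplies with every added residue, which is why exhaustive search stops being an option almost immediately.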

But first, some terminology. mRNA is made up of four bases: A, G, C, and U (much like DNA is made up of A, G, C, and T). Unlike DNA, however, mRNA is not found in a stable double-helix structure. Rather, it’s single-stranded, a much less stable form that contributes to its shorter half-life. Bases will sometimes bond with one other base each to form base pairs, which changes the shape of the molecule while conferring a bit more stability. As we’ve seen above, three consecutive bases form a codon.
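
The “one other base each” rule is easy to state in code. The sketch below assumes the usual RNA pairing rules (A with U, G with C, plus the weaker G-U “wobble” pair) and represents a set of base pairs as a list of index pairs into the sequence. Both the representation and the function names are choices made for this illustration, and the check ignores physical constraints such as minimum loop length.

```python
# Pairs assumed allowed in this sketch: Watson-Crick (A-U, G-C) and G-U wobble.
ALLOWED_PAIRS = {frozenset("AU"), frozenset("GC"), frozenset("GU")}


def can_pair(a, b):
    """True if bases `a` and `b` can form a base pair under the rules above."""
    return frozenset((a, b)) in ALLOWED_PAIRS


def is_valid_pairing(mrna, pairs):
    """Check that each base is used at most once and every pair is allowed.

    `pairs` is a list of (i, j) zero-based positions in `mrna`.
    """
    used = set()
    for i, j in pairs:
        if i == j or i in used or j in used:
            return False  # a base can bond with at most one other base
        if not can_pair(mrna[i], mrna[j]):
            return False
        used.update((i, j))
    return True
```

For example, `is_valid_pairing("GGGAAACCC", [(0, 8), (1, 7), (2, 6)])` is `True`: the three G’s pair with the three C’s, and the A’s in the middle stay unpaired, forming a small hairpin.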

To understand how mRNA is optimized, we’ll first have to understand folding. Over the next few sections, we’ll go through a brief history of computational mRNA folding techniques, which will allow us to understand how the state-of-the-art optimizer works.