Recently, GlaxoSmithKline (GSK) published an article titled "Computational design of mRNA vaccines" in Vaccine. It discusses in detail the biophysical characteristics that must be considered in vaccine design and describes how computational methods can be used to rapidly design mRNA sequences with desired properties.

Optimizing mRNA sequences is highly challenging

A growing body of evidence suggests that key characteristics such as vaccine efficacy and thermal stability are largely determined by the intrinsic properties of the mRNA molecule, and optimizing the mRNA sequence during vaccine design is extremely challenging. Because the genetic code is degenerate, the codon for each amino acid can be chosen largely independently, so the number of mRNA sequences encoding the same protein antigen grows exponentially with protein length. For the SARS-CoV-2 spike protein, for example, there are theoretically more than 10^632 different mRNA sequences, a number that far exceeds the number of atoms in the observable universe. In addition, vaccine design must balance multiple objectives, including translation efficiency, reduction of adverse reactions, RNA synthesis efficiency, and stability, which further increases the complexity of the design.

Figure: Challenges and goals in mRNA vaccine design

Structure of mRNA Vaccines

1. mRNA Regulatory Sequences (UTR)

Figure: Schematic diagram of an mRNA vaccine sequence

In cap-dependent translation, the 5' UTR plays a decisive role in the translation rate of eukaryotic mRNA. Several linear sequence motifs and RNA structures are involved:

1) The Kozak sequence (consensus GCC(A/G)CCAUGG, in which AUG is the start codon) is a crucial RNA consensus sequence that regulates the rate of translation initiation, and hence of protein synthesis, in eukaryotes.
2) 5' terminal oligopyrimidine (TOP) motifs upstream of the Kozak sequence are common in growth-related mRNAs, many of which encode ribosomal proteins.
Translation of TOP-containing mRNAs is regulated in a manner that depends on cell type and sequence context.
3) Complex secondary structures such as iron-responsive elements (IREs) can regulate mRNA translation by binding iron regulatory proteins (IRPs), which prevents formation of the ribosomal initiation complex.

The approved COVID-19 mRNA vaccines use different cis-regulatory regions and rely on the cap structure to translate the target antigen. Internal ribosome entry site (IRES) elements, by contrast, provide a cap-independent route to translation. In recent years, as research on circRNA-mediated protein expression has deepened, IRES elements have regained attention. In preclinical experiments, circular RNA translated through an IRES induced immune responses similar to those induced by linear mRNA.

2. Open Reading Frame (ORF) Sequences and Structural Elements

Figure: Examples of RNA sequence and structure-based motifs affecting mRNA vaccine characteristics

Optimizing the ORF sequence is crucial for vaccine design, mainly in the following three respects:

1) Selecting "optimal codons", which are decoded by more abundant tRNAs, makes decoding by the ribosome easier and improves translation efficiency, which can reduce the required vaccine dose.
On the other hand, strategically placing non-optimal codons can slow ribosome movement and aid correct protein folding, which is crucial for triggering an effective immune response.
2) Careful codon selection is also crucial for balancing vaccine efficacy and safety. For example, GC content influences several mRNA degradation pathways through different mechanisms and plays a central role in controlling intracellular mRNA stability. Uridine-rich sequences may activate the innate immune response; this can sometimes act as a vaccine adjuvant, but an excessive response reduces vaccine efficacy and causes adverse reactions.
3) Codon choice strongly shapes mRNA secondary structure, which is closely linked to protein expression levels. Recent in vitro RNA studies have shown that highly structured regions downstream of the first 30 nucleotides extend mRNA half-life and thereby improve the efficiency of protein synthesis in cells. In addition, hydrolytic cleavage at a single nucleotide position within the ORF leaves the mRNA unable to encode a functional protein. The presence of secondary structure, however, reduces the risk that the mRNA adopts conformations prone to hydrolysis and thereby helps protect it from degradation.

3. Nucleotide Modifications and Poly(A) Tail

Unmodified synthetic mRNA vaccines often over-activate the innate immune system through direct binding to pattern recognition receptors (PRRs), triggering adverse reactions.
Studies have found that replacing uridine with pseudouridine or N1-methylpseudouridine reduces recognition by the innate immune system, enhances translation efficiency, and improves biological stability, regardless of the sequence. In addition, the length of the poly(A) tail affects mRNA stability and translation efficiency. A longer poly(A) tail can enhance the translation of exogenous mRNA in mammalian cells. Comprehensive analysis of endogenous mRNAs in mammalian cell lines and tissues has found average tail lengths of 50 to 250 nucleotides. Among the approved mRNA vaccines, mRNA-1273 is thought to have a non-segmented poly(A) tail, whereas BNT162b2 has a segmented poly(A) tail consisting of at least two adenine stretches, one of about 60 adenines, separated by roughly 10 non-adenine bases. This design aims to reduce recombination of the DNA template during mRNA production.

Computational Optimization of RNA Sequences and Structures

1. UTR Design Method

Compared with natural or modified UTRs, a fully synthetic UTR can not only substantially improve mRNA stability, translation efficiency, and overall performance but can also be tailored to specific needs, which helps improve vaccine performance and efficacy.

1) One 5' UTR design approach that has attracted wide attention applies deep learning based on a convolutional neural network (CNN). The model was trained on a library of 260,000 randomly generated 5' UTRs, whose mean ribosome load was measured by combining massively parallel reporter assays with polysome profiling.
The resulting model accurately predicts the effect of a 5' UTR on ribosome load and translation efficiency, providing a powerful tool for improving the design of therapeutic mRNAs.
2) Another strategy builds an expression model from a library of natural 5' UTRs drawn from diverse sources. The library integrates RNA sequencing (RNA-seq) and ribosome profiling (Ribo-seq) datasets from different cell types, together with human muscle tissue data from the Genotype-Tissue Expression (GTEx) database, and takes into account the known translation efficiency and RNA expression levels of these 5' UTRs. To characterize the 5' UTRs more accurately, sequence- and structure-related features were computed, including k-mer frequencies, RNA folding energy, and the numbers of start and stop codons. With these data, the researchers trained a random forest model that predicts mRNA expression level and translation efficiency from the natural 5' UTR library, providing an effective way to optimize 5' UTRs and improve mRNA function.
3) In addition, deep generative models, including generative adversarial networks (GANs), have recently been used to generate non-coding regulatory elements in DNA. Unlike approaches that optimize each regulatory region independently, this approach considers the interactions between coding regions and cis-regulatory patterns across the entire gene structure, which together regulate mRNA transcription and protein abundance. Newly generated functional regulatory DNA in Saccharomyces cerevisiae achieved expression levels exceeding those of highly expressed natural controls and of the sequences in the training data. In the future, this method could also be applied to the design of mRNA sequences.

2. Codon Optimization Method

As introduced earlier, codon optimality is related to tRNA abundance and has been linked to translation elongation, efficiency, and fidelity, as well as to mRNA stability.
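One widely used measure of codon optimality is the codon adaptation index (CAI) of Sharp et al.: each codon is weighted by its usage relative to the most common synonymous codon, and the CAI is the geometric mean of those weights over the sequence. The following is a minimal sketch of that calculation, not the implementation of any published tool. The usage counts in CODON_COUNTS are made-up numbers for three amino acids, and the function names are hypothetical.

```python
import math

# Hypothetical codon usage counts (illustrative only, not real organism data).
CODON_COUNTS = {
    "K": {"AAA": 24, "AAG": 32},
    "F": {"UUU": 17, "UUC": 20},
    "L": {"CUG": 40, "CUC": 20, "CUU": 13, "UUG": 13, "UUA": 8, "CUA": 7},
}


def relative_adaptiveness(counts_by_aa):
    """Weight of a codon = its count / count of the most common synonymous codon."""
    weights = {}
    for codons in counts_by_aa.values():
        top = max(codons.values())
        for codon, n in codons.items():
            weights[codon] = n / top
    return weights


def cai(orf, weights):
    """Geometric mean of the weights of all codons in the ORF that have a weight.

    Assumes every weight is positive; a zero count would need a pseudocount.
    """
    usable = len(orf) - len(orf) % 3
    codons = [orf[i:i + 3] for i in range(0, usable, 3)]
    logs = [math.log(weights[c]) for c in codons if c in weights]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))


W = relative_adaptiveness(CODON_COUNTS)
optimal = "AAGUUCCUG"  # K-F-L using the most common codon each time
mixed = "AAAUUUUUA"    # same peptide, rarer codons
print(cai(optimal, W))  # 1.0, the maximum possible value
print(cai(mixed, W))    # lower, because every codon is sub-optimal
```

Both sequences encode the same peptide; only the synonymous codon choice differs, which is exactly the degree of freedom that codon optimization exploits.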
Several computational optimization strategies have been proposed to improve translation and mRNA stability by replacing synonymous codons within the ORF sequence:

1) The codon adaptation index (CAI) of Sharp et al. is a measure of codon optimality. It compares the frequency of each codon in a given coding sequence with the frequency of the most common synonymous codon in the organism and takes the geometric mean of these relative frequencies over all codons. Selecting the most common codon for each amino acid gives the maximum CAI of 1. For example, the approved mRNA vaccines BNT162b2 and mRNA-1273 use mRNA designs with high CAI values (CAI > 0.9), significantly higher than those of the corresponding wild-type antigen sequences.
2) Motivated by the positive correlation between tRNA gene copy number and intracellular tRNA levels, dos Reis et al. developed the tRNA adaptation index (tAI) as another measure of codon optimality. The tAI is the geometric mean of the relative adaptiveness values of the codons, which are derived from tRNA gene copy numbers.
3) Beyond these "classic" codon optimality metrics, codon content can influence mRNA stability. Presnyak et al. estimated the contribution of each of the 61 codons to mRNA stability and, on that basis, created the codon stability coefficient (CSC). The CSC is based on the correlation between the frequency of a codon in transcripts and experimentally measured mRNA half-lives. It not only correlates with tAI and CAI but can also predict both mRNA stability and translation efficiency.
4) mRNA sequence design also tends to enrich GC content and reduce uridine content.

3. Structural Optimization Methods

The minimum free energy (MFE) is a key indicator of mRNA stability because it reflects the formation of RNA secondary structure.
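To make the link between sequence and secondary structure concrete, the sketch below folds a short RNA with a toy dynamic program. It is a deliberate simplification: it uses Nussinov-style base-pair maximization, which counts nested pairs, rather than a thermodynamic MFE calculation with an energy model. The minimum hairpin loop length of 3 and the function name are assumptions made for illustration.

```python
def max_base_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs in an RNA sequence (Nussinov-style).

    A stand-in for MFE folding: more pairs loosely corresponds to a more
    stable structure, but no real free energies are computed here.
    """
    pairs = {("A", "U"), ("U", "A"), ("G", "C"),
             ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] holds the optimum for the subsequence seq[i..j], inclusive.
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case 1: position j stays unpaired
            # case 2: j pairs with k, leaving at least min_loop bases between
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in pairs:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]


print(max_base_pairs("GGGAAAUCCC"))  # 3: a three-pair stem closing a hairpin
print(max_base_pairs("GAAC"))        # 0: the loop would be too short
```

The three nested loops give a running time that grows with the cube of the sequence length, which is the same scaling as classical folding algorithms and one reason why searching over all synonymous sequences needs specialized methods.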
Various efficient computational methods are available to predict RNA secondary structure and compute the MFE, making it possible to select the RNA sequence with the lowest MFE for a given antigen:

1) CDSfold uses an extended Zuker algorithm and dynamic programming to design, on the MFE principle, the coding sequence with the most stable secondary structure. Keeping the amino acid sequence unchanged, the program searches for the RNA sequence with the lowest MFE, with a computation time that grows as the cube of the sequence length.
2) LinearDesign further improves computational efficiency: it finds the optimal RNA sequence, the one whose structure has minimum free energy, in time proportional to the square of the sequence length. More importantly, LinearDesign can optimize toward multiple goals at once, which makes the design more flexible and more widely applicable.

In addition, the average unpaired probability (AUP) is another indicator that can be used for stability analysis; the RiboTree algorithm developed by Wayment-Steele et al. optimizes RNA sequences by minimizing AUP. The DegScore model uses ridge regression on sequence information and loop-structure assignments to predict mRNA degradation rates. Growing experimental training data combined with machine learning thus offers another way to predict mRNA degradation (and hence stability) directly.

4. Multi-objective Optimization Method

Designing effective mRNA vaccines requires comprehensive algorithms that optimize the mRNA sequence while considering multiple objectives at once, such as codon usage, structural stability, and GC content.
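As a minimal sketch of what combining objectives can look like, the example below scores every synonymous encoding of a short peptide with a weighted sum of three terms (mean codon weight, GC fraction, and a penalty on uridine fraction) and keeps the best one. The codon weights, the objective weights, and all function names are hypothetical, and real tools use far more sophisticated objectives and search strategies.

```python
from itertools import product
from math import prod

# Synonymous codons per amino acid, with made-up optimality weights in (0, 1].
SYNONYMS = {
    "M": {"AUG": 1.0},
    "K": {"AAG": 1.0, "AAA": 0.75},
    "F": {"UUC": 1.0, "UUU": 0.85},
    "L": {"CUG": 1.0, "CUC": 0.5, "CUU": 0.33,
          "UUG": 0.33, "UUA": 0.2, "CUA": 0.18},
}
WEIGHT = {c: w for table in SYNONYMS.values() for c, w in table.items()}


def n_designs(protein):
    """Number of distinct mRNA sequences encoding the protein."""
    return prod(len(SYNONYMS[aa]) for aa in protein)


def score(codons, w_codon=1.0, w_gc=0.5, w_u=0.5):
    """Scalarized objective: reward codon optimality and GC, penalize uridine."""
    seq = "".join(codons)
    codon_term = sum(WEIGHT[c] for c in codons) / len(codons)
    gc = (seq.count("G") + seq.count("C")) / len(seq)
    u = seq.count("U") / len(seq)
    return w_codon * codon_term + w_gc * gc - w_u * u


def best_design(protein, **objective_weights):
    """Exhaustively enumerate synonymous sequences and return the top scorer."""
    choices = [list(SYNONYMS[aa]) for aa in protein]
    best = max(product(*choices), key=lambda cs: score(cs, **objective_weights))
    return "".join(best)


print(n_designs("MKFL"))    # 24 candidate sequences for a 4-residue peptide
print(best_design("MKFL"))  # AUGAAGUUCCUG
```

Exhaustive enumeration only works for toy inputs, because the number of candidates multiplies with every added residue. This toy score is also additive per codon, so it could be maximized one codon at a time; structural objectives such as MFE couple distant positions, which is why dedicated algorithms are needed.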
A variety of computational methods have been proposed, using different strategies and objective functions for optimization:

1) Genetic algorithms evaluate sequence variants with multiple objective functions, which may involve RNA sequence metrics, predictive models of behavior, or secondary structure features, to optimize the function and performance of synthetic RNA sequences. Although effective at generating sequences, they can be limited by premature convergence, reduced diversity, and a restricted scope of exploration.
2) LinearDesign is a deterministic algorithm based on a deterministic finite-state automaton (DFA). It offers an effective approach to multi-objective optimization that can improve the codon usage and the structure of an mRNA sequence at the same time. Sequences designed this way showed high expression and translation efficiency in animal models, validating the approach.
3) Activation maximization uses the gradients of a performance metric to iteratively optimize candidate sequences and can be applied in multi-objective settings. However, it risks becoming trapped in local optima or converging in regions far from the training data.
4) Deep generative models are fast and can create synthetic sequences with high expression levels, in some cases exceeding natural sequences and the training data. For example, deep exploration networks (DENs) use variational autoencoders to reduce bias during training and increase activation, while models based on long short-term memory (LSTM) can construct single-stranded nucleic acids that bind target proteins efficiently.
Although these methods have mainly been used to design DNA sequences and regulatory elements, they also show great potential for optimizing the coding sequences of mRNA vaccines.

Figure: Comparison of various SARS-CoV-2 spike protein coding sequence designs

Take the mRNA vaccines targeting the SARS-CoV-2 spike protein, BioNTech-Pfizer (BNT162b2) and Moderna (mRNA-1273), as examples. Their designs introduce two proline substitutions (S-2P) to stabilize the spike protein in its prefusion conformation; replace uridine with N1-methylpseudouridine to enhance RNA stability and reduce immunogenicity; and optimize codons to achieve a higher codon adaptation index (CAI) than the original viral sequence. CureVac used RNA sequence optimization (i.e., GC content enrichment) without nucleotide modification, and its vaccine showed suboptimal efficacy.

Figure: Design of mRNA for the SARS-CoV-2 spike protein

An alternative sequence designed with the LinearDesign online tool shows that adjusting the objectives of the optimization algorithm can enhance mRNA stability. Comparing the predicted secondary structures of the Moderna vaccine sequence and the optimized sequence shows that prioritizing stability yields a more compact structure and a higher proportion of paired nucleotides.
Optimizing for stability in this way significantly increases intracellular protein expression and boosts antibody production in mice. However, neither of the two approved mRNA vaccines was optimized for minimum free energy (MFE), which may be one of the factors behind their need for cold storage.

Future Directions for AI-Assisted mRNA Design

Large language models (LLMs) have shown great potential in mRNA vaccine development. For example, GenSLMs (genome-scale language models), trained on 110 million nucleotide sequences and fine-tuned on 1.5 million SARS-CoV-2 genomes, efficiently identify key variants, which can accelerate mRNA sequence design in response to new variants of concern. The Enformer model, built on a specially designed transformer architecture, predicts gene expression by capturing long-range interactions in the sequence and evaluates the impact of non-coding DNA across different cell types. However, the application of LLMs is limited by the need for large-scale datasets and by the substantial computational resources required for training and deployment.

In addition, selecting optimal codons during sequence design is a combinatorial optimization problem, and quantum computing architectures have been reported to be able to address it. Similarly, although various computational methods exist for predicting mRNA secondary structure, this fundamental combinatorial problem is also well suited to quantum computers.
In the future, mRNA design strategies aimed at secondary structure optimization could draw on large-scale quantum computing resources, which may rapidly find high-quality solutions for sequence design and structure prediction.

Beyond mRNA sequence optimization, machine learning models are being used to predict effective new lipid nanoparticle (LNP) formulations for mRNA vaccines, and encouraging progress has been made in accelerating optimization of the production process. As machine learning advances and big-data analysis capabilities grow, vaccine development is expected to become more personalized and precise, yielding more effective and safer mRNA vaccines.

Reference: Computational design of mRNA vaccines, Vaccine, Volume 42, Issue 7, 2024.