Text by Wang Cong
Editor | Wang Duoyu
Typesetting | Shuicheng Wen
CRISPR gene editing is widely recognized as the most closely watched breakthrough in the life sciences so far this century. Hailed as "God's scissors," it won the Nobel Prize just eight years after its formal debut in 2012. At the end of 2023, the FDA approved the first CRISPR-based gene-editing therapy, which is used to treat sickle cell disease and β-thalassemia, opening a new chapter in the treatment of genetic diseases.
Despite this early clinical success, current CRISPR gene-editing tools still produce off-target effects and may trigger adverse immune reactions, which limits their broader application.
In recent years, rapid progress in artificial intelligence (AI) has raised hopes of freeing gene editors from the limits of natural evolution and helping to design more adaptable and powerful gene-editing tools.
On July 30, 2025, researchers at the AI protein design company Profluent published a paper in the leading international journal Nature titled "Design of highly functional genome editors by modelling CRISPR–Cas sequences". The work was first released on the preprint server bioRxiv in April 2024.
The study presents OpenCRISPR-1, a gene-editing tool designed entirely from scratch by AI, and reports that it achieved precise editing of the human genome, a first for an AI-designed editor. Notably, Profluent has open-sourced OpenCRISPR-1, which can be used free of charge for both scientific research and commercial purposes. This paves the way for making gene-editing therapies more accessible and less costly, and could help accelerate the development of treatments for thousands of currently incurable genetic diseases.
Profluent's breakthrough marks the beginning of a new era of gene editing, one in which AI plays a central role in designing tools that could revolutionize medicine. As AI-driven protein design continues to advance, it holds the promise of bringing us closer to a world where precision treatments for genetic diseases are more accessible and effective than ever before.
CRISPR is the quintessential example of serendipitous discovery in modern biology. It all began in 1987 with the observation of unusual repetitive DNA sequences in E. coli, a finding that ultimately revealed a sophisticated adaptive immune system in bacteria. Bacteria capture fragments of viral DNA as "spacers" and embed them in repetitive genomic sequences known as CRISPR arrays. This forms a genetic memory that guides CRISPR-associated (Cas) proteins to cut the DNA of matching viruses, destroying them when they invade again.
In 2012, Emmanuelle Charpentier and Jennifer Doudna showed that Cas9 from Streptococcus pyogenes (SpCas9) can be repurposed as a programmable gene-editing tool whose guide RNA (gRNA) directs Cas9 to a specific location in the genome. Not long after, Feng Zhang successfully applied CRISPR-Cas9 to genome editing in human cells, opening the door to therapeutic applications of gene editing and ultimately leading to the market approval of Casgevy for treating sickle cell disease and β-thalassemia.
Figure: Principle of CRISPR-Cas9 gene editing
However, powerful as SpCas9-based gene-editing tools are, SpCas9 remains a "wild and untamed" tool: it evolved as a bacterial defense mechanism, not for the precision required in human therapeutics.
SpCas9 tolerates mismatches between its gRNA and the DNA target sequence, which can lead to off-target effects (cutting at sites other than the intended target) and raise serious safety concerns. In addition, because Streptococcus pyogenes is a common pathogen, most people may carry pre-existing immune responses to SpCas9, which could neutralize the therapeutic effect.
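To make the mismatch problem concrete, here is a minimal Python sketch of how candidate off-target sites could be enumerated. It is an illustration only, not the method used in the paper: it scans just the forward strand, treats every mismatch position equally, and assumes the canonical NGG PAM of SpCas9. The function name, the toy guide, and the toy genome are all hypothetical.

```python
from typing import List, Tuple


def find_candidate_sites(
    genome: str, guide: str, max_mismatches: int = 3
) -> List[Tuple[int, str, int]]:
    """List forward-strand sites that resemble the guide and sit next to an NGG PAM.

    Returns (start position, protospacer, mismatch count) tuples.
    Simplifying assumptions: forward strand only, all mismatches weighted equally.
    """
    genome = genome.upper()
    guide = guide.upper()
    n = len(guide)
    hits = []
    # The protospacer occupies genome[i:i+n]; the 3-nt PAM follows it directly.
    for i in range(len(genome) - n - 2):
        pam = genome[i + n : i + n + 3]
        if pam[1:] != "GG":  # NGG: any base followed by two guanines
            continue
        protospacer = genome[i : i + n]
        mismatches = sum(a != b for a, b in zip(protospacer, guide))
        if mismatches <= max_mismatches:
            hits.append((i, protospacer, mismatches))
    return hits


# Toy example: one perfect target and one site that differs at two positions.
guide = "GACGTTACCGGATCAAGCTT"
near_match = "GACGTTACCGGATCTAGCTA"
genome = "TTT" + guide + "TGG" + "AAAA" + near_match + "AGG" + "CC"

for position, site, mismatches in find_candidate_sites(genome, guide):
    print(position, site, mismatches)
```

Setting `max_mismatches=0` keeps only the perfect match. The more mismatches a nuclease tolerates, the longer this list becomes on a real genome, which is why specificity matters so much for therapeutic use.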
SpCas9 is also large, which makes it difficult to package into viral delivery systems such as adeno-associated virus (AAV) vectors. Moreover, SpCas9 can only cut near short DNA motifs known as PAM sequences, which leaves some genomic regions inaccessible to targeted editing.
These shortcomings are particularly pronounced in in vivo gene editing, in which genes are edited directly in cells inside the body rather than in cells that are extracted, edited outside the body, and then reinfused. Scientists have therefore been searching for tools better suited to in vivo gene editing, tools with near-perfect specificity and extremely low immunogenicity.
To obtain editors that are efficient, specific, deliverable, and non-immunogenic, researchers have tried a variety of engineering strategies, including site-directed mutagenesis (which sacrifices editing efficiency to reduce off-target effects), directed evolution (which is very labor-intensive and explores only a limited region of sequence space), and biological mining (naturally occurring enzymes still suffer from off-target effects and immunogenicity). These traditional methods struggle to meet all of the above goals.
In this latest study, the research team explored a fourth paradigm: using generative artificial intelligence (generative AI) to design editors from scratch. They used protein language models, which are trained on large protein sequence databases. Just as large language models such as ChatGPT learn the patterns of human language, protein language models process hundreds of millions of examples to learn the implicit "grammar" of protein evolution, that is, the complex statistical relationships between amino acids that characterize functional natural proteins.
The research team recognized that the performance of any AI model is fundamentally limited by the quality and scale of its training data.
Rather than relying on existing databases, they undertook a massive data-mining effort, screening 26.2 trillion base pairs of microbial genomic data to identify more than 1.2 million CRISPR "operons" (functional units comprising Cas protein sequences, CRISPR arrays, crRNA, and PAM). The resulting CRISPR-Cas Atlas contains four times as many Cas9 sequences as the protein database UniProt.
Next, the research team applied a hierarchical training strategy to the protein language model ProGen2. The model had been pre-trained on hundreds of millions of protein sequences from the UniRef and BFD databases, and it was then fine-tuned on the CRISPR-Cas Atlas to learn the specific sequence constraints underlying the function of the Cas9 protein. The resulting model was used to generate a series of Cas-like proteins with nearly five times the diversity of known Cas protein variants, including thousands of candidate proteins never before seen in nature.
Figure: Training on and generating diverse Cas-like protein families
Are all of these generated Cas-like proteins functional? Does the generated library contain entirely new sequences with ideal properties for the intended application? To find out, the research team fine-tuned the model further, this time training it only on approximately 240,000 Cas9 sequences. They prompted the protein language model with fragments of various natural Cas9 sequences to generate 350,000 candidate sequences, computationally screened these for sequence quality and CRISPR compatibility, and ultimately selected 209 sequences for experimental testing in human cells.
Among these, OpenCRISPR-1 stood out.
Figure: AI-driven design of Cas-like proteins
OpenCRISPR-1 is a Cas-like protein designed entirely by AI. It is 1,380 amino acids long and differs from SpCas9 by 403 amino acid changes and from its closest natural sequence in the CRISPR-Cas Atlas by 182 amino acid changes. Despite these many differences, it matches the on-target efficiency of SpCas9 while being significantly more specific. Compared with SpCas9, OpenCRISPR-1 greatly improves the ratio of on-target to off-target cleavage, reducing off-target editing by 95%. Importantly, its off-target sites are a subset of those of SpCas9, suggesting that it does not introduce new cleavage patterns. Moreover, the OpenCRISPR-1 sequence appears to lack certain epitopes that are recognized by T cells and contribute to SpCas9 immunogenicity, which suggests that the AI-designed OpenCRISPR-1 may be less immunogenic than pathogen-derived genome editors such as SpCas9.
Figure: Generated Cas-like proteins perform gene editing in human cells
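Difference counts like those quoted for OpenCRISPR-1 are tallied over a sequence alignment. The short Python sketch below shows one simple way to do so, assuming the two sequences have already been aligned to equal length with `-` marking gaps, and counting every non-identical column (gaps included) as a difference. The article does not say how the paper counted differences, so this convention, the function names, and the toy sequences are assumptions for illustration.

```python
def count_differences(aligned_a: str, aligned_b: str) -> int:
    """Count alignment columns where the two sequences differ (gaps count as differences)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(aligned_a, aligned_b) if x != y)


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of alignment columns that are identical."""
    columns = len(aligned_a)
    if columns == 0:
        raise ValueError("alignment is empty")
    return 100.0 * (columns - count_differences(aligned_a, aligned_b)) / columns


# Toy aligned fragments (hypothetical, not real Cas9 sequence).
designed = "MKRY-LG"
natural = "MKKYALG"
print(count_differences(designed, natural))
print(round(percent_identity(designed, natural), 1))
```

In practice the alignment itself would come from a dedicated alignment tool, and the reported number depends on how gaps and terminal overhangs are treated.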
So, can OpenCRISPR-1 be used for base editing?
The research team converted OpenCRISPR-1 into a nickase and then fused it with adenosine deaminase ABE8.20 to construct an adenine base editor. The results showed that it achieved robust A-to-G base editing at three test sites in human cells, with editing efficiency ranging from 35% to 60%. This efficiency is comparable to that of the ABE8.20 adenine base editor based on SpCas9 nickase and does not cause insertions/deletions.
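An editing efficiency such as "35% to 60%" is, in essence, the share of sequenced alleles that carry G instead of A at the target position. The Python sketch below shows that arithmetic under stated assumptions: the reads are already aligned to the same reference window, contain no insertions or deletions, and the target adenine sits at a known index. The article does not describe the paper's measurement pipeline, so the function name and the toy reads are hypothetical.

```python
from collections import Counter
from typing import Iterable


def a_to_g_efficiency(reads: Iterable[str], position: int) -> float:
    """Percentage of reads showing G at the target position.

    Assumes reads are pre-aligned to one reference window without indels.
    Reads too short to cover the position are ignored.
    """
    bases = Counter(read[position] for read in reads if len(read) > position)
    total = sum(bases.values())
    if total == 0:
        raise ValueError("no reads cover the target position")
    return 100.0 * bases["G"] / total


# Toy data: the target adenine is at index 2 of the reference window "CCAGT".
reads = ["CCAGT"] * 6 + ["CCGGT"] * 4
print(a_to_g_efficiency(reads, position=2))
```

The same tally, applied to neighbouring adenines in the editing window, gives a rough measure of bystander editing.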
Figure: Characterization of OpenCRISPR-1 (PAM sequence, gRNA, and base editing)
The research team went further and attempted to design an entirely new base editor system with AI, including the deaminase. They first designed and generated a series of adenine deaminases by training on TadA-like proteins, with homology to any known deaminase ranging from 55% to 80%. The two most active of these, PF-DEAM-1 and PF-DEAM-2, were fused with either the SpCas9 nickase or the OpenCRISPR-1 nickase. The resulting adenine base editors showed A-to-G editing efficiency comparable to that of the ABE8.20 adenine base editor based on the SpCas9 nickase, and both effectively suppressed bystander editing.
Will OpenCRISPR-1 Become a Breakthrough for In Vivo CRISPR Therapy?
The preliminary results are encouraging, but broader experimental validation across different targets and delivery systems is needed to determine the clinical potential of these editors. The true advance of this research, however, lies in the process of using generative AI to create CRISPR gene-editing systems. As AI-based protein design continues to evolve, this "pre-train, fine-tune, generate, and screen" approach establishes a powerful framework for future research.
Figure: Structural analysis of OpenCRISPR-1
As AI-designed CRISPR-Cas systems move toward clinical applications, they mark the beginning of a broader revolution in precision medicine, one in which therapeutic proteins are designed by AI rather than discovered in nature, and optimized rather than evolved. The integration of artificial intelligence and biotechnology (AI + BT) has opened up enormous possibilities, extending the application of CRISPR into areas far beyond what evolution alone could explore.
https://www.nature.com/articles/s41586-025-09298-z