
AI Protein Design Platform Developer

In recent years, advances in deep generative models have enabled scientists to design therapeutic peptides that target difficult-to-drug sites with relatively high precision. However, the critical influence of molecular surfaces in protein-protein interactions (PPI) has been underestimated. This is much like finding the keyhole but ignoring the angle at which the key must turn, and it has significantly hindered the design and discovery of therapeutic peptides.
To bridge this gap, the MoleculeMind team led by Xu Jinbo collaborated with a Stanford University team to propose SurfFlow, a full-design peptide generation paradigm. SurfFlow is a novel surface-based generative algorithm that co-designs the sequence, structure, and surface of peptides. The paper has been accepted to KDD 2025 (ACM SIGKDD Conference on Knowledge Discovery and Data Mining), one of the most influential international conferences in the field of data mining.
SurfFlow adopts a multimodal conditional flow matching (CFM) architecture to learn the distribution of surface geometry and biochemical properties, thereby improving the accuracy of peptide binding.
In the comprehensive PepMerge benchmark, SurfFlow consistently outperforms the all-atom baseline across all metrics. These results highlight the advantages of considering molecular surfaces in de novo peptide discovery and demonstrate the potential of integrating multiple protein modalities for more efficient therapeutic peptide discovery.

01
Peptides are short-chain proteins composed of approximately 2 to 50 amino acids and play a critical role in various biological processes, including cell signaling, enzyme catalysis, and immune responses.
Peptides are indispensable mediators in pharmacology because they bind to cell surface receptors with high affinity and specificity while offering advantages such as low toxicity, low immunogenicity, and ease of intracellular delivery.
Traditional peptide discovery methods rely on the frequent calculation of physical energy functions, but due to the vast peptide design space, this approach is inefficient, thus driving the rapid development of computational methods.
In recent years, molecular surfaces have garnered increasing attention in the study of protein-protein interactions (PPI), as PPI largely depends on the complementarity of the interacting protein surfaces. The electrostatic potential and hydrophobicity of the surface are critical factors determining the strength and specificity of PPI, while its geometry (e.g., protrusions, grooves, and crevices) enables the "lock-and-key" or "induced fit" mechanisms necessary for specific binding.
Illustration: Comparison of full-atom peptide design with and without surface constraints. (Source: Paper)
These surfaces act as the basic interface that determines how proteins recognize and bind to each other. For these reasons, it is crucial to consider all molecular modalities (sequence, structure, and surface) simultaneously during peptide generation, thereby enhancing consistency across every aspect of the so-called full design.
02
To achieve this goal, Stanford University and MoleculeMind proposed a novel full-design generative algorithm named SurfFlow.
It applies multimodal flow matching (FM) to both internal structures and molecular surfaces, which are represented by surface point positions and unit normal vectors and treated as rigid frames in SE(3).
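As a rough illustration of the flow matching idea for a single surface point, the sketch below builds one training pair: a point on a straight path between a noise sample and a data position, plus the constant velocity a network would be trained to regress. Unit normals are interpolated along a great circle so they stay on the sphere. The linear and spherical interpolants and all function names are assumptions made for illustration; the article does not specify SurfFlow's exact probability paths.

```python
import math

def lerp(x0, x1, t):
    """Point on the straight path from x0 (noise) to x1 (data) at time t in [0, 1]."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]

def target_velocity(x0, x1):
    """Velocity of the straight path; constant in t, used as the regression target."""
    return [b - a for a, b in zip(x0, x1)]

def slerp(n0, n1, t):
    """Great-circle interpolation between unit vectors n0 and n1.

    Assumes both inputs are unit length and not antipodal (that case is undefined).
    """
    dot = max(-1.0, min(1.0, sum(a * b for a, b in zip(n0, n1))))
    omega = math.acos(dot)
    if omega < 1e-8:  # nearly identical normals: nothing to interpolate
        return list(n0)
    s = math.sin(omega)
    w0 = math.sin((1.0 - t) * omega) / s
    w1 = math.sin(t * omega) / s
    return [w0 * a + w1 * b for a, b in zip(n0, n1)]

def fm_loss(pred_velocity, x0, x1):
    """Mean squared error between a predicted velocity and the target velocity."""
    target = target_velocity(x0, x1)
    return sum((p - q) ** 2 for p, q in zip(pred_velocity, target)) / len(target)

# Hypothetical single surface point: noise position -> data position.
x0, x1 = [0.0, 0.0, 0.0], [2.0, 4.0, -2.0]
n0, n1 = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]
x_half = lerp(x0, x1, 0.5)   # midpoint of the position path
n_half = slerp(n0, n1, 0.5)  # midpoint of the normal path, still unit length
```

In a real model the predicted velocity would come from a network conditioned on the noisy point and the time t; here `fm_loss` only shows what that prediction would be compared against.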
Since complementary surface geometry alone cannot guarantee successful binding – the precision of the binding interface and the placement of charge, polarity, and hydrophobicity are also necessary – SurfFlow incorporates these biochemical property constraints.
Illustration: The SurfFlow workflow is used for the comprehensive design of peptides, taking into account multimodal consistency among sequence, structure, and molecular surface during the generation process. (Source: Paper)
Specifically, it uses Discrete FM (DFM), which is built on Continuous-Time Markov Chains (CTMC), to handle the discrete data space of certain categorical surface features.
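To make the CTMC idea concrete, here is a minimal sketch of discrete flow matching on a list of categorical labels. It assumes a masking path: at time t each label is its true value with probability t and a placeholder `MASK` state otherwise, so a masked position jumps to a predicted category at rate 1 / (1 - t). The masking path, the `MASK` token, and the toy predictor are illustrative assumptions, not details confirmed by the article.

```python
import random

MASK = -1  # hypothetical placeholder state for "not yet decided"

def corrupt(x1, t, rng):
    """Training-time noising: keep each true label with probability t, else mask it."""
    return [tok if rng.random() < t else MASK for tok in x1]

def ctmc_euler_step(xt, probs, t, dt, rng):
    """One Euler step of the CTMC from time t to t + dt.

    probs[i] is the predicted distribution over categories at position i.
    Under the masking path, a masked position jumps at rate 1 / (1 - t).
    """
    remaining = 1.0 - t
    # On the final step every remaining mask must be resolved.
    p_jump = 1.0 if remaining <= dt * (1.0 + 1e-9) else dt / remaining
    out = []
    for tok, p in zip(xt, probs):
        if tok == MASK and rng.random() < p_jump:
            out.append(rng.choices(range(len(p)), weights=p)[0])
        else:
            out.append(tok)  # already-decided labels stay fixed
    return out

def sample(length, predict, steps, rng):
    """Start fully masked at t = 0 and integrate the chain to t = 1."""
    xt = [MASK] * length
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt
        xt = ctmc_euler_step(xt, predict(xt, t), t, dt, rng)
    return xt

# Stand-in for a trained network: always predicts category 2 out of 4.
def toy_predict(xt, t):
    return [[0.0, 0.0, 1.0, 0.0] for _ in xt]

labels = sample(6, toy_predict, steps=8, rng=random.Random(0))
```

With a trained predictor in place of `toy_predict`, the same loop would gradually commit each surface point to a category as t approaches 1.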
Moreover, because irregular surface geometries, multi-scale features, and inter-protein interactions are hard to capture in a scalable manner, the researchers proposed an Equivariant Surface Geometry Network (ESGN), which dynamically models heterogeneous surface graphs and incorporates both intra-surface and inter-surface interactions.
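The sketch below illustrates, under stated assumptions, what a heterogeneous surface graph with intra-surface and inter-surface edges could look like, and one way to obtain edge features that do not change when the whole complex is rotated. It is not the ESGN architecture: the k-nearest-neighbor rule, the distance cutoff, the two scalar features, and every name here are hypothetical choices for illustration.

```python
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def build_edges(pep_pts, rec_pts, k=2, cutoff=4.0):
    """Two edge types: k-nearest-neighbor edges within the peptide surface,
    and distance-cutoff edges from peptide points to receptor points."""
    intra = []
    for i, p in enumerate(pep_pts):
        others = [j for j in range(len(pep_pts)) if j != i]
        others.sort(key=lambda j: dist(p, pep_pts[j]))
        intra.extend((i, j) for j in others[:k])
    inter = [(i, j) for i, p in enumerate(pep_pts)
             for j, q in enumerate(rec_pts) if dist(p, q) <= cutoff]
    return {"intra": intra, "inter": inter}

def edge_features(p, n_p, q, n_q):
    """Rotation- and translation-invariant scalars for one edge:
    point distance and the cosine between the two unit normals."""
    return (dist(p, q), dot(n_p, n_q))

def rotate_z(v, angle):
    """Rotate a 3D vector about the z axis (used only to check invariance)."""
    c, s = math.cos(angle), math.sin(angle)
    return [c * v[0] - s * v[1], s * v[0] + c * v[1], v[2]]

# Hypothetical toy surfaces: three peptide points, two receptor points.
pep_pts = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
rec_pts = [[0.0, 0.0, 3.0], [10.0, 0.0, 0.0]]
edges = build_edges(pep_pts, rec_pts)
```

Scalars such as distances and normal cosines are a common building block for equivariant networks, since messages computed from them are unaffected by how the complex is oriented in space.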
Given that key peptide properties such as cyclization and disulfide bonds can influence stability and binding affinity, they incorporated these factors as additional conditions to enhance SurfFlow's capacity and generalization ability.
03
The team comprehensively evaluated SurfFlow's performance on unconditional and conditional sequence-structure co-design tasks as well as side-chain packing. For benchmarking, they used the PepMerge dataset, which is derived from PepBDB and Q-BioLip. The results, shown in the table and figures below, indicate that SurfFlow consistently outperforms the all-atom baseline across all metrics.
Table: Evaluation of the strengths and weaknesses of different methods in the sequence structure co-design task, with an ablation study on the key components of SurfFlow. The best and second-best results are marked in bold and underlined. (Source: Paper)
Figure: Binding energy distribution of designed and natural peptides, the lower the better. (Source: Paper)
Illustration: Peptides designed by DL algorithms alongside reference peptides (left); peptide design with cyclic conditions (right). (Source: Paper)
Although SurfFlow improves on the original all-atom design approach, there is still room for further exploration. For example, incorporating receptor surface information into the joint distribution model could lead to further optimization. Additionally, the success of RFdiffusion suggests that pretraining on conventional proteins in the PDB is beneficial.
Nevertheless, SurfFlow is a novel model capable of simultaneously generating all protein modalities (sequence, structure, and surface). The researchers applied it to a specific peptide design challenge, incorporating key features such as cyclization and disulfide bonds into the generative process.
The team reportedly plans to release SurfFlow soon, so interested readers can look forward to it.