Antimicrobial peptides (AMPs) are promising candidates in the fight against multidrug-resistant pathogens owing to their broad range of activities and low toxicity. Some AMPs also display antitumor and antiviral functions, making them alternative drug candidates for these important diseases. To facilitate the discovery of AMPs and their functions, we provide this one-stop server for predicting the antimicrobial and other activities of uncharacterized peptide sequences. Three methods are currently available:
- AmPEP: Predict antimicrobial activity
- Deep-AmPEP30: Predict antimicrobial activity of short peptides with length <= 30 AA
- AcPEP: Predict anticancer activity (under development)
Our methods and server are under continuous development. Usage statistics for the server are available on our statistics page.
AmPEP: Antimicrobial Peptide Prediction
AmPEP is a sequence-based method that classifies peptides as AMPs or non-AMPs using a random forest. The prediction model is based on the distribution patterns of amino acid properties along the sequence:
Figure 1: Encoding a peptide sequence into distribution patterns of seven types of physicochemical properties, each divided into three classes.
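If Figure 1 corresponds to the "distribution" descriptor of the composition-transition-distribution (CTD) scheme, each of the 7 properties splits the 20 amino acids into 3 classes, and each class is summarized by 5 positions (its first residue and the residues at which 25%, 50%, 75% and 100% of that class is reached), giving 7 × 3 × 5 = 105 features. The sketch below computes such a vector. The class groupings follow the commonly used Dubchak et al. scheme as found in the protr R package, and the quartile-indexing convention is likewise an assumption; neither is confirmed to be AmPEP's exact table.

```python
import math

# Three-class groupings per property (ASSUMED: Dubchak et al. CTD scheme as used
# in the protr R package; not confirmed to be AmPEP's exact grouping).
PROPERTY_CLASSES = {
    "hydrophobicity": ("RKEDQN", "GASTPHY", "CLVIMFW"),
    "vdw_volume": ("GASTPDC", "NVEQIL", "MHKFRYW"),
    "polarity": ("LIFWCMVY", "PATGS", "HQRKNED"),
    "polarizability": ("GASDT", "CPNVEQIL", "KMHFRYW"),
    "charge": ("KR", "ANCQGHILMFPSTWYV", "DE"),
    "secondary_structure": ("EALMQKRH", "VIYCWFT", "GNPSD"),
    "solvent_accessibility": ("ALFCGIVW", "RKQEND", "MSPTHY"),
}

FRACTIONS = (0.0, 0.25, 0.5, 0.75, 1.0)  # first residue, then 25/50/75/100% of the class


def class_distribution(seq, members):
    """Positions (as % of sequence length) where 0/25/50/75/100% of a class is reached."""
    positions = [i + 1 for i, aa in enumerate(seq) if aa in members]
    if not positions:
        return [0.0] * len(FRACTIONS)  # class absent from the peptide
    n = len(positions)
    out = []
    for frac in FRACTIONS:
        k = max(1, math.floor(n * frac))  # 1-based index into the class positions (assumed rule)
        out.append(positions[k - 1] * 100.0 / len(seq))
    return out


def ctd_distribution(seq):
    """105-dim vector: 7 properties x 3 classes x 5 distribution points."""
    seq = seq.upper()
    features = []
    for classes in PROPERTY_CLASSES.values():
        for members in classes:
            features.extend(class_distribution(seq, members))
    return features
```

For example, in the hypothetical peptide `KKAAA` the positively charged class (`KR`) occupies positions 1 and 2, so `class_distribution("KKAAA", "KR")` yields `[20.0, 20.0, 20.0, 20.0, 40.0]`.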
Using our large and diverse collection of AMP/non-AMP data (3268/166791 sequences), we evaluated 19 random forest classifiers with different positive:negative data ratios by 10-fold cross-validation. Our optimal model, AmPEP with a 1:3 data ratio, achieved a very high accuracy of 96%, an MCC of 0.9, an AUC-ROC of 0.99 and a Kappa statistic of 0.9. Descriptor analysis by Pearson correlation coefficients of AMP/non-AMP distributions revealed that reduced feature sets (from the full set of 105 features down to a minimal set of 23) can achieve comparable performance in all aspects except for some reduction in precision. Furthermore, AmPEP achieved high performance on a benchmark dataset in terms of AUC-ROC (0.995), AUC-PR (0.957), MCC (0.921) and kappa (0.962). Our performance is 1-5% better than that of two published methods, iAMPpred and iAMP-2L.
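Accuracy, MCC and the Kappa statistic are all derived from the confusion matrix of the binary AMP/non-AMP predictions. The sketch below shows the standard formulas; the counts in the usage example are invented for illustration (a hypothetical test split of 100 AMPs and 300 non-AMPs, mirroring a 1:3 ratio) and are not AmPEP results.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Accuracy, Matthews correlation coefficient and Cohen's kappa from a 2x2 confusion matrix."""
    n = tp + fp + tn + fn
    accuracy = (tp + tn) / n
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    # Expected chance agreement: product of predicted and actual marginals per class.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (accuracy - p_chance) / (1 - p_chance) if p_chance != 1 else 0.0
    return {"accuracy": accuracy, "mcc": mcc, "kappa": kappa}
```

With `binary_metrics(tp=90, fp=10, tn=290, fn=10)` the accuracy is 0.95, and both MCC and kappa come out at about 0.867. In this particular example the two coincide; in general they differ, which is why both are reported.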
Figure 2: The prediction model of AmPEP is based on random forest (originally implemented in MATLAB; reimplemented in R for the online server).
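Training sets with a given positive:negative ratio (such as 1:3) and the 10-fold cross-validation splits could be prepared along the lines sketched below. This assumes simple random undersampling of the non-AMP set and stratified fold assignment; the sampling procedure actually used for AmPEP may differ, and the function names are hypothetical.

```python
import random


def undersample(positives, negatives, neg_per_pos, seed=0):
    """Return (sequences, labels) with a positive:negative ratio of 1:neg_per_pos.

    All positives are kept; negatives are drawn at random without replacement.
    """
    rng = random.Random(seed)
    n_neg = min(len(negatives), int(len(positives) * neg_per_pos))
    chosen_neg = rng.sample(list(negatives), n_neg)
    seqs = list(positives) + chosen_neg
    labels = [1] * len(positives) + [0] * n_neg
    return seqs, labels


def stratified_folds(labels, k=10, seed=0):
    """Assign each sample a fold index 0..k-1, spreading each class evenly across folds."""
    rng = random.Random(seed)
    folds = [0] * len(labels)
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[i] = j % k  # round-robin keeps the class ratio similar in every fold
    return folds
```

Each fold then serves once as the validation set while the random forest is trained on the other nine, so every sample is predicted exactly once.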
The online prediction model has been reimplemented in R and tested to achieve accuracy very close to that of the original MATLAB implementation used for publication. If you want to run the MATLAB code yourself, feel free to download it from here. A Python re-implementation of AmPEP is also available here.
Reference: Pratiti Bhadra, Jielu Yan, Jinyan Li, Simon Fong, and Shirley W. I. Siu.* AmPEP: Sequence-based prediction of antimicrobial peptides using distribution patterns of amino acid properties and random forest. Scientific Reports 8, 1697 (2018).
Deep-AmPEP30: Short Antimicrobial Peptide Prediction
Short AMPs are considered better drug candidates because they have enhanced antimicrobial activity, higher stability and lower manufacturing cost. Existing AMP prediction methods often mix long and short sequences in both training and validation, and we found that their prediction accuracies for short AMPs are surprisingly low (60-77%). To meet the need for short AMP prediction, we developed Deep-AmPEP30, a sequence-based classification method that uses selected types of PseKRAAC reduced amino acid composition as features (see Figure 3) and a convolutional neural network (CNN) as the learning algorithm. Deep-AmPEP30 was tuned to optimize the prediction of short AMPs of 30 AA or less and was tested to achieve good performance: an accuracy of 83%, an AUC-ROC of 0.92 and an AUC-PR of 0.94.
Figure 3. Steps to generate the feature vector of an example peptide sequence using PseKRAAC feature Type 7-Cluster 15.
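A reduced amino acid composition first maps residues onto a smaller alphabet of clusters and then counts residue-pair (or k-tuple) frequencies over the reduced sequence. The sketch below illustrates this with a made-up five-cluster alphabet and a simple gapped-pair count in the spirit of PseKRAAC's g-gap mode. The clustering, the cluster labels and the function name are hypothetical; the actual Type 7-Cluster 15 grouping and the exact PseKRAAC formulas are not reproduced here.

```python
from itertools import product

# HYPOTHETICAL 5-cluster reduced alphabet for illustration only; it is NOT the
# PseKRAAC Type 7-Cluster 15 grouping used by Deep-AmPEP30.
CLUSTERS = {"AGST": "s", "CFILMVW": "h", "DE": "a", "HKR": "b", "NPQY": "p"}
REDUCE = {aa: label for group, label in CLUSTERS.items() for aa in group}
ALPHABET = sorted(set(REDUCE.values()))


def reduced_pair_composition(seq, gap=0):
    """Frequencies of reduced-alphabet residue pairs separated by `gap` residues.

    Non-standard residues are dropped before pairing (an assumption).
    """
    reduced = [REDUCE[aa] for aa in seq.upper() if aa in REDUCE]
    pairs = [reduced[i] + reduced[i + gap + 1] for i in range(len(reduced) - gap - 1)]
    keys = ["".join(p) for p in product(ALPHABET, repeat=2)]
    if not pairs:
        return {key: 0.0 for key in keys}
    return {key: pairs.count(key) / len(pairs) for key in keys}
```

With five clusters there are 5 × 5 = 25 pair features; for example, `AAK` reduces to `s s b`, so with `gap=0` the pairs `ss` and `sb` each have frequency 0.5.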
Figure 4. The architecture of our CNN-based classifier for short AMP prediction.
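Figure 4 is not reproduced in this text, so as a rough illustration of how a 1D CNN turns a feature vector into an AMP probability, the sketch below runs a single forward pass: one convolution layer with ReLU, max pooling, and a dense layer with a sigmoid output. The layer sizes, kernel width and random untrained weights are hypothetical and do not reflect Deep-AmPEP30's actual architecture or trained parameters.

```python
import math
import random


def conv1d_relu(x, filters, bias):
    """Valid 1D convolution of vector x with each filter, followed by ReLU."""
    k = len(filters[0])
    return [
        [max(0.0, sum(w * x[i + j] for j, w in enumerate(f)) + b) for i in range(len(x) - k + 1)]
        for f, b in zip(filters, bias)
    ]  # shape: n_filters x (len(x) - k + 1)


def max_pool(channels, size=2):
    """Non-overlapping max pooling on each channel (trailing remainder dropped)."""
    return [[max(ch[i:i + size]) for i in range(0, len(ch) - size + 1, size)] for ch in channels]


def predict(x, params):
    """Forward pass: conv -> ReLU -> max-pool -> flatten -> dense -> sigmoid probability."""
    h = max_pool(conv1d_relu(x, params["filters"], params["conv_bias"]))
    flat = [v for ch in h for v in ch]
    z = sum(w * v for w, v in zip(params["dense_w"], flat)) + params["dense_b"]
    return 1.0 / (1.0 + math.exp(-z))


def random_params(input_len, n_filters=4, kernel=3, seed=0):
    """HYPOTHETICAL untrained weights sized for an input vector of length input_len."""
    rng = random.Random(seed)
    pooled = (input_len - kernel + 1) // 2
    return {
        "filters": [[rng.uniform(-0.5, 0.5) for _ in range(kernel)] for _ in range(n_filters)],
        "conv_bias": [0.0] * n_filters,
        "dense_w": [rng.uniform(-0.5, 0.5) for _ in range(n_filters * pooled)],
        "dense_b": 0.0,
    }
```

In a real classifier the weights would be learned by training on labeled AMP/non-AMP feature vectors; here they only demonstrate the data flow and the shapes involved.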
Reference: Pratiti Bhadra, Jielu Yan, Ang Li, Pooja Sethiya, Longguang Qin, Hio Kuan Tai, Koon Ho Wong, and Shirley W. I. Siu.* Deep-AmPEP30: Improve short antimicrobial peptides prediction with deep learning. Submitted.
AcPEP: Anti-Cancer Peptide Prediction (under development)
An enduring challenge in cancer drug discovery is to identify highly selective molecules that can target cancer cells with minimal disruption to normal cells. Because short-chain anticancer peptides display high specificity and low toxicity, they are considered a promising next generation of cancer therapeutics.
To facilitate the drug discovery process, we propose an anticancer peptide prediction method based on Chou's amphiphilic pseudo amino acid composition (Am-Pse-AAC), anticancer profile sequence similarity scores (PSI-BLAST-Sc), and water-membrane partitioning free energies of peptides (dG-Part). PSI-BLAST-Sc captures the evolutionary relationships among anticancer peptides, and dG-Part characterizes the ability of peptides to partition into the surface and center of cell membranes. Using three independent datasets, we compared our models with state-of-the-art methods with respect to sensitivity, specificity, accuracy, and Matthews correlation coefficient. Our three-feature model trained with a support vector machine (SVM) consistently outperformed existing methods in 5-fold, 10-fold and leave-one-out cross-validation. Our prediction accuracy ranges from 92% to 96% across the three datasets, with Matthews correlation coefficients 2-11% higher than those of the best existing methods.
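For reference, Chou's amphiphilic pseudo amino acid composition represents a peptide $R_1 R_2 \dots R_L$ as a $(20 + 2\lambda)$-dimensional vector: the 20 normalized amino acid frequencies $f_u$ plus $2\lambda$ sequence-order correlation factors computed from standardized hydrophobicity ($h^1$) and hydrophilicity ($h^2$) values, weighted by a factor $w$ (with $\lambda < L$). The formulation below follows Chou's general definition; the values of $\lambda$ and $w$ used in AcPEP are not stated here.

```latex
\tau_{2j-1} = \frac{1}{L-j}\sum_{i=1}^{L-j} h^{1}(R_i)\,h^{1}(R_{i+j}), \qquad
\tau_{2j}   = \frac{1}{L-j}\sum_{i=1}^{L-j} h^{2}(R_i)\,h^{2}(R_{i+j}), \qquad j = 1,\dots,\lambda

p_u =
\begin{cases}
\dfrac{f_u}{\sum_{i=1}^{20} f_i + w\sum_{j=1}^{2\lambda}\tau_j}, & 1 \le u \le 20,\\[2ex]
\dfrac{w\,\tau_{u-20}}{\sum_{i=1}^{20} f_i + w\sum_{j=1}^{2\lambda}\tau_j}, & 20 < u \le 20 + 2\lambda.
\end{cases}
```

The first 20 components describe composition, while the odd- and even-indexed correlation factors capture how hydrophobic and hydrophilic residues are distributed along the chain, which is what makes the representation "amphiphilic".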
The results of our study show that the proposed method is promising for anticancer peptide prediction. The new features PSI-BLAST-Sc and dG-Part can supplement the well-established Am-Pse-AAC to classify anticancer peptides more selectively, specifically and precisely. In future work, the dG-Part feature could be further improved by adopting cancer membrane models in the theoretical calculation of the transfer free energy scale.
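As a rough sketch of a water-membrane partitioning feature, the functions below sum per-residue transfer free energies over a peptide. This assumes the whole-peptide free energy is additive over residues (as with Wimley-White-type whole-residue scales) and, possibly, that separate interface and octanol scales stand in for the membrane surface and center. The scale values must be supplied by the user, the function names are hypothetical, and AcPEP's actual dG-Part computation may differ.

```python
def partition_free_energy(seq, scale):
    """Sum of per-residue water-to-membrane transfer free energies.

    `scale` maps one-letter residue codes to free energies (units follow the
    scale supplied, e.g. kcal/mol). Residues missing from the scale raise
    KeyError so that gaps in the scale are not silently ignored.
    """
    return sum(scale[aa] for aa in seq.upper())


def dg_part_features(seq, interface_scale, octanol_scale):
    """Two HYPOTHETICAL features: partitioning into the membrane surface and into its core."""
    return {
        "dG_interface": partition_free_energy(seq, interface_scale),
        "dG_octanol": partition_free_energy(seq, octanol_scale),
    }
```

For instance, with a toy placeholder scale `{"A": 0.5, "K": 1.0, "L": -0.5}` (not real values), the peptide `KLA` sums to 1.0. Replacing the scale with one derived from cancer membrane models, as suggested above, would require no change to the code itself.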
Reference: Pratiti Bhadra, Jielu Yan, and Shirley W. I. Siu.* A machine learning approach to anticancer peptide prediction using amino acid composition, profile similarity, and water-membrane partitioning free energy. Manuscript in Preparation (2018).