Recently, Prof. Balachandran Manavalan's group proposed three novel computational methods, published in one Molecular Therapy article and two Briefings in Bioinformatics articles.
Prof. BALACHANDRAN, MANAVALAN
Prof. Balachandran Manavalan of the Department of Integrative Biotechnology is interested in investigating, developing, and deploying cutting-edge bioinformatics techniques based on AI and machine learning to better understand and address a range of open and challenging problems in genomics and molecular biology. Recently, his group proposed three novel computational methods, published in Molecular Therapy (Impact Factor 12.91, 2022) and in two articles in Briefings in Bioinformatics (Impact Factor 13.994), with him as the corresponding author of each.
1. Human RNA m5C Site Identification Using Stacking Strategy
5-methylcytosine (m5C) is one of the most prevalent post-transcriptional epigenetic modifications and plays an essential role in various cellular processes and disease pathogenesis. Accurately identifying m5C modifications is therefore important for gaining a deeper understanding of cellular processes and other possible functional mechanisms. The team of Prof. Balachandran Manavalan and Prof. Hong-Wen Deng (Tulane University) developed a novel strategy to overcome the limitations of existing methods. The team constructed an up-to-date benchmarking dataset and extracted different properties from the sequences, including a novel contextual one-hot encoding. The various encodings were used to construct both conventional and deep learning baseline models. A stacking approach was then used to combine the important models into the final predictor, Deepm5C. The results show that Deepm5C significantly outperformed existing predictors in identifying m5C sites, demonstrating the effectiveness of the proposed hybrid framework.
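To make the encoding step concrete, the following is a minimal sketch of plain one-hot encoding for an RNA sequence window. It is an illustration only: the contextual one-hot variant introduced by the team is not described here, so this shows the standard scheme it presumably builds on. The function name `one_hot_encode`, the A/C/G/U column order, and the handling of unknown bases are assumptions made for this example.

```python
from typing import List

# Assumed column order for this sketch: A, C, G, U.
NUCLEOTIDES = "ACGU"


def one_hot_encode(sequence: str) -> List[List[int]]:
    """Encode each nucleotide as a 4-element indicator vector.

    DNA-style input is accepted by mapping T to U. Any other symbol
    (for example N) becomes an all-zero vector.
    """
    encoded = []
    for base in sequence.upper().replace("T", "U"):
        encoded.append([1 if base == n else 0 for n in NUCLEOTIDES])
    return encoded


# A hypothetical 5-nt window centred on a cytosine.
window = one_hot_encode("GACUA")
```

Each sequence of length L becomes an L x 4 matrix, which can be flattened for a conventional classifier or passed as is to a deep learning model.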
This research was supported by NRF grant 2021R1A2C1014338, and the results were published online on May 6 in Molecular Therapy (Cell Press; Impact Factor 12.91).
2. Human lncRNA Subcellular Localization Prediction Using Tree-Based Algorithms
Long noncoding RNAs (lncRNAs) are primarily regulated by their cellular localization, which is responsible for their molecular functions, including cell cycle regulation and genome rearrangements. Several ML-based methods have been developed to identify lncRNA subcellular localization, but work on identifying the cell-specific localization of human lncRNAs remains limited. The team of Prof. Balachandran Manavalan and Prof. Young-Jun Jeon (Department of Integrative Biotechnology) proposed TACOS (Figure 1), the first application of a tree-based stacking approach to this problem, which allows users to identify the subcellular localization of human lncRNAs in ten different cell types. The team comprehensively evaluated six tree-based classifiers with ten different feature descriptors, using a newly constructed balanced training dataset for each cell type. Subsequently, the strengths of the AdaBoost baseline models were integrated using an appropriate tree-based classifier for the final prediction.
[Figure 1] An overview of TACOS. It involves the following steps: dataset construction, feature extraction, baseline model construction, and final model construction.
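The stacking idea behind the last two steps can be sketched as follows. This is a toy illustration under stated assumptions, not the TACOS implementation: the baseline models are hypothetical fixed decision stumps (depth-1 trees) rather than trained AdaBoost models, and a small logistic-regression meta-learner is swapped in for the tree-based final classifier so that the example runs with the standard library alone. In practice, the meta-learner is usually trained on cross-validated baseline predictions to avoid leakage.

```python
import math
from typing import Callable, List, Sequence, Tuple

Sample = Sequence[float]
BaselineModel = Callable[[Sample], float]  # returns an estimated P(positive)


def make_stump(feature_index: int, threshold: float) -> BaselineModel:
    """Hypothetical depth-1 tree: a fixed threshold on one feature."""
    def stump(x: Sample) -> float:
        return 0.9 if x[feature_index] > threshold else 0.1
    return stump


def meta_features(models: Sequence[BaselineModel],
                  samples: Sequence[Sample]) -> List[List[float]]:
    """Stack the baseline outputs into one meta-feature row per sample."""
    return [[model(x) for model in models] for x in samples]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_meta_learner(rows: Sequence[Sequence[float]], labels: Sequence[int],
                     lr: float = 0.5, epochs: int = 1000) -> Tuple[List[float], float]:
    """Fit a logistic-regression meta-learner by batch gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for row, label in zip(rows, labels):
            error = sigmoid(bias + sum(w * v for w, v in zip(weights, row))) - label
            for j, v in enumerate(row):
                grad_w[j] += error * v
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def stacked_predict(models: Sequence[BaselineModel], weights: Sequence[float],
                    bias: float, x: Sample) -> float:
    """Final prediction: the meta-learner applied to the baseline outputs."""
    row = [model(x) for model in models]
    return sigmoid(bias + sum(w * v for w, v in zip(weights, row)))


# Toy data in which only the first feature carries the label.
train_x = [(0.1, 0.2), (0.2, 0.9), (0.8, 0.1), (0.9, 0.8)]
train_y = [0, 0, 1, 1]
baselines = [make_stump(0, 0.5), make_stump(1, 0.5)]
weights, bias = fit_meta_learner(meta_features(baselines, train_x), train_y)
```

The meta-learner sees only the baseline outputs, so it can learn to rely on the informative baseline and discount the uninformative one.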
This research was supported by NRF grants 2021R1A2C1014338 and 2021R1C1C1007833, and the results were published online on June 27 in Briefings in Bioinformatics (Impact Factor 13.994; JCR=1).
3. Novel Algorithm to Predict Anti-coronavirus Peptides
Unlike conventional non-peptide drugs, antiviral peptide drugs are highly specific, easy to synthesize and modify, and not easily susceptible to drug resistance. Computational predictors for identifying anti-coronavirus peptides (ACVPs) are needed to reduce the time and expense involved in screening thousands of peptides and assaying their antiviral activity. The team of Prof. Balachandran Manavalan and Prof. Hiroyuki Kurata (Kyushu Institute of Technology, Japan) developed a tool called iACVP (Figure 2). Based on an exhaustive analysis of five different classifiers and conventional features, the team concluded that the random forest classifier combined with the word2vec (W2V) word embedding achieved the best performance, regardless of the dataset. Two main factors controlled the performance of iACVP: (i) the dataset-specific W2V dictionary, which was generated from the training and independent test datasets rather than from the UniProt proteome, and (ii) the optimal k-mer value in W2V, which provides greater discrimination between positive and negative samples.
[Figure 2] Workflow of iACVP development. (A) Construction, evaluation and analysis of the ML methods with W2V encoding and BE. (B) Word2vec encoding of k-mer consecutive amino acid (AA) sequences and the sandwich structure of the training and test datasets.
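The k-mer step that precedes W2V training can be sketched as below. This is a minimal illustration, assuming overlapping k-mers taken with a stride of one residue; the function names `kmer_tokens` and `build_vocabulary` and the example peptides are hypothetical. The word vectors themselves would come from a Word2vec implementation trained on these tokens, which is not shown.

```python
from typing import Dict, List, Sequence


def kmer_tokens(peptide: str, k: int) -> List[str]:
    """Split a peptide into overlapping k-mers of consecutive amino acids."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    return [peptide[i:i + k] for i in range(len(peptide) - k + 1)]


def build_vocabulary(peptides: Sequence[str], k: int) -> Dict[str, int]:
    """Assign an index to every distinct k-mer seen in the given dataset."""
    vocabulary: Dict[str, int] = {}
    for peptide in peptides:
        for token in kmer_tokens(peptide, k):
            vocabulary.setdefault(token, len(vocabulary))
    return vocabulary


# Hypothetical peptides standing in for a training or test dataset.
dataset = ["ACDE", "CDEF"]
vocabulary = build_vocabulary(dataset, k=3)
```

Building the vocabulary from the peptide datasets themselves, rather than from a whole proteome, keeps the dictionary limited to k-mers that actually occur in the peptides being classified. The value of k determines how many residues each "word" spans.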
This research was supported by NRF grant 2021R1A2C1014338, and the results were published online on July 1 in Briefings in Bioinformatics (Impact Factor 13.994; JCR=1).
Prof. Balachandran Manavalan has developed several bioinformatics tools that are widely used by researchers worldwide. He has published several articles in top-tier journals and continues to produce well-regarded research. More about his research interests can be found at https://balalab-skku.org/. He is currently looking for talented and motivated graduate and undergraduate students with backgrounds in biological or biomedical sciences, statistics, chemistry, engineering, or computer science. Interested candidates can contact him directly at email@example.com.