Subcellular Targeting Strategy of Tail-Anchored Proteins Based on Multi-Classifier Ensemble
The targeting of tail-anchored (TA) proteins to subcellular regions plays a key role in cell division, apoptosis, and lipid transport. Current methods for predicting the subcellular targeting of proteins are built on protein sequence information, and most of them suffer from data annotation bias and low prediction accuracy. To address these problems, we propose a subcellular targeting strategy for TA proteins based on a multi-classifier ensemble. Specifically, we first construct a dataset of 428 eukaryotic TA proteins labeled by subcellular targeting. Second, to enhance prediction accuracy, we add hydrophobicity- and charge-related features. Furthermore, we apply seven popular feature-extraction methods to verify the predictions. Finally, because no single classifier achieves the desired results, we combine five types of weak classifier in an ensemble: Random Forest (RF), Logistic Regression (LR), Naive Bayes (NB), K-Nearest Neighbour (KNN), and Gradient Boosting Decision Tree (GBDT). Under cross-validation, the ensemble reaches a prediction accuracy of 82.2%.
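As a minimal sketch of two ideas named above, the following Python code computes hydrophobicity- and charge-related features for a sequence segment and combines the outputs of several classifiers by hard (majority) voting. It is an illustration under stated assumptions, not the implementation used in this work: the Kyte-Doolittle hydropathy scale, the simplified net-charge rule, the voting scheme, the compartment labels, and the function names `tail_features` and `majority_vote` are all hypothetical choices, since the abstract does not specify them.

```python
from collections import Counter

# Kyte-Doolittle hydropathy index (assumed scale; the abstract does not name one).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def tail_features(segment):
    """Return (mean hydrophobicity, net charge) for an amino-acid segment."""
    # Keep only the 20 standard residues; anything else is skipped.
    residues = [aa for aa in segment.upper() if aa in KD]
    if not residues:
        raise ValueError("segment contains no standard amino acids")
    mean_hydro = sum(KD[aa] for aa in residues) / len(residues)
    # Simplified net charge (assumption): K and R count +1, D and E count -1,
    # and histidine is ignored.
    positives = sum(aa in "KR" for aa in residues)
    negatives = sum(aa in "DE" for aa in residues)
    return mean_hydro, positives - negatives


def majority_vote(per_classifier_labels):
    """Combine label lists (one list per classifier) by hard voting."""
    combined = []
    for votes in zip(*per_classifier_labels):
        # On a tie, Counter preserves first-seen order, so the label from the
        # earliest classifier in the input wins.
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


if __name__ == "__main__":
    # Hypothetical segment and hypothetical compartment labels.
    print(tail_features("LLVVAAKR"))
    print(majority_vote([["ER", "Mito"], ["ER", "ER"], ["Mito", "ER"]]))
```

In practice the five classifiers would each be trained on the extracted features, and a weighted or probability-based (soft) vote could replace the hard vote shown here; which scheme is appropriate depends on details not given in the abstract.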