Information passes selectively through a gate unit, an operation performed mainly by a sigmoid neural layer together with a pointwise multiplication. The forget gate decides which information is discarded and how much of the previously stored information is carried over to the current unit. It takes as input h_{t-1}, the previous cell output, and x_t, the current cell input at time step t. The forget gate is thus used to forget information selectively.
A considerable body of theoretical and practical evidence indicates that a deep hierarchical network model can be more competent at complex tasks than a shallow one. To build a deep hierarchical structure on top of the LSTM network, we constructed a stacked LSTM network by stacking multiple LSTM hidden layers on top of each other; the resulting network comprised one input layer, three LSTM hidden layers, three dropout layers, and one output layer. Because the number of neurons in the output layer equals the number of classes, the number of neurons or memory blocks in each layer of the network was. In the output layer, the sigmoid activation function was employed to generate probabilistic results.
We used the cross-validation test, a robust statistical procedure that guards against overfitting and is suitable for a wide range of classification algorithms. Among cross-validation tests, the jackknife test is regarded as the least arbitrary because it yields a unique output for a given dataset; however, its computational cost is high for large datasets.
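To make the cost argument concrete, the following minimal sketch (not the authors' code) generates train/test index splits for K-fold cross-validation and treats the jackknife (leave-one-out) test as the special case in which K equals the number of samples. The helper name `kfold_indices`, the random seed, the dataset size, and the choice K = 10 are assumptions made purely for illustration.

```python
import random


def kfold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Hypothetical helper for illustration: indices are shuffled once and
    split into k nearly equal folds; each fold serves as the test set once.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at offset i.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx


n = 1000  # assumed dataset size, for illustration only
ten_fold = list(kfold_indices(n, k=10))   # K = 10 is an assumed value
jackknife = list(kfold_indices(n, k=n))   # leave-one-out: one sample per test fold

print(len(ten_fold), "model fits with K = 10")
print(len(jackknife), "model fits with the jackknife test")
```

Each split requires training one model, so the number of fits grows with the number of folds; this is why the jackknife test becomes expensive on large datasets.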
To avoid the computational cost of the jackknife test, we adopted the K-fold cross-validation method, which divides the dataset into K subsets. The process is repeated K times; in each repetition, one subset is used for testing while the remaining K - 1 subsets are used to train the model. Selecting appropriate assessment measures is imperative for checking the effectiveness of a statistical predictor. Here, random division of the data into training and testing partitions, model development, and evaluation were all carried out through K-fold cross-validation.
To tune the hyperparameters, we performed stratified K-fold cross-validation with a grid search procedure. The table summarizes recommendations and starting points for the most common hyperparameters. The best hyperparameter configuration depends on the data and the application, so models with different configurations should be trained and their performance evaluated on a validation set. Because the number of configurations grows exponentially with the number of hyperparameters, exploring all of them becomes infeasible. It is therefore recommended to optimize only the most critical hyperparameters and to evaluate the final performance on an independent dataset. We performed the grid search on the training set and used the Matthews correlation coefficient (MCC) and accuracy (ACC) to select the next set of hyperparameters.
A series of comparative experiments examined five different sequence-encoding schemes, covering sequence location information, amino acid composition descriptors, group-based features, and physicochemical property-based features, which showed diverse predictive performance. We first applied K-fold cross-validation to the predictor built on each encoding scheme to assess its predictive performance. The experimental results revealed that the various features made distinct contributions to predictive performance for all three types of phosphorylation sites.
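The hyperparameter selection described above, a grid search scored by MCC and ACC under stratified K-fold cross-validation, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the rule of ranking by MCC first and ACC second, the value K = 5, and the toy threshold classifier with its synthetic data are all hypothetical stand-ins for the real LSTM model and dataset.

```python
import itertools
import math
import random


def acc_and_mcc(y_true, y_pred):
    """Return (ACC, MCC) for binary labels in {0, 1}."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    acc = (tp + tn) / len(y_true)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0 when undefined
    return acc, mcc


def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds that preserve the class ratio."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)  # deal each class round-robin
    return folds


def grid_search(x, y, grid, fit, predict, k=5):
    """Return (best_params, mean_mcc, mean_acc) over a hyperparameter grid."""
    folds = stratified_folds(y, k)
    best = None
    for values in itertools.product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        scores = []
        for i, test_idx in enumerate(folds):
            train_idx = [j for m, f in enumerate(folds) if m != i for j in f]
            model = fit([x[j] for j in train_idx],
                        [y[j] for j in train_idx], **params)
            preds = [predict(model, x[j]) for j in test_idx]
            scores.append(acc_and_mcc([y[j] for j in test_idx], preds))
        mean_acc = sum(s[0] for s in scores) / k
        mean_mcc = sum(s[1] for s in scores) / k
        # Assumed ranking rule: MCC first, ACC as the tie-breaker.
        if best is None or (mean_mcc, mean_acc) > (best[1], best[2]):
            best = (params, mean_mcc, mean_acc)
    return best


# Toy stand-in for the real model: a fixed-threshold classifier.
def fit(x_train, y_train, threshold):
    return threshold  # nothing is learned; the hyperparameter is the model


def predict(model, xi):
    return int(xi >= model)


x = [i / 20 for i in range(20)]   # synthetic one-dimensional feature
y = [int(v >= 0.5) for v in x]    # synthetic labels, 10 per class
best_params, best_mcc, best_acc = grid_search(
    x, y, {"threshold": [0.2, 0.5, 0.8]}, fit, predict, k=5)
print(best_params, best_mcc, best_acc)
```

Passing `fit` and `predict` as arguments keeps the search loop independent of the model, so the same loop could wrap any classifier; a grid with several hyperparameters is expanded with `itertools.product`, which also shows why the number of configurations grows quickly.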
Several published articles report that a serial combination of different features can further improve prediction performance; consequently, we went on to test the predictive performance of combined features.
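A serial combination is commonly understood as concatenating, for each sample, the feature vectors produced by the individual encodings into one longer vector. The sketch below illustrates that reading; the helper name `combine_serially` and the numeric values are placeholders chosen for illustration and are not taken from the study.

```python
def combine_serially(*feature_sets):
    """Concatenate per-sample feature vectors from several encodings.

    Each argument is a list of feature vectors, one vector per sample,
    with samples in the same order across all encodings.
    """
    n_samples = len(feature_sets[0])
    if any(len(fs) != n_samples for fs in feature_sets):
        raise ValueError("every encoding must cover the same samples")
    return [
        [value for vector in per_sample for value in vector]
        for per_sample in zip(*feature_sets)
    ]


# Placeholder values for two samples under two hypothetical encodings.
composition = [[0.1, 0.3], [0.2, 0.2]]
physicochemical = [[1.5], [0.7]]

combined = combine_serially(composition, physicochemical)
print(combined)  # [[0.1, 0.3, 1.5], [0.2, 0.2, 0.7]]
```

The combined vectors can then be passed to the same predictor and cross-validation procedure used for the individual encodings.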