An NLP-based feature extraction technique for Deep-Learning models that predict BBB permeability is developed and can be used for the early screening of CNS drugs.
Blood-brain barrier (BBB) permeability is a critical factor in the success or failure of CNS drug development. The most accurate method of measuring BBB permeability involves clinical experiments, which are labour-intensive and time-consuming. Thus, numerous efforts have been made to use artificial intelligence (AI) to predict molecules' BBB permeability. Most previous models are based on calculated descriptors and molecular fingerprints. In the present work, we have developed an NLP-based feature extraction technique for Deep-Learning models to predict BBB permeability. We used the B3DB database and generated SELFIES to extract features from the molecules. We employed word-level and N-gram tokenization to represent words as numeric vectors. The extracted features were fed into several Artificial Neural Network (ANN) and Bi-directional Long Short-Term Memory (LSTM) models. The ANN-10 model, built using an ANN and 6-gram tokenization, performed best on the independent test set. Its accuracy, precision, recall, F1, specificity and AUC of ROC scores were 0.89, 0.91, 0.91, 0.91, 0.85 and 0.90, respectively. Thus, the developed model can be used for the early screening of CNS drugs.
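The tokenization step described above can be illustrated with a minimal sketch. It is not the authors' code: the helper names (`split_symbols`, `ngrams`, `build_vocab`, `encode`) are hypothetical, and it assumes that N-grams are formed over bracketed SELFIES symbols, that each distinct N-gram is mapped to an integer id, and that id 0 stands for both padding and unseen N-grams. The abstract does not specify any of these details.

```python
import re
from collections import Counter

# Assumption: a SELFIES string is a run of bracketed symbols, e.g. "[C][C][O]".
SELFIES_SYMBOL = re.compile(r"\[[^\]]+\]")


def split_symbols(selfies_string):
    """Split a SELFIES string into its bracketed symbols."""
    return SELFIES_SYMBOL.findall(selfies_string)


def ngrams(symbols, n):
    """Return overlapping n-grams over a symbol list, each joined into one string."""
    return ["".join(symbols[i:i + n]) for i in range(len(symbols) - n + 1)]


def build_vocab(corpus, n):
    """Map each n-gram in the corpus to an integer id, most frequent first.

    Id 0 is reserved here for padding and unseen n-grams (an assumption).
    """
    counts = Counter(g for s in corpus for g in ngrams(split_symbols(s), n))
    return {gram: idx for idx, (gram, _) in enumerate(counts.most_common(), start=1)}


def encode(selfies_string, vocab, n, max_len):
    """Turn one SELFIES string into a fixed-length list of n-gram ids."""
    ids = [vocab.get(g, 0) for g in ngrams(split_symbols(selfies_string), n)]
    ids = ids[:max_len]                      # truncate long molecules
    return ids + [0] * (max_len - len(ids))  # right-pad short molecules


if __name__ == "__main__":
    # Illustrative SELFIES strings (ethanol and acetaldehyde), not taken from B3DB.
    corpus = ["[C][C][O]", "[C][C][=O]"]
    vocab = build_vocab(corpus, n=2)  # the abstract's best model used n=6
    print(encode("[C][C][O]", vocab, n=2, max_len=4))
```

Integer vectors of this kind could then be passed to an embedding layer or a dense network. The value n=2 is used only so that the toy molecules yield at least one N-gram; a molecule with fewer than n symbols produces an all-padding vector in this sketch.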